Microsoft shivs OpenAI with three new AI models for speech and images
Summary
Microsoft has released public preview versions of three in-house models: MAI-Transcribe-1 (speech recognition), MAI-Voice-1 (speech synthesis) and MAI-Image-2 (text-to-image). The company says these models power its own products — Copilot, Bing, PowerPoint and Azure Speech — and are now available to developers via Foundry (formerly Azure AI Studio) and Azure Speech.
Key performance claims: MAI-Transcribe-1 offers enterprise-grade accuracy across 25 languages at about 50% lower GPU cost than leading rivals; MAI-Voice-1 can produce 60 seconds of audio in under a second on a single GPU. The move underscores Microsoft’s intent to compete directly with OpenAI despite their ongoing partnership and recent renegotiation.
Key Points
- Microsoft published MAI-Transcribe-1, MAI-Voice-1 and MAI-Image-2 in public preview for developers via Foundry/Azure Speech.
- MAI-Transcribe-1: claimed enterprise-grade accuracy for 25 languages and ~50% lower GPU cost versus leading alternatives.
- MAI-Voice-1: very fast speech generation — Microsoft claims 60 seconds of audio in less than a second on one GPU.
- MAI-Image-2: a text-to-image model entering an already crowded field, one likely to worry digital artists and content creators.
- These models already power Microsoft products (Copilot, Bing, PowerPoint, Azure Speech), signalling they’re production-ready for some use cases.
- Distribution runs through Foundry and Azure Speech, reflecting Microsoft's strategy to centralise model access and integrate it with enterprise tooling.
- The launch highlights Microsoft hedging its OpenAI exposure and moving to own more of the AI stack amid investor pressure and OpenAI’s heavy burn rate.
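The MAI-Voice-1 speed claim above is easiest to read as a real-time factor: seconds of audio produced per second of wall-clock compute. A trivial sketch (the function name is ours, not Microsoft's):

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio generated per second of wall-clock compute."""
    return audio_seconds / wall_seconds

# Microsoft's claim: 60 s of audio in under 1 s on a single GPU,
# i.e. a real-time factor above 60x.
print(realtime_factor(60.0, 1.0))  # -> 60.0 at the one-second boundary
```

At the one-second boundary that is 60x real time; any result under a second pushes the factor higher still.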
Context and relevance
This is a notable industry development: Microsoft is shifting from partner/investor to active competitor in key model categories (speech and images). For enterprises and developers that rely on Azure, the MAI family offers a path to lower inference costs and tighter product integration. Strategically, it reduces Microsoft’s dependence on OpenAI while keeping the partnership intact through to 2032.
It also fits broader trends: cloud vendors building proprietary models to control cost, privacy and integration; rising competition among model providers; and consolidation of model access behind vendor platforms (here, Foundry/Azure Speech).
Author style
Punchy: Microsoft isn’t just bankrolling the AI boom any more — it’s building the tools too. This rollout is deliberately bold: performance claims, production use in Copilot, and restricted distribution through Foundry all point to a vendor playing for enterprise control and commercial advantage.
Why should I read this?
Quick and blunt: if you care about where AI infrastructure and costs are headed, this matters. Microsoft going full-steam on speech and image models changes the vendor landscape — developers, product teams and IT managers should know what’s available on Azure, what it might replace, and how it affects bills and integration plans.
Source
Source: https://go.theregister.com/2026/04/02/microsoft_models_homegrown_ai_models/
