Microsoft shivs OpenAI with three new AI models for speech and images
Summary
Microsoft has released public preview versions of three in-house models: MAI-Transcribe-1 (speech recognition), MAI-Voice-1 (speech synthesis) and MAI-Image-2 (text-to-image). The company says these models power its own products — Copilot, Bing, PowerPoint and Azure Speech — and are now available to developers via Foundry (formerly Azure AI Studio) and Azure Speech.
Key performance claims: MAI-Transcribe-1 offers enterprise-grade accuracy across 25 languages at about 50% lower GPU cost than leading rivals; MAI-Voice-1 can produce 60 seconds of audio in under a second on a single GPU. The move underscores Microsoft’s intent to compete directly with OpenAI despite their ongoing partnership and recent renegotiation.
Key Points
- Microsoft published MAI-Transcribe-1, MAI-Voice-1 and MAI-Image-2 in public preview for developers via Foundry/Azure Speech.
- MAI-Transcribe-1: claimed enterprise-grade accuracy for 25 languages and ~50% lower GPU cost versus leading alternatives.
- MAI-Voice-1: very fast speech generation — Microsoft claims 60 seconds of audio in less than a second on one GPU.
- MAI-Image-2: a text-to-image model entering an already crowded field, one likely to worry digital artists and content creators.
- These models already power Microsoft products (Copilot, Bing, PowerPoint, Azure Speech), signalling they’re production-ready for some use cases.
- Distribution runs through Foundry and Azure Speech, reflecting Microsoft's strategy to centralise model access and integrate it with enterprise tooling.
- The launch highlights Microsoft hedging its OpenAI exposure and moving to own more of the AI stack amid investor pressure and OpenAI’s heavy burn rate.
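The MAI-Voice-1 speed claim above is easiest to read as a real-time factor: seconds of audio produced per second of wall-clock compute. A trivial sketch (the function name is ours, not Microsoft's):

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio generated per second of wall-clock compute."""
    return audio_seconds / wall_seconds

# Microsoft's claim: 60 s of audio in under 1 s on a single GPU,
# i.e. a real-time factor above 60x.
print(realtime_factor(60.0, 1.0))  # -> 60.0 at the one-second boundary
```

At the one-second boundary that is 60x real time; any result under a second pushes the factor higher still.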
Context and relevance
This is a notable industry development: Microsoft is shifting from partner/investor to active competitor in key model categories (speech and images). For enterprises and developers that rely on Azure, the MAI family offers a path to lower inference costs and tighter product integration. Strategically, it reduces Microsoft’s dependence on OpenAI while keeping the partnership intact through to 2032.
It also fits broader trends: cloud vendors building proprietary models to control cost, privacy and integration; rising competition among model providers; and consolidation of model access behind vendor platforms (here, Foundry/Azure Speech).
Author style
Punchy: Microsoft isn’t just bankrolling the AI boom any more — it’s building the tools too. This rollout is deliberately bold: performance claims, production use in Copilot, and restricted distribution through Foundry all point to a vendor playing for enterprise control and commercial advantage.
Why should I read this?
Quick and blunt: if you care about where AI infrastructure and costs are headed, this matters. Microsoft going full-steam on speech and image models changes the vendor landscape — developers, product teams and IT managers should know what’s available on Azure, what it might replace, and how it affects bills and integration plans.
Source
Source: https://go.theregister.com/2026/04/02/microsoft_models_homegrown_ai_models/
