Microsoft AI (MAI): First In-House Models
Microsoft AI (MAI) recently announced its first two in-house models: MAI-Voice-1, a highly expressive model for advanced speech generation, and MAI-1-preview, a versatile foundation model for instruction following and general-purpose AI tasks. These technologies chart a clear path for MAI's strategy of equipping users and developers with state-of-the-art generative AI, purpose-built for real-world interaction and at-scale deployment.
MAI-Voice-1: High-Fidelity Speech Generation
MAI-Voice-1 marks a significant leap forward in natural, expressive speech AI. Built from the ground up to serve as the next-generation voice interface for AI-powered experiences, the model achieves extremely low latency, generating up to a full minute of audio in under a second on a single GPU. This efficiency not only enables rapid prototyping but also supports scalable, user-facing applications without heavy compute costs becoming a bottleneck.
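To put that throughput claim in perspective, a quick back-of-envelope calculation helps: one minute of audio generated in under a second implies a real-time factor above 60x. The sketch below is purely illustrative arithmetic based on the figures in the announcement; `real_time_factor` is a hypothetical helper, not part of any Microsoft API.

```python
# Back-of-envelope helper for the reported MAI-Voice-1 throughput claim.
# The inputs (60 s of audio in about 1 s on one GPU) come from the
# announcement; the function itself is illustrative only.

def real_time_factor(audio_seconds: float, generation_seconds: float) -> float:
    """Return how many seconds of audio are produced per second of compute."""
    if generation_seconds <= 0:
        raise ValueError("generation time must be positive")
    return audio_seconds / generation_seconds

# A 60x real-time factor means a single GPU could, in principle,
# keep up with dozens of concurrent live audio streams.
rtf = real_time_factor(60.0, 1.0)
print(f"real-time factor: {rtf:.0f}x")  # prints "real-time factor: 60x"
```

At that rate, serving cost per minute of audio becomes small enough for consumer-scale features like daily news briefings and podcasts.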
MAI-Voice-1 is engineered for versatility across both single- and multi-speaker scenarios, delivering high-fidelity audio with nuanced expressiveness. In production, it already powers features such as Copilot Daily and Podcasts, and it is open to hands-on exploration through Copilot Labs. Showcase demos include interactive storytelling and custom guided meditations, illustrating practical use cases for creators and consumers alike.
MAI-1-preview: Scalable Foundation Model
Now in public testing on LMArena and available via API to selected trusted testers, MAI-1-preview is MAI's first large-scale foundation model trained entirely in-house. Built on a mixture-of-experts architecture, it was pre-trained and refined on approximately 15,000 NVIDIA H100 GPUs, underscoring Microsoft's commitment to high-performance infrastructure and next-generation AI capabilities.
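For readers unfamiliar with the term, a mixture-of-experts (MoE) layer routes each input to a small subset of specialized sub-networks ("experts") rather than running the full model, trading parameter count for per-token compute. The sketch below is a minimal, generic illustration of top-k routing; the expert count, dimensions, and routing scheme are assumptions for demonstration and say nothing about MAI-1-preview's actual internals.

```python
# Minimal, generic sketch of a mixture-of-experts layer with top-k routing.
# All sizes and the routing scheme here are illustrative assumptions,
# not Microsoft's actual MAI-1-preview design.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(x, expert_weights, router_weights, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs."""
    scores = softmax(router_weights @ x)      # router probability per expert
    top = np.argsort(scores)[-top_k:]         # indices of the k best experts
    gate = scores[top] / scores[top].sum()    # renormalized gate weights
    # Only the selected experts run; the others contribute no compute.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gate, top))

d_model, n_experts = 8, 4
x = rng.standard_normal(d_model)
experts = rng.standard_normal((n_experts, d_model, d_model))
router = rng.standard_normal((n_experts, d_model))

y = moe_layer(x, experts, router)
print(y.shape)  # prints "(8,)"
```

The appeal of this design at the scale described is that total parameters can grow with the number of experts while the cost of any single forward pass stays roughly constant, since only `top_k` experts execute per input.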
MAI-1-preview excels at understanding and following complex user instructions, making it ideally suited for powering Copilot’s text-based features and a variety of everyday user interactions. Its architecture and ongoing model improvements benefit from constant feedback cycles, with the explicit goal of maximizing helpfulness, reliability, and safe deployment at scale.
Infrastructure and Future Roadmap
Both models are underpinned by MAI's significant investments in AI compute, including the operational launch of a next-generation GB200 GPU cluster. Looking forward, Microsoft plans to orchestrate a range of specialized models serving diverse user intents and industry-specific use cases, further unlocking enterprise and consumer value.
MAI also continues to foster a collaborative ecosystem, integrating best-in-class models from its own teams, partner organizations, and the open-source community in a modular, adaptive framework for Copilot and beyond.
Get Involved and Explore
Developers and AI enthusiasts can already interact with MAI-Voice-1 in Copilot Daily, Podcasts, and Copilot Labs. Those interested in evaluating or integrating MAI-1-preview can apply for API access, or participate via community platforms like LMArena. Both models exemplify Microsoft AI’s mission to provide responsible, reliable, and highly personalized AI that serves humanity’s evolving needs.
Microsoft’s ongoing roadmap for these foundational models signals a commitment not just to technical advancement, but to creating deeply trusted AI platforms that enable new forms of creativity, productivity, and societal impact.