Microsoft has announced Maia 200, a new AI accelerator designed specifically for large-scale inference workloads. The chip is built to improve the performance and cost efficiency of AI token generation across Microsoft's cloud infrastructure.
Maia 200 is manufactured on TSMC's 3-nanometer process and features tensor cores with native FP8 and FP4 support. The chip integrates 216 GB of HBM3e memory with high-bandwidth data movement and a large on-chip SRAM to accommodate large AI models. The design focuses on keeping the hardware highly utilized while driving down the inference cost per token.
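As a rough illustration of what native FP8 support buys, the sketch below quantizes a weight matrix to PyTorch's E4M3 FP8 format, one of the low-precision formats Maia 200's tensor cores execute natively. It uses stock PyTorch only; the per-tensor scaling scheme is a common approach assumed for illustration, not a documented Maia behavior.

```python
import torch

# Illustration only: FP8 (E4M3) weight quantization in stock PyTorch.
# This is not the Maia SDK; it just shows the storage/precision trade-off
# behind low-precision inference formats like FP8.

weights = torch.randn(4096, 4096)                    # FP32 "model" weights

# A per-tensor scale maps the weight range onto FP8's representable range.
fp8_max = torch.finfo(torch.float8_e4m3fn).max       # 448.0 for E4M3
scale = weights.abs().max() / fp8_max

w_fp8 = (weights / scale).to(torch.float8_e4m3fn)    # quantize: 1 byte/weight
w_back = w_fp8.to(torch.float32) * scale             # dequantize to compare

print(f"storage: {w_fp8.element_size()} byte/elem vs {weights.element_size()}")
print(f"max abs error: {(weights - w_back).abs().max():.4f}")
```

Halving (FP8) or quartering (FP4) the bytes per weight relative to FP16 is what lets more of a large model sit in the 216 GB of HBM3e and on-chip SRAM, which is the utilization-per-token story the design targets.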
The accelerator will support multiple AI services, including Microsoft Foundry and Microsoft 365 Copilot, and will be used to run advanced models such as OpenAI's GPT-5.2. Maia 200 will also support synthetic data generation and reinforcement learning for Microsoft's internal AI development.
Maia 200 is currently deployed in Microsoft's US Central datacenter region, with additional regions planned. The accelerator integrates natively with Microsoft Azure and will be supported by a Maia SDK that includes PyTorch integration and model-optimization tools.
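Microsoft has not published the Maia SDK's API, but accelerator SDKs typically plug into PyTorch as a device backend plus a `torch.compile` hook. The sketch below is a hypothetical picture of that workflow under those assumptions; it runs on CPU as written, since stock PyTorch has no Maia backend.

```python
import torch

# Hypothetical sketch of SDK-style PyTorch integration. Stock PyTorch has no
# Maia backend, so this targets CPU; on a Maia-enabled build one would point
# `device` at whatever accelerator device the SDK registers instead.
device = "cpu"

model = torch.nn.Linear(4096, 4096).eval().to(device)
compiled = torch.compile(model)  # vendor backends commonly hook torch.compile

with torch.inference_mode():
    tokens = torch.randn(8, 4096, device=device)  # a batch of activations
    print(compiled(tokens).shape)                 # torch.Size([8, 4096])
```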