The Information Machine

Microsoft's new MAI models

Simon Willison · 2026-06-02

Microsoft unveiled MAI-Thinking-1, a 1-trillion-parameter MoE reasoning model with 35B active parameters that it claims outperforms Sonnet 4.6, and MAI-Code-1-Flash (137B total, 5B active) for GitHub Copilot. Both were trained on web crawls, with the same licensing problems as other major LLMs.

Extraction

Topics: llm-release, microsoft-ai, model-training-data, mixture-of-experts

Claims

  • MAI-Thinking-1 is a 1-trillion-parameter MoE model with 35B active parameters that Microsoft claims outperforms Sonnet 4.6 in blind human evaluations.
  • MAI-Code-1-Flash is a model with 137B total and 5B active parameters, purpose-built for GitHub Copilot and VS Code.
  • Both models are trained on web crawls including Common Crawl, contrary to Microsoft's initial framing, which suggested clean commercial licensing.
  • The MAI-Thinking-1 training corpus involved crawling 1.2 trillion web pages, filtered down to 794 billion pages.
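To put the figures in the claims above in proportion, here is a minimal Python sketch of the arithmetic. It assumes "1 trillion" means exactly 1e12 parameters and that "137B/5B" means total/active; the function names are illustrative and not from the original post.

```python
# Parameter counts as reported in the claims above.
# Assumption: "1-trillion" is treated as exactly 1e12 for illustration.
MODELS = {
    "MAI-Thinking-1": {"total": 1_000e9, "active": 35e9},
    "MAI-Code-1-Flash": {"total": 137e9, "active": 5e9},
}


def active_fraction(total: float, active: float) -> float:
    """Share of a mixture-of-experts model's parameters that are active."""
    return active / total


def retention(crawled: float, kept: float) -> float:
    """Share of crawled pages that survived filtering."""
    return kept / crawled


for name, params in MODELS.items():
    share = active_fraction(params["total"], params["active"])
    print(f"{name}: {share:.1%} of parameters active")

# 1.2 trillion pages crawled, 794 billion kept after filtering.
print(f"Crawl retention: {retention(1.2e12, 794e9):.1%}")
```

Under these assumptions, both models activate roughly 3.5 to 3.6 percent of their parameters, and about two thirds of the crawled pages survive filtering.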

Key quotes

I would very much like to learn more about this 'appropriately licensed' data! Could these be the first generally useful code-specialist models that didn't train on an unlicensed dump of the web? (Update: the answer is no, see note below.)
We trained [MAI-Thinking-1] from the ground up on enterprise grade, clean and commercially licensed data, without distillation from third-party models.
I did not cover this one at all well, which is somewhat ironic since I was at the Microsoft Build conference when I wrote this up!