The Information Machine

Congrats to @vllm_project & @lmsysorg for releasing MiniMax M3 428B on both the CUDA & ROCm stack on day 0! Mini…

SemiAnalysis Twitter · SemiAnalysis (@SemiAnalysis_) · 2026-06-13

MiniMax M3 428B launches with day-0 support on both the CUDA and ROCm stacks via @vllm_project and @lmsysorg. It features block sparse attention, which delivers 9x faster prefill than M2.7, and ships with open MXFP8 weights.

Extraction

Topics: minimax-m3, sparse-attention, mxfp8, rocm, inference-optimization

Claims

  • MiniMax M3 is supported on both the CUDA and ROCm inference stacks on day 0 of its release, via @vllm_project and @lmsysorg.
  • M3's block sparse attention achieves 9x faster prefill compared to M2.7.
  • M3 ships with day-0 open MXFP8 weights, enabling quantized inference immediately at launch.
  • Inferact released EAGLE3 as open source on day 0, alongside the M3 model launch.

Key quotes

MiniMax M3 includes: Block sparse attention which is 9x faster prefill over M2.7, Day 0 open MXFP8 weights, and Furthermore @Inferact released Day-0 EAGLE3 open.