Congrats to @vllm_project & @lmsysorg for releasing MiniMax M3 428B on both the CUDA & ROCm stack on day 0! Mini…
SemiAnalysis Twitter · SemiAnalysis (@SemiAnalysis_) · 2026-06-13
MiniMax M3 428B launched with day-0 support on both the CUDA and ROCm stacks from @vllm_project and @lmsysorg. It features block sparse attention with 9x faster prefill than M2.7, and open MXFP8 weights.
Extraction
Topics: minimax-m3, sparse-attention, mxfp8, rocm, inference-optimization
Claims
- MiniMax M3 supports both CUDA and ROCm inference stacks on day 0 of release via the vllm project and lmsysorg.
- M3's block sparse attention achieves 9x faster prefill compared to M2.7.
- M3 ships with day-0 open MXFP8 weights, enabling quantized inference immediately at launch.
- Inferact released Day-0 EAGLE3 open source alongside the M3 model launch.
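
For readers unfamiliar with the MXFP8 format named in the claims, here is a minimal sketch of how a microscaling block is typically quantized. It assumes the general OCP MX convention (32 elements share one power-of-two scale, and each element is stored as FP8 E4M3). Nothing here is taken from the M3 release itself: the function names are hypothetical, and the actual weight layout, element format, and kernels used by M3, vLLM, or other runtimes may differ.

```python
import math

# Assumptions (general MX convention, not confirmed for M3):
BLOCK = 32         # elements that share one scale
E4M3_EMAX = 8      # exponent of the largest E4M3 normal (448 = 1.75 * 2**8)
E4M3_MAX = 448.0   # largest finite E4M3 magnitude
E4M3_EMIN = -6     # smallest normal exponent; smaller values are subnormal


def floor_log2(x):
    """Exact floor(log2(x)) for x > 0, via frexp (x = m * 2**e, 0.5 <= m < 1)."""
    return math.frexp(x)[1] - 1


def round_to_e4m3(v):
    """Round a float to the nearest value on the E4M3 grid, saturating at +-448."""
    if v == 0.0:
        return 0.0
    e = max(floor_log2(abs(v)), E4M3_EMIN)
    step = 2.0 ** (e - 3)          # 3 mantissa bits -> 8 steps per binade
    q = round(v / step) * step     # Python's round() is round-half-to-even
    return max(-E4M3_MAX, min(E4M3_MAX, q))


def quantize_block(values):
    """Return (shared_exp, elements) for one block (hypothetical helper)."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0, [0.0] * len(values)
    # Shared power-of-two scale, chosen so the largest value lands near the top
    # of the E4M3 range; clamped to the 8-bit exponent-only scale range.
    shared_exp = max(-127, min(127, floor_log2(amax) - E4M3_EMAX))
    scale = 2.0 ** shared_exp
    return shared_exp, [round_to_e4m3(v / scale) for v in values]


def dequantize_block(shared_exp, elems):
    """Reconstruct approximate floats from the shared exponent and elements."""
    scale = 2.0 ** shared_exp
    return [e * scale for e in elems]


block = [1.0, 0.5, 0.3, -0.25] + [0.0] * (BLOCK - 4)
shared_exp, elems = quantize_block(block)
restored = dequantize_block(shared_exp, elems)
```

In this example the block maximum is 1.0, so the shared exponent is -8 and powers of two round-trip exactly, while 0.3 is restored as 0.3125. The sketch models only the numerics; real deployments pack the elements and scales into bytes and run them through hardware or kernel-specific paths.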
Key quotes
MiniMax M3 includes: Block sparse attention which is 9x faster prefill over M2.7, Day 0 open MXFP8 weights, and Furthermore @Inferact released Day-0 EAGLE3 open.