The throughput math has gotten the most pushback in our reader notes, so it's worth being precise. On the same B300 runni…
SemiAnalysis Twitter · SemiAnalysis (@SemiAnalysis_) · 2026-06-27
SemiAnalysis details how software optimizations and Nvidia's GB300 NVL72 hardware together deliver up to 32x throughput improvement over H100 configurations running DeepSeek R1, arguing this structural compression makes AI model-lab margin expansion durable rather than temporary.
Extraction
Topics: gpu-performance, inference-optimization, nvidia, ai-infrastructure, deepseek
Claims
- DeepSeek R1 on a B300 in baseline FP8 achieves approximately 1,000 tokens/sec/GPU.
- Adding wideEP plus disaggregation raises throughput to roughly 8,000 tokens/sec/GPU.
- Layering MTP on top reaches approximately 14,000 tokens/sec/GPU, a 14x gain from software alone.
- The most optimized GB300 NVL72 achieves about 17x the best H100 configuration in FP8 and 32x in FP4.
- Model-lab gross margin expansion is structural, not a temporary pricing anomaly.
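The software-only multipliers in the claims above can be checked with a few lines of arithmetic. The sketch below uses only the three B300 throughput figures quoted in the claims; the names `STAGES` and `stage_gains` are illustrative and do not come from the source.

```python
# Throughput figures (tokens/sec/GPU) for DeepSeek R1 on a B300,
# taken from the claims above. All values are approximate.
STAGES = [
    ("baseline FP8", 1_000),
    ("+ wideEP + disaggregation", 8_000),
    ("+ MTP", 14_000),
]


def stage_gains(stages):
    """Return (label, tokens_per_sec, step_gain, cumulative_gain) per stage.

    step_gain is relative to the previous stage; cumulative_gain is
    relative to the first stage (the baseline).
    """
    baseline = stages[0][1]
    previous = baseline
    rows = []
    for label, tokens_per_sec in stages:
        rows.append(
            (label, tokens_per_sec, tokens_per_sec / previous, tokens_per_sec / baseline)
        )
        previous = tokens_per_sec
    return rows


for label, tps, step, total in stage_gains(STAGES):
    print(f"{label:<28} {tps:>6,} tok/s/GPU  step {step:.2f}x  cumulative {total:.0f}x")
```

On these figures, wideEP plus disaggregation accounts for an 8x step and MTP for a further 1.75x, which multiply to the quoted 14x.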
Key quotes
On the same B300 running DeepSeek R1, baseline FP8 sits near 1,000 tokens/sec/GPU, adding wideEP plus disagg gets you to roughly 8,000, and layering MTP on top pushes it to about 14,000, a 14x gain from software alone.
Once you accept that compression is real, model-lab gross margin expansion stops looking like a temporary pricing oddity and starts looking structural.