CUDA MOAT ALERT 🔥: In less than 70 days, GB200 NVL72 serving costs decreased by 2.5x through software improvements alone…
SemiAnalysis Twitter · SemiAnalysis (@SemiAnalysis_) · 2026-06-22
SemiAnalysis reports that GB200 NVL72 serving costs fell 2.5x in under 70 days through software improvements alone, including rewritten CUDA kernels, for the Kimi model architecture. SemiAnalysis describes this as the same architecture as xAI's Cursor Composer 2.5 and frames the result as evidence of Nvidia's software moat.
Extraction
Topics: cuda, gpu-software-optimization, ai-infrastructure, nvidia, model-serving
Claims
- GB200 NVL72 serving costs decreased 2.5x in less than 70 days through software improvements alone.
- The cost reductions apply to the Kimi model architecture, which is also used in xAI's Cursor Composer 2.5.
- One key optimization involved rewriting CUDA kernels.
- Nvidia's CUDA software ecosystem constitutes a significant and deepening competitive moat beyond hardware.
Key quotes
In less than 70 days, GB200 NVL72 serving costs decreased by 2.5x through software improvements alone for the Kimi architecture, which is the same model architecture as xAI's popular Cursor Composer 2.5.