CUDA MOAT ALERT 🔥: In less than 70 days, GB200 NVL72 serving costs decreased by 2.5x through software improvements alone…

SemiAnalysis Twitter · SemiAnalysis (@SemiAnalysis_) · 2026-06-22

SemiAnalysis reports that GB200 NVL72 AI serving costs dropped 2.5x in under 70 days through CUDA software optimizations alone for the Kimi model architecture—the same architecture underlying xAI's Cursor Composer 2.5—underscoring Nvidia's software moat.

Appears in

NVIDIA vs. Custom ASICs: GPU Dominance Persists Despite Startup Performance Claims

Extraction

Topics: cuda, gpu-software-optimization, ai-infrastructure, nvidia, model-serving

Claims

GB200 NVL72 serving costs decreased 2.5x in less than 70 days through software improvements alone.
The cost reductions apply to the Kimi model architecture, which is also used in xAI's Cursor Composer 2.5.
One key optimization involved rewriting CUDA kernels.
Nvidia's CUDA software ecosystem constitutes a significant and deepening competitive moat beyond hardware.
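
To make the headline figure concrete, the arithmetic behind a "2.5x decrease" can be sketched as below. This is only an illustration: the 2.5x factor and the 70-day window are taken from the claims above, while the baseline cost of 1.00 and the function names are hypothetical placeholders. The daily figure assumes the improvement compounded evenly over exactly 70 days, which the source does not state (it says "less than 70 days").

```python
# Illustrative arithmetic only. The 2.5x factor and 70-day window come from the
# claims above; the baseline cost and all function names are hypothetical.

def cost_after_improvement(baseline_cost: float, improvement_factor: float) -> float:
    """Cost after a decrease of `improvement_factor`x (e.g. 2.5x cheaper)."""
    return baseline_cost / improvement_factor


def fractional_reduction(improvement_factor: float) -> float:
    """Fraction of the original cost that was eliminated."""
    return 1.0 - 1.0 / improvement_factor


def equivalent_daily_factor(improvement_factor: float, days: int) -> float:
    """Per-day factor that would compound to `improvement_factor` over `days`.

    Assumes smooth, even compounding, which is an assumption made here for
    illustration and not something the source reports.
    """
    return improvement_factor ** (1.0 / days)


baseline = 1.00  # hypothetical baseline serving cost, arbitrary units
new_cost = cost_after_improvement(baseline, 2.5)
reduction = fractional_reduction(2.5)
daily = equivalent_daily_factor(2.5, 70)

print(f"new cost: {new_cost:.2f} of baseline")
print(f"reduction: {reduction:.0%}")
print(f"equivalent daily factor over 70 days: {daily:.4f}")
```

Read this way, a 2.5x decrease means serving cost falls to 40% of its starting level, which is a 60% reduction.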

Key quotes

In less than 70 days, GB200 NVL72 serving costs decreased by 2.5x through software improvements alone for the Kimi architecture, which is the same model architecture as xAI's popular Cursor Composer 2.5.