Inference keeps getting carved up, and every cut makes intelligence cheaper.
SemiAnalysis Twitter · SemiAnalysis (@SemiAnalysis_) · 2026-07-01
Summarizing MLSys 2026, SemiAnalysis argues that LLM inference is being progressively split by phase, by layer, and by time to recover wasted hardware utilization, and that the resulting lower cost per token expands rather than reduces demand for AI.
Extraction
Topics: inference-optimization, llm-economics, mlsys-2026
Claims
- LLM inference has been sequentially optimized through three distinct splitting strategies: by phase, by layer, and by time.
- Each split recovers previously wasted hardware utilization.
- In SemiAnalysis's view, lower cost per token grows demand for AI rather than shrinking it.
- This demand-elasticity dynamic was the central story at MLSys 2026.
Key quotes
Each split recovers wasted utilization. Recovered utilization lowers the cost per token. We think cheaper tokens don't shrink demand, they grow it.
That was the real story of MLSys 2026.
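
The chain in the quote (recovered utilization, then lower cost per token, then more demand) can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the post: the function names, the hourly price, the peak throughput, and the constant-elasticity demand curve are all assumptions chosen for the example. It shows that doubling utilization halves cost per token on fixed hardware, and that total spend rises after a price cut only if demand is sufficiently elastic (elasticity above 1 in this toy model).

```python
# Illustrative sketch; all numbers and the demand model are assumptions,
# not figures from SemiAnalysis.

def cost_per_million_tokens(hourly_cost_usd: float,
                            peak_tokens_per_sec: float,
                            utilization: float) -> float:
    """Cost in USD to serve one million tokens on one accelerator.

    utilization is the fraction of peak throughput actually achieved (0 < u <= 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = peak_tokens_per_sec * utilization * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def spend_multiplier(price_ratio: float, elasticity: float) -> float:
    """Change in total spend under a constant-elasticity demand curve.

    Assumed model: demand scales as price_ratio ** (-elasticity), so
    spend (price * demand) scales as price_ratio ** (1 - elasticity).
    """
    demand_ratio = price_ratio ** (-elasticity)
    return price_ratio * demand_ratio


if __name__ == "__main__":
    # Hypothetical accelerator: $2.00/hour, 10,000 tokens/s at peak.
    before = cost_per_million_tokens(2.0, 10_000, utilization=0.25)
    after = cost_per_million_tokens(2.0, 10_000, utilization=0.50)
    print(f"cost per 1M tokens: ${before:.4f} -> ${after:.4f}")

    # Halving the price with elastic vs. inelastic demand (assumed values).
    for e in (0.5, 1.0, 1.5):
        print(f"elasticity {e}: spend x{spend_multiplier(after / before, e):.3f}")
```

Under these assumptions, the post's claim that cheaper tokens grow demand corresponds to the elastic case, where the increase in tokens consumed more than offsets the lower price.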