Even more striking: ~50% of requests already exceed 128k tokens. The driver isn't user prompts getting longer. It's ever…
SemiAnalysis Twitter · SemiAnalysis (@SemiAnalysis_) · 2026-05-22
SemiAnalysis reports that approximately 50% of real-world AI inference requests already exceed 128k tokens, with the growth driven by agentic prefill — system prompts, tool definitions, MCP schemas, and prior context — rather than longer user messages.
Extraction
Topics: agentic-workloads, context-length, inference-economics, token-usage
Claims
- Roughly 50% of AI inference requests already exceed 128k tokens in real-world usage.
- Token growth is driven by agentic prefill (system prompts, tool definitions, skills, MCP schemas, prior-turn context, and file contents), not by users typing longer messages.
- Agentic workloads fundamentally change the economics of AI inference at scale.
Key quotes
The driver isn't user prompts getting longer. It's everything the agent stuffs in before you even type: system prompts, tool definitions, skills, MCP schemas, prior turn context, file contents.