The Information Machine

Even more striking: ~50% of requests already exceed 128k tokens. The driver isn't user prompts getting longer. It's ever…

SemiAnalysis Twitter · SemiAnalysis (@SemiAnalysis_) · 2026-05-22

SemiAnalysis reports that approximately 50% of real-world AI inference requests already exceed 128k tokens, with the growth driven by agentic prefill — system prompts, tool definitions, MCP schemas, and prior context — rather than longer user messages.

Extraction

Topics: agentic-workloads, context-length, inference-economics, token-usage

Claims

  • Roughly 50% of AI inference requests already exceed 128k tokens in real-world usage.
  • Token growth is driven by agentic prefill context — system prompts, tool definitions, skills, MCP schemas, and prior turns — not by users typing longer messages.
  • Agentic workloads fundamentally change the economics of AI inference at scale.
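To make the second claim concrete, here is a minimal sketch of how a single agentic request's token budget might break down. The component names follow the list in the claim and the key quote; every token count is a hypothetical illustration, not a figure from SemiAnalysis, and the 128k threshold is assumed here to mean 128,000 tokens.

```python
from dataclasses import dataclass

# Assumption: "128k" is read as 128,000 tokens (not 131,072).
THRESHOLD_TOKENS = 128_000


@dataclass
class RequestTokens:
    """Token counts for one request, split by who put the tokens there."""

    system_prompt: int
    tool_definitions: int
    skills: int
    mcp_schemas: int
    prior_turns: int
    file_contents: int
    user_message: int

    def agent_prefill(self) -> int:
        # Everything the agent adds before the user's own message.
        return (
            self.system_prompt
            + self.tool_definitions
            + self.skills
            + self.mcp_schemas
            + self.prior_turns
            + self.file_contents
        )

    def total(self) -> int:
        return self.agent_prefill() + self.user_message

    def agent_share(self) -> float:
        # Fraction of the request made up of agent-supplied context.
        return self.agent_prefill() / self.total()

    def exceeds(self, threshold: int = THRESHOLD_TOKENS) -> bool:
        return self.total() > threshold


# Hypothetical numbers, chosen only to illustrate the shape of the claim.
example = RequestTokens(
    system_prompt=6_000,
    tool_definitions=18_000,
    skills=9_000,
    mcp_schemas=22_000,
    prior_turns=60_000,
    file_contents=40_000,
    user_message=300,
)

if __name__ == "__main__":
    print(f"total tokens:  {example.total():,}")
    print(f"over 128k:     {example.exceeds()}")
    print(f"agent share:   {example.agent_share():.1%}")
```

In this made-up request the user's message is a few hundred tokens, yet the request still crosses the threshold because the agent-supplied context dominates the total. That is the pattern the claim describes.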

Key quotes

The driver isn't user prompts getting longer. It's everything the agent stuffs in before you even type: system prompts, tool definitions, skills, MCP schemas, prior turn context, file contents.