It is first helpful to understand how the underlying models work. Opus, GPT 5.5, etc (the models) are all stateless -- t…
SemiAnalysis Twitter · SemiAnalysis (@SemiAnalysis_) · 2026-07-03
SemiAnalysis explains that LLMs like Opus and GPT 5.5 are fundamentally stateless, requiring harnesses to reconstruct the full conversation on every request, which makes prompt caching critically important for cost and latency.
Appears in
Extraction
Topics: llm-architecturestateless-modelsprompt-cachingagentic-coding
Claims
- All major LLMs, including Opus and GPT 5.5, are stateless and retain no memory between requests.
- On every user turn, the harness must rebuild and resend the entire conversation history to the model.
- Prompt caching is important specifically because of this stateless architecture, avoiding redundant recomputation of repeated context.
Key quotes
each time you press 'enter' at the prompt factory, the harness rebuilds the entire conversation and ships it (this is why prompt caching is so important!). There is no memory sitting on the server.