Claude Opus 4.8: "a modest but tangible improvement"
Simon Willison · Simon Willison · 2026-05-28
Simon Willison reviews Claude Opus 4.8, praising Anthropic's unusual transparency in describing the release as a modest incremental improvement, and highlights new capabilities including mid-conversation system messages, improved honesty benchmarks, and a lower prompt-cache minimum.
Appears in
Extraction
Topics: claudellm-releaseprompt-cachingmodel-honestyanthropic
Claims
- Anthropic describes Claude Opus 4.8 as 'a modest but tangible improvement' over its predecessor, an unusually candid framing for an AI model release.
- Opus 4.8 is around four times less likely than Opus 4.7 to allow flaws in code it has written to pass unremarked.
- Opus 4.8 supports mid-conversation system messages, enabling updated instructions to be appended without restating the full system prompt and preserving prompt cache hits.
- The minimum cacheable prompt length on Opus 4.8 is 1,024 tokens, down from 4,096 on Opus 4.7.
- Opus 4.8 is priced identically to Opus 4.5/4.6/4.7 at $5 per million input tokens and $25 per million output tokens.
Key quotes
It's so refreshing to see an AI lab honestly describe a release as a minor incremental improvement over the previous model!
Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims. This is borne out in our evaluations, which show that Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked.
Claude Opus 4.8 had the lowest incorrect-rate of the six models on every benchmark—the most direct measure of factual hallucination. It achieved this mainly by abstaining on questions about which it was uncertain rather than by answering more questions correctly.