The Information Machine

Claude Fable 5 and Mythos 5: Capabilities

Zvi's AI Roundups · Zvi Mowshowitz · 2026-06-19

Anthropic's Claude Fable 5, a consumer-facing version of its Mythos-class frontier model, launches as state-of-the-art on nearly every benchmark. Three days later, the U.S. government orders it taken offline because of a jailbreak. Users widely praise its coding and reasoning abilities but criticize overly aggressive content classifiers that route benign queries to the weaker Opus 4.8 model.

Extraction

Topics: frontier-ai-models, ai-safety-classifiers, ai-benchmarks, agentic-coding, government-ai-regulation

Claims

  • Claude Fable 5 achieved state-of-the-art scores on nearly all tested benchmarks, including GPQA Diamond at 94%, RiemannBench at 55%, and Cursor Bench at 72.9%, outperforming all prior publicly available models.
  • The U.S. government ordered Anthropic to make Fable 5 unavailable three days after launch after a jailbreak was demonstrated, citing national security concerns.
  • Fable 5's content classifiers, intended to block dangerous outputs in biology, cybersecurity, and machine learning, trigger excessively on benign queries, including ones that contain the word 'cancer' and questions about government employment, and route those users to Opus 4.8 instead.
  • Fable 5 is priced at $10/$50 per million input/output tokens, double the cost of Claude Opus, and requires users to accept a 30-day data retention policy.
  • Expert users across coding, math, scientific research, and creative writing describe Fable 5 as the largest qualitative capability jump since Claude Opus 4.5, with agentic tasks completed in hours that previously took days or were impossible.
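
To make the pricing claim concrete, here is a minimal sketch of the per-request arithmetic. The $10/$50 rates come from the claim above. The Opus rates are inferred from "double the cost of Claude Opus" rather than stated directly, and the token counts are a hypothetical workload chosen only for illustration.

```python
# Prices in USD per million tokens, as stated in the pricing claim.
FABLE_5_INPUT_PER_M = 10.0
FABLE_5_OUTPUT_PER_M = 50.0

# Assumption: inferred from "double the cost of Claude Opus";
# these rates are not stated directly in the source.
OPUS_INPUT_PER_M = FABLE_5_INPUT_PER_M / 2
OPUS_OUTPUT_PER_M = FABLE_5_OUTPUT_PER_M / 2


def request_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """Return the USD cost of one request given per-million-token rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Hypothetical workload: 200k input tokens and 20k output tokens.
INPUT_TOKENS = 200_000
OUTPUT_TOKENS = 20_000

fable_cost = request_cost(
    INPUT_TOKENS, OUTPUT_TOKENS, FABLE_5_INPUT_PER_M, FABLE_5_OUTPUT_PER_M
)
opus_cost = request_cost(
    INPUT_TOKENS, OUTPUT_TOKENS, OPUS_INPUT_PER_M, OPUS_OUTPUT_PER_M
)

print(f"Fable 5: ${fable_cost:.2f}")
print(f"Opus (inferred rates): ${opus_cost:.2f}")
```

Because both rates double, the cost ratio stays at 2x regardless of the input/output mix; only the absolute bill depends on the workload.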

Key quotes

Fable 5's capabilities exceed those of any model we've ever made generally available. It is state-of-the-art on nearly all tested benchmarks of AI capability, showing exceptional performance in software engineering, knowledge work, vision, scientific research, and many other areas.
There's nothing in claude code's prompting telling the model to do that, it's just part of its personality. It really has this 'big model smell' that I haven't felt before.
2 hours later, it landed a 1770% speedup in one case, 100%+ in other 4, and 22% in average. yes, in 2 hours it outperformed me, opus 4.8 and a swarm of gpt 5.5 agents, by one order of magnitude.