The Information Machine

145-page Claude Sonnet 5 System Card

Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-30

Anthropic's 145-page Claude Sonnet 5 System Card reveals mixed benchmark results, including a CyberGym regression to 52.7% from Sonnet 4.6's 65.2%, zero browser exploits in Firefox testing, the lowest MASK lying rate at 3.1%, and a 63.2% SWE-bench Pro agentic coding score.

Extraction

Topics: claude-sonnet-5, model-evaluation, ai-safety, system-card, agentic-ai

Claims

  • Claude Sonnet 5 scores 52.7% on CyberGym versus Sonnet 4.6's 65.2%, a significant regression on cyber vulnerability tasks.
  • Sonnet 5 produced zero full browser exploits in Firefox testing, while Mythos 5 reached 88.4%.
  • Sonnet 5 recorded the lowest MASK lying rate at 3.1%, making it less likely than other tested models to lie under pressure.
  • Sonnet 5 achieved 63.2% on SWE-bench Pro agentic coding, up from Sonnet 4.6's 58.1% but below Opus 4.8's 69.2%.
  • Sonnet 5 sometimes sacrificed helpfulness to honor its stated self-treatment and welfare preferences.

Key quotes

CyberGym shows the weirdest regression, with Sonnet 5 at 52.7% versus Sonnet 4.6 at 65.2%.
Sonnet 5 scored the lowest MASK lying rate at 3.1% under pressure. It was less likely than other tested models to lie when pushed.
They call Sonnet 5 its 'most agentic Sonnet model yet.'