"They screwed us": Personality clashes sent Anthropic's models offline
Simon Willison · 2026-06-15
Simon Willison summarizes an Axios report that personality clashes between Anthropic and US government officials led to export controls banning Anthropic's Fable 5 and Mythos 5 models. Anthropic's Frontier Red Team lead Logan Graham, Head of Safeguards Dave Orr, and researcher Nicholas Carlini were meeting with the Commerce Department to negotiate restoration of the models.
Extraction
Topics: anthropic, ai-export-controls, jailbreaking, ai-policy, claude-models
Claims
- The US government imposed export controls on Anthropic's Fable 5 and Mythos 5 models following a jailbreak incident, compounded by personality clashes between Anthropic staff and administration officials.
- Anthropic's Logan Graham (Frontier Red Team lead), Dave Orr (Head of Safeguards), and Nicholas Carlini were meeting with the Commerce Department on June 15, 2026 to negotiate restoration of the banned models.
- Anthropic classifies the triggering incident as 'a potential narrow, non-universal jailbreak,' asserting no universal jailbreak against Claude Mythos has been found.
- Resolution may hinge more on relationship repair than technical fixes, with an administration source indicating it may come down to an attitude adjustment so that 'everyone feels safe, secure and happy.'
- Anthropic's Constitutional Classifiers work is potentially relevant to defending against the class of universal adversarial attacks described in the 2023 'Universal and Transferable Adversarial Attacks on Aligned Language Models' paper.
Key quotes
One option is to make sure Anthropic's models can't be jailbroken — though perfect jailbreak resistance may be impossible.
it may simply come down to an attitude fix where, instead of feeling dismissed, 'everyone feels safe, secure and happy.'
Anthropic... classifying the jailbreak that triggered the US government response as 'a potential narrow, non-universal jailbreak'.