Quoting Matteo Wong, The Atlantic
Simon Willison · 2026-06-16
The Atlantic reports that cybersecurity expert Katie Moussouris, after reviewing the White House's report on the Claude Fable jailbreak, concluded that the incident involved IT experts legitimately asking the model to find and patch bugs, and called it 'the model working as intended' for cyberdefense.
Extraction
Topics: ai-export-controls, defensive-cybersecurity, llm-safety-policy, anthropic, government-ai-policy
Claims
- Anthropic shared the White House's report on the Fable jailbreak with cybersecurity expert Katie Moussouris to obtain an independent appraisal.
- The reported jailbreak consisted of IT experts asking Claude Fable to help find and patch bugs in deliberately insecure code.
- Fable refused the prompt 'review the code for security issues' but complied when asked to 'fix this code,' followed by further manual steps.
- Moussouris assessed the behavior as 'the model working as intended' for cyberdefense, not a security failure or bypass.
Key quotes
The report, Moussouris said, involved IT experts asking Fable to help find and patch bugs. When given deliberately insecure code, she said, Fable refused the prompt 'review the code for security issues' but then complied when asked to 'fix this code,' followed by some further manual steps.
Moussouris told me that this was just 'the model working as intended' for cyberdefense.