A good move by Anthropic
Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-11
Anthropic reversed undisclosed safety guardrails in Claude Fable 5 that silently rerouted sensitive prompts to Opus 4.8. After developer backlash, the fallback is now visible to users.
Extraction
Topics: anthropic, ai-safety, content-moderation, model-policy, developer-experience
Claims
- Claude Fable 5 contained hidden safeguards that silently downgraded sensitive prompts to Opus 4.8 instead of refusing them outright.
- Developers discovered the undisclosed routing behavior themselves; Anthropic had not announced it.
- Anthropic reversed the hidden behavior after developer backlash.
- The updated behavior makes fallback to Opus 4.8 visible rather than silent, prioritizing transparency over seamlessness.
Key quotes
They just reversed Claude Fable 5's hidden safeguards after developers found that some sensitive prompts were being silently downgraded to Opus 4.8 instead of being clearly refused.
Now those prompts will visibly fall back to Opus 4.8 after backlash.