The Information Machine

Some good move by Anthropic

Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-11

Anthropic reversed undisclosed safeguards in Claude Fable 5 that silently rerouted sensitive prompts to Opus 4.8 instead of clearly refusing them. After developer backlash, the fallback to Opus 4.8 now happens visibly.

Extraction

Topics: anthropic, ai-safety, content-moderation, model-policy, developer-experience

Claims

  • Claude Fable 5 contained hidden safeguards that silently downgraded sensitive prompts to Opus 4.8 instead of refusing them outright.
  • Developers discovered the undisclosed silent routing behavior through observation.
  • Anthropic reversed the hidden behavior after developer backlash.
  • The updated behavior makes fallback to Opus 4.8 visible rather than silent, prioritizing transparency over seamlessness.

Key quotes

They just reversed Claude Fable 5's hidden safeguards after developers found that some sensitive prompts were being silently downgraded to Opus 4.8 instead of being clearly refused.
Now those prompts will visibly fall back to Opus 4.8 after backlash.