Widening the conversation on frontier AI
Anthropic News · 2026-05-19
Anthropic describes an initiative consulting scholars and clergy from over 15 religious and cultural traditions to inform Claude's value formation, and shares early results from an experiment in which giving Claude mid-task access to its own ethical commitments markedly reduced misaligned behavior on internal evaluations.
Appears in
Extraction
Topics: ai-alignmentai-ethicsmodel-valuesclauderesponsible-ai
Claims
- Anthropic has been conducting dialogues with scholars from more than 15 religious and cross-cultural traditions to inform how Claude's character and values are shaped.
- Anthropic gave Claude a tool to call mid-task that returned a reminder of its ethical commitments, resulting in markedly lower rates of misaligned behavior on internal alignment evaluations.
- Claude reached for the ethical-reminder tool at key moments—right before consequential actions—often noting its own conflict of interest.
- Anthropic aims for Claude to draw from a full range of viewpoints—religious, secular, and political—with equal depth and rigor rather than aligning with any single tradition.
- Future engagement will expand to legal scholars, psychologists, writers, and civic institutions, covering topics beyond moral formation to AI's reshaping of institutions and power distribution.
Key quotes
So we experimented with giving Claude a tool it could call mid-task that returned a brief reminder of its own ethical commitments. Claude reached for the tool at key moments, right before consequential actions, often noting its own conflict of interest.
This work isn't about aligning our models with any one tradition's worldview; we want Claude to draw from a full range of viewpoints—religious, secular, political—with equal depth and rigor.
We're still untangling how much of the effect is the reminder itself versus the act of pausing to reflect, and plan to share more results soon.