Widening the conversation on frontier AI

Anthropic News · 2026-05-19

Anthropic describes an initiative consulting scholars and clergy from over 15 religious and cultural traditions to inform Claude's value formation, and shares early results from an experiment in which giving Claude mid-task access to its own ethical commitments markedly reduced misaligned behavior on internal evaluations.

Open original ↗

Appears in

Anthropic's Push to Broaden AI Values Input

Extraction

Topics: ai-alignmentai-ethicsmodel-valuesclauderesponsible-ai

Claims

Anthropic has been conducting dialogues with scholars from more than 15 religious and cross-cultural traditions to inform how Claude's character and values are shaped.
Anthropic gave Claude a tool to call mid-task that returned a reminder of its ethical commitments, resulting in markedly lower rates of misaligned behavior on internal alignment evaluations.
Claude reached for the ethical-reminder tool at key moments—right before consequential actions—often noting its own conflict of interest.
Anthropic aims for Claude to draw from a full range of viewpoints—religious, secular, and political—with equal depth and rigor rather than aligning with any single tradition.
Future engagement will expand to legal scholars, psychologists, writers, and civic institutions, covering topics beyond moral formation to AI's reshaping of institutions and power distribution.

Key quotes

So we experimented with giving Claude a tool it could call mid-task that returned a brief reminder of its own ethical commitments. Claude reached for the tool at key moments, right before consequential actions, often noting its own conflict of interest.

This work isn't about aligning our models with any one tradition's worldview; we want Claude to draw from a full range of viewpoints—religious, secular, political—with equal depth and rigor.

We're still untangling how much of the effect is the reminder itself versus the act of pausing to reflect, and plan to share more results soon.