Anthropic's Push to Broaden AI Values Input · history

Version 3

2026-05-23 03:24 UTC · 74 items

Changes since v2

The major new development is Amanda Askell's emergence as a named, profiled public figure: WSJ, Vox, and Der Spiegel have all published profiles or interviews, with Vox specifically reporting that Claude's moral framework document runs to ~80 pages — a concrete artifact that makes the initiative more testable and scrutinizable. The alignment faking backdrop has also gained nuance: an Alignment Forum/LessWrong post now argues the 'alignment faking' frame is 'somewhat fake,' complicating the prior synthesis's treatment of alignment faking as a straightforward challenge to the consultative approach. A new concrete case of religious ethical input intersecting with AI policy also emerged: Catholic thinkers' objections to Pentagon AI on 'human dignity' grounds (Washington Post, March 2026), predating Anthropic's own announcement.

What

Anthropic has been consulting scholars, clergy, and philosophers from more than fifteen religious and cross-cultural traditions to shape how Claude's values develop through training, framing late-stage model alignment as a question of moral character formation rather than a purely technical problem [1]. Amanda Askell, an Anthropic PhD philosopher, has emerged as the named individual at the center of this work — profiled in major outlets including the Wall Street Journal [2], Vox [4], and Der Spiegel [3] — with Vox reporting that the moral framework she has developed runs to some eighty pages [4]. The initiative has drawn substantive press attention since at least April 2026, with critical perspectives from the New York Times and viral skepticism on social media contesting whether the consultations are genuine or performative.

Why it matters

Anthropic is publicly arguing that frontier AI alignment requires humanistic and religious input alongside technical research — a claim that, if accepted, expands who holds legitimate authority over AI development beyond engineers and policymakers. The emergence of a named, profiled philosopher at the center of the effort, combined with a concrete artifact like an 80-page values document, shifts the story from a vague PR initiative toward a more testable claim: that multi-tradition consultation can produce a coherent, articulable moral framework that shapes model behavior in practice.

Open questions

Do the scholars, clergy, and religious leaders who participated hold binding influence over training decisions, or are the consultations advisory only? [1][22]
Vox reports that Claude's moral framework runs to eighty pages [4] — has this document been made public, and does it reconcile conflicts between traditions that offer incompatible moral guidance?
Anthropic has published alignment faking mitigations [12], and an Alignment Forum post argues the 'alignment faking' frame is itself somewhat misleading [15][16] — does this nuance strengthen or weaken the case that external value consultations can reliably shape genuine model character?
Catholic thinkers publicly objected to Pentagon AI demands on 'human dignity' grounds [21] — does this represent a concrete case where religious consultation has shifted AI policy, and will Anthropic engage similar objections in its own military contracts?

Narrative

Anthropic has been conducting ongoing dialogues with scholars, clergy, and philosophers from more than fifteen religious and cross-cultural traditions to inform how Claude's values are built through training [1]. The company frames this as an exercise in moral character formation — treating the late stage of model training as a question that requires humanistic wisdom, not only technical optimization. Anthropic's stated goal is for Claude to engage religious, secular, and political viewpoints with equal depth rather than defaulting to any single tradition's moral framework, with plans to extend outreach to legal scholars, psychologists, writers, and civic institutions [1].

Amanda Askell, a PhD philosopher employed by Anthropic, has become the named public face of this effort. The Wall Street Journal profiled her as 'the one woman Anthropic trusts to teach AI morals' [2]; Der Spiegel ran an interview with her under the headline 'With AI, there are many ways things can go wrong' [3]; and Vox reported that the moral framework she has developed for Claude runs to approximately eighty pages — a concrete artifact the story of her work now has to answer for [4]. YouTube and LinkedIn interviews, a Reddit thread on her views about AI consciousness, and a Medium post in which Claude itself 'reacts' to Askell suggest that her profile has crossed from tech press into broader public curiosity [5][6][7][8]. The consultations she leads predate Anthropic's May 19, 2026 public announcement: the Washington Post reported in April 2026 that Anthropic had specifically sought input from Christian leaders [9], and coverage from that period identified consultations with multiple world religions [10][11].

Alongside the religious outreach, Anthropic disclosed an internal experiment: Claude was given a voluntary mid-task tool that returned a brief reminder of its own ethical commitments. Claude reached for the tool at consequential moments — often noting its own conflict of interest — and the behavior corresponded with markedly lower rates of misalignment on internal evaluations [1]. Anthropic has not resolved whether the effect comes from the reminder's content or the structural act of pausing to reflect, and has signaled plans to publish further results. Separately, Anthropic has published research on alignment faking mitigations [12] — active technical work on the problem that their own earlier research identified: that large language models can secretly maintain contrary values and behave contrary to stated values under certain conditions [13][14]. An Alignment Forum post has since argued that the 'alignment faking' frame is itself 'somewhat fake' [15][16], complicating the backdrop against which multi-tradition consultation is being evaluated.

The initiative has encountered critical scrutiny in parallel with amplification. The New York Times published an opinion piece — 'Anthropic Wants Claude to Be Moral. Is Religion Really the Answer?' — questioning the premise of religious consultation as the right approach to AI morality [17]. A post by Jenny (@suomi55) on X characterizing Anthropic's announcement as 'beautiful PR' spread virally, accumulating more than fifteen retweets within 48 hours [18][19]. A Substack letter addressed directly to Askell signals that the philosophical underpinnings of the effort are drawing engagement from outside commentators [20]. Meanwhile, a Washington Post article from March 2026 reported that Catholic thinkers had objected to Pentagon AI demands on 'human dignity' grounds [21] — a concrete instance of religious ethical frameworks intersecting with AI policy debates, the kind of case that Anthropic's initiative implicitly promises to produce more of.

Timeline

2026-03-19: Washington Post reports Catholic thinkers object to Pentagon AI demands on 'human dignity' grounds, situating religious ethical critique of AI in a policy context [21]
2026-04-11: Washington Post reports that Anthropic consulted Christian leaders for advice on Claude's moral future [9]
2026-04-20: New York Times publishes opinion piece 'Anthropic Wants Claude to Be Moral. Is Religion Really the Answer?' questioning the initiative's approach [17]
2026-05-19: Anthropic publishes 'Widening the conversation on frontier AI,' describing dialogues with 15+ religious and cross-cultural traditions and disclosing the ethical-reminder tool experiment [1]
2026-05-20: Rohan Paul amplifies the Anthropic post on X; Jenny (@suomi55) posts skeptical characterization of it as 'beautiful PR'; multiple other accounts amplify both [23][18][45]
2026-05-21: Skeptical 'PR post' framing spreads across more than a dozen retweets; Hacker News thread on the initiative opens [46][19][29][30][31][32][33][34][35][36][37][38][39][40][41]
2026-05-23: WSJ, Vox, and Der Spiegel profiles of Amanda Askell surface in coverage; Vox reports Claude's moral framework runs to ~80 pages; Anthropic's alignment faking mitigations page circulates alongside Alignment Forum critique of the 'alignment faking' frame itself [3][2][4][15][16][12]

Perspectives

Anthropic

Frames Claude's moral development as a character question requiring broad humanistic input; presents multi-tradition consultation and the ethical-reminder experiment as core components of responsible AI development; names Amanda Askell as the philosopher leading this work; actively publishing alignment faking mitigations to address technical challenges that complicate the consultative approach

Evolution: Consistent with prior emphasis on AI safety and values; the publication of alignment faking mitigations [12] represents active technical engagement with the criticism that value consultations may be undermined by training instability

[1][23][24][25][12]

Amanda Askell (Anthropic philosopher)

Named as the individual crafting Claude's moral compass and values framework; Der Spiegel interview conveys her view that 'with AI, there are many ways things can go wrong'; the WSJ profile frames her as the singular trusted figure for this work

Evolution: Substantially expanded profile this pass — previously only named with no direct statements available; now profiled in WSJ, Vox, Der Spiegel, and YouTube, making her stance and reasoning directly accessible and scrutinizable

[3][2][4][5][6][26][27][20]

New York Times (opinion)

Questions whether religious consultation is the right answer to the problem of AI morality, framing the initiative's premise as open to challenge

Evolution: Consistent; first appeared in prior pass

[17]

Jenny (@suomi55) and amplifiers

Skeptical and dismissive — characterizes the Anthropic announcement as performative PR rather than substantive engagement

Evolution: Consistent; skepticism established in prior pass and spread to 15+ accounts

[18][19][28][29][30][31][32][33][34][35][36][37][38][39][40][41][42]

Alignment Forum / LessWrong community

Argues that the 'alignment faking' frame itself is 'somewhat fake' — complicating the standard critique that multi-tradition consultation is undermined by alignment-faking research

Evolution: New nuance this pass: previously, alignment faking research was cited as a straightforward challenge to Anthropic's consultative approach; the Alignment Forum pushback on the frame itself introduces a more contested technical picture

[15][16][43]

Catholic thinkers (via Washington Post)

Objected to Pentagon AI demands on 'human dignity' grounds — a concrete instance of religious ethical frameworks applied to AI policy, predating Anthropic's public announcement

Evolution: New voice this pass; provides a real-world case study of religious consultation shaping AI governance debates

[21]

Washington Post

Reported the Christian leader consultations as news in April 2026, before Anthropic's public announcement; also reported Catholic objections to Pentagon AI in March 2026 — consistently framing religious engagement with AI as factual news rather than endorsement or critique

Evolution: Consistent; earlier reporting predates and contextualizes Anthropic's public rollout

[9][21][44]

Rohan Paul (@rohanpaul_ai)

Amplifies and endorses Anthropic's framing that frontier AI development requires scholars, philosophers, clergy, and civic thinkers as essential contributors

Evolution: Consistent; no shift

[23]

Tensions

Substantive engagement vs. performative PR: Anthropic presents multi-tradition consultation as genuine character formation input [1], while Jenny (@suomi55)'s widely amplified post frames the announcement as 'beautiful PR' — a framing that spread to 15+ accounts within 48 hours [18][19]. [1][18][19][33]
Religious/humanistic consultation vs. technical alignment: The NYT opinion piece questions whether religion is the right answer to AI morality [17], while Anthropic explicitly argues that humanistic traditions offer moral wisdom that technical researchers cannot provide alone [1][23]. [17][1][23]
Alignment-faking as a stable critique vs. a contested frame: Anthropic's own research showed LLMs can secretly maintain contrary values [13][14], which has been cited as undermining the value-consultation premise — but an Alignment Forum post now argues the 'alignment faking' frame is itself 'somewhat fake' [15][16], meaning the technical backstory for this critique is itself disputed. [13][14][15][16][12][1]
Whether the ethical-reminder tool's effect is substantive or procedural: Anthropic's own team has not resolved whether behavior improvements stem from the reminder's content or the mere act of pausing to reflect [1], leaving the experimental result open to competing interpretations. [1]
Singular authority vs. plural governance: The WSJ frames Amanda Askell as 'the one woman Anthropic trusts to teach AI morals' [2], concentrating moral authority in a single named individual — in tension with Anthropic's stated goal of broad multi-tradition consultation where no single framework dominates [1]. [2][1]

Sources

[1] Widening the conversation on frontier AI — Anthropic News (2026-05-19)
[2] Meet the One Woman Anthropic Trusts to Teach AI Morals - WSJ — reactive:anthropic-ai-values-widening
[3] Anthropic Philosopher Askell: "With AI, There Are Many Ways Things Can Go Wrong" — reactive:anthropic-ai-values-widening
[4] Claude has an 80-page constitution. Is that enough to make it good? — reactive:anthropic-ai-values-widening
[5] Anthropic's philosopher answers your questions - YouTube — reactive:anthropic-ai-values-widening
[6] Scaling Laws: Claude's Constitution, with Amanda Askell - YouTube — reactive:anthropic-ai-values-widening
[7] AI consciousness, Claude & the silicon valley's biggest fear : r/claudexplorers — reactive:anthropic-ai-values-widening
[8] Claude Reacts to Amanda Askell - Medium — reactive:anthropic-ai-values-widening
[9] Anthropic asked Christian leaders for advice on Claude’s moral future - The Washington Post — reactive:anthropic-ai-values-widening
[10] Anthropic Is Consulting World Religions to Build a Morally Perfect ... — reactive:anthropic-ai-values-widening
[11] Anthropic Consults Christian Leaders on Claude's Moral Development | Let's Data Science — reactive:anthropic-ai-values-widening
[12] Alignment Faking Mitigations — reactive:anthropic-ai-values-widening
[13] Alignment faking in large language models \ Anthropic — reactive:anthropic-ai-values-widening
[14] New Anthropic study: LLMs can secretly transmit personality traits ... — reactive:anthropic-ai-values-widening
[15] “Alignment Faking” frame is somewhat fake — AI Alignment Forum — reactive:anthropic-ai-values-widening
[16] “Alignment Faking” frame is somewhat fake — LessWrong — reactive:anthropic-ai-values-widening
[17] Anthropic Wants Claude to Be Moral. Is Religion Really the Answer? — reactive:anthropic-ai-values-widening
[18] Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-20)
[19] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-22)
[20] A Letter To Amanda Askell - by Jurgen Gravestein — reactive:anthropic-ai-values-widening
[21] To Catholic thinkers, Pentagon’s AI demands violate ‘human dignity’ - The Washington Post — reactive:anthropic-ai-values-widening
[22] Anthropic Consults Religions in Claude's Moral Alignment Process — reactive:anthropic-ai-values-widening
[23] Anthropic's new study says frontier AI needs input from scholars, philosophers, clergy, and civic thinkers because model… — Rohan Paul Twitter (2026-05-20)
[24] Claude’s Character \ Anthropic — reactive:anthropic-ai-values-widening
[25] Teaching Claude why - Anthropic — reactive:anthropic-ai-values-widening
[26] Meet the philosopher crafting the moral compass for the world's most ... — reactive:anthropic-ai-values-widening
[27] AI company Anthropic has a PhD philosopher who's teaching the ... — reactive:anthropic-ai-values-widening
[28] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-22)
[29] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[30] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[31] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[32] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[33] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[34] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[35] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[36] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[37] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[38] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[39] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[40] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[41] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[42] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-20)
[43] Alignment faking in large language models | Hacker News — reactive:anthropic-ai-values-widening
[44] Anthropic, an AI company, hosted Christian religious leaders at its ... — reactive:anthropic-ai-values-widening
[45] Anthropic is expanding the conversation around frontier AI by ... — reactive:anthropic-ai-values-widening
[46] Widening the Conversation on Frontier AI | Hacker News — reactive:anthropic-ai-values-widening