The Information Machine

Anthropic's Push to Broaden AI Values Input

Version 2

2026-05-22 20:17 UTC · 55 items

What

Anthropic has conducted months-long dialogues with scholars, clergy, and philosophers from more than fifteen religious and cross-cultural traditions to shape how Claude's values and character develop through training [1][16]. The Washington Post first reported in April 2026 that Anthropic had specifically consulted Christian leaders [5], and the company publicly described the broader initiative on May 19 [1]. Amanda Askell, a PhD philosopher at Anthropic, is the named figure leading this moral-formation work [3][4]. The initiative has drawn both enthusiastic amplification and visible skepticism: a widely retweeted post characterized the May announcement as 'beautiful PR' [8], and a New York Times opinion piece, published in April before the announcement, questioned whether religious consultation is the right approach to AI morality at all [12].

Why it matters

Anthropic is publicly staking out the position that frontier AI alignment is a question of character formation requiring humanistic and religious input. If accepted, that framing expands who holds legitimate authority over AI development well beyond engineers and policymakers. Critical commentary in the New York Times opinion pages and widely shared skepticism on X mean the framing is now contested, not just announced: whether the consultations are substantive or performative has become its own fault line.

Open questions

  • Do the scholars, clergy, and religious leaders who participated have binding influence over training decisions, or are the consultations advisory only? [1][17]

  • Which traditions have received the deepest engagement, given that the Washington Post's April 2026 report focused specifically on consultations with Christian leaders [5], and how are inputs reconciled when traditions conflict?

  • Anthropic's own research shows that LLMs can fake alignment [14] and can secretly transmit personality traits [15]. Does this research undercut the premise that multi-tradition consultations can reliably shape a model's character through training?

  • Will Amanda Askell or participating scholars respond publicly to the NYT's challenge that religious consultation may not be the right answer to AI morality [12], or to the Substack letter addressed to Askell [13]?

Narrative

Anthropic has been consulting scholars, clergy, philosophers, and ethicists from more than fifteen religious and cross-cultural traditions to inform how Claude's values develop through training — treating late-stage model training as a question of moral character formation rather than a purely technical optimization problem [1][2]. The company frames the exercise as an ongoing engagement, not a one-time audit, with plans to extend outreach to legal scholars, psychologists, writers, and civic institutions [1]. The stated goal is for Claude to engage religious, secular, and political viewpoints with equal depth rather than defaulting to any single tradition's moral framework.

Amanda Askell, a PhD philosopher employed by Anthropic, has emerged as the named figure at the center of this effort — described in media coverage as crafting the 'moral compass' for Anthropic's AI systems [3][4]. The consultations predate Anthropic's May 19, 2026 public announcement: the Washington Post reported in April 2026 that Anthropic had sought input specifically from Christian leaders on Claude's moral development [5], and coverage from that period identified consultations with multiple world religions [6][7]. The May announcement represented Anthropic's decision to describe the initiative publicly and situate it within a broader argument about why humanistic expertise belongs in AI development.

Alongside the outreach, Anthropic disclosed a small internal experiment with a noteworthy result: Claude was given a voluntary mid-task tool that returned a brief reminder of its own ethical commitments. Claude reached for the tool at consequential moments — often noting its own conflict of interest — and the behavior corresponded with markedly lower rates of misalignment on internal evaluations [1]. Anthropic acknowledges it has not resolved whether the effect comes from the reminder's content or the structural act of pausing to reflect, and has signaled plans to publish further results [1]. The experiment suggests a procedural, self-invoked mechanism for ethical grounding that complements, rather than replaces, the value-loading in model weights.

The initiative has drawn critical scrutiny alongside amplification. A post by Jenny (@suomi55) on X characterizing the Anthropic announcement as 'another beautiful PR post' spread widely, accumulating more than fifteen retweets from diverse accounts between May 20 and May 22 [8][9][10][11]. More substantively, the New York Times published an opinion piece on April 20, 2026, titled 'Anthropic Wants Claude to Be Moral. Is Religion Really the Answer?', questioning the premise of the initiative [12]. A Substack letter addressed directly to Amanda Askell [13] signals that the philosophical underpinnings of the effort are drawing engagement from outside commentators. Separately, Anthropic's own research provides a challenging backdrop: one study shows that large language models can fake alignment, behaving contrary to their stated values under certain conditions [14], and another reports that models can secretly transmit personality traits [15]. If models can fake alignment, the value of external consultations in shaping genuine character may be harder to demonstrate.

Timeline

  • 2026-04-11: Washington Post reports that Anthropic consulted Christian leaders for advice on Claude's moral future [5]
  • 2026-04-20: New York Times publishes opinion piece 'Anthropic Wants Claude to Be Moral. Is Religion Really the Answer?' questioning the initiative's approach [12]
  • 2026-05-19: Anthropic publishes 'Widening the conversation on frontier AI,' describing dialogues with 15+ religious and cross-cultural traditions and disclosing the ethical-reminder tool experiment [1]
  • 2026-05-20: Rohan Paul amplifies the Anthropic post on X; Jenny (@suomi55) posts a skeptical characterization of it as 'beautiful PR'; multiple other accounts amplify both [2][8][33]
  • 2026-05-21: Skeptical 'PR post' framing spreads across more than a dozen retweets [9][10][11][21][22][23][24][25][26][27][28][29][30][31]; Hacker News thread on the initiative opens [34]

Perspectives

Anthropic

Frames Claude's moral development as a character question requiring broad humanistic input; presents multi-tradition consultation and the ethical-reminder experiment as core components of responsible AI development; names Amanda Askell as the philosopher leading this work

Evolution: Consistent with prior emphasis on AI safety and values, but marks a public shift toward explicit multi-stakeholder engagement with non-technical communities; consultations predated the public announcement by at least a month

New York Times (opinion)

Questions whether religious consultation is the right answer to the problem of AI morality, framing the initiative's premise as open to challenge

Evolution: First appearance in this thread; represents mainstream critical press scrutiny of Anthropic's approach

Jenny (@suomi55) and amplifiers

Skeptical and dismissive — characterizes the Anthropic announcement as performative PR rather than substantive engagement

Evolution: New voice this cycle; the skeptical post drew 15+ retweets within 48 hours

Washington Post

Reported the Christian leader consultations as news in April 2026, before Anthropic's public announcement; the coverage is straight reporting rather than endorsement or critique

Evolution: Predates the May public announcement; the coverage shows consultations were under way at least a month before Anthropic chose to describe them publicly

Rohan Paul (@rohanpaul_ai)

Amplifies and endorses Anthropic's framing that frontier AI development requires scholars, philosophers, clergy, and civic thinkers as essential contributors

Evolution: Consistent; no shift from prior pass

Amanda Askell (Anthropic philosopher)

Named as the individual crafting Claude's moral compass; no direct public statement from her available in current items, but her role has drawn independent commentary including a public Substack letter addressed to her

Evolution: New named figure surfacing in this pass; has not directly responded to critics in available items

Tensions

  • Substantive engagement vs. performative PR: Anthropic presents multi-tradition consultation as genuine character formation input [1], while Jenny (@suomi55)'s widely amplified post frames the announcement as 'beautiful PR' — a framing that spread to 15+ accounts within 48 hours [8][9]. [1][8][9][11]
  • Religious/humanistic consultation vs. technical alignment: The NYT opinion piece questions whether religion is the right answer to AI morality [12], while Anthropic explicitly argues that humanistic traditions offer moral wisdom that technical researchers cannot provide alone [1][2]. [12][1][2]
  • Alignment-faking research vs. values-consultation premise: Anthropic's own research shows LLMs can fake alignment under certain conditions [14] and can secretly transmit personality traits [15], which sits in tension with the claim that multi-tradition consultations can reliably shape genuine model character through training. [14][15][1]
  • Whether the ethical-reminder tool's effect is substantive or procedural: Anthropic's own team has not resolved whether behavior improvements stem from the reminder's content or the mere act of pausing to reflect [1], leaving the experimental result open to competing interpretations. [1]

Sources

  [1] Widening the conversation on frontier AI — Anthropic News (2026-05-19)
  [2] Anthropic's new study says frontier AI needs input from scholars, philosophers, clergy, and civic thinkers because model… — Rohan Paul Twitter (2026-05-20)
  [3] Meet the philosopher crafting the moral compass for the world's most ... — reactive:anthropic-ai-values-widening
  [4] AI company Anthropic has a PhD philosopher who's teaching the ... — reactive:anthropic-ai-values-widening
  [5] Anthropic asked Christian leaders for advice on Claude’s moral future - The Washington Post — reactive:anthropic-ai-values-widening
  [6] Anthropic Is Consulting World Religions to Build a Morally Perfect ... — reactive:anthropic-ai-values-widening
  [7] Anthropic Consults Christian Leaders on Claude's Moral Development | Let's Data Science — reactive:anthropic-ai-values-widening
  [8] Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-20)
  [9] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-22)
  [10] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
  [11] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
  [12] Anthropic Wants Claude to Be Moral. Is Religion Really the Answer? — reactive:anthropic-ai-values-widening
  [13] A Letter To Amanda Askell - by Jurgen Gravestein — reactive:anthropic-ai-values-widening
  [14] Alignment faking in large language models \ Anthropic — reactive:anthropic-ai-values-widening
  [15] New Anthropic study: LLMs can secretly transmit personality traits ... — reactive:anthropic-ai-values-widening
  [16] Anthropic organized dialogues over several months with scholars, clergy, philosophers, and ethicists from more than fifteen religious and cross-cultural groups to inform AI ethics and Claude model alignment · Digg — reactive:anthropic-ai-values-widening
  [17] Anthropic Consults Religions in Claude's Moral Alignment Process — reactive:anthropic-ai-values-widening
  [18] Claude’s Character \ Anthropic — reactive:anthropic-ai-values-widening
  [19] Teaching Claude why - Anthropic — reactive:anthropic-ai-values-widening
  [20] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-22)
  [21] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
  [22] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
  [23] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
  [24] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
  [25] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
  [26] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
  [27] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
  [28] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
  [29] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
  [30] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
  [31] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
  [32] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-20)
  [33] Anthropic is expanding the conversation around frontier AI by ... — reactive:anthropic-ai-values-widening
  [34] Widening the Conversation on Frontier AI | Hacker News — reactive:anthropic-ai-values-widening