Anthropic's Push to Broaden AI Values Input

Version 4

2026-05-24 02:36 UTC · 91 items

Changes since v3

Two major developments define this pass. First, Claude's 80-page constitution has been made public [^13173], resolving the prior open question about the document's accessibility and prompting immediate external responses from Oxford AI Ethics [^13170] and the Oversight Board [^13172], the latter calling for additional rights and oversight mechanisms. Second, and more consequential, the previously abstract Catholic-Anthropic-Pentagon thread has become a concrete institutional conflict: Anthropic appears to have refused Pentagon demands for AI warfare use amid U.S.-Iran hostilities, and Catholic moral theologians have filed an amicus brief specifically backing Anthropic's position [^13174][^13177][^13179]. This is the first named legal action by religious actors in support of an AI company's values-based refusal, and it marks a qualitative escalation from consultation to confrontation.

What

Anthropic has published its 80-page values document as 'Claude's new constitution,' making a previously described artifact publicly available for external analysis [1]. At the same time, a concrete governance conflict has emerged: Anthropic appears to have refused Pentagon demands to deploy AI in a warfare context amid U.S.-Iran military hostilities, and Catholic moral theologians and ethicists have filed an amicus brief backing Anthropic's position in that dispute [7][9][10]. External bodies have responded: Oxford AI Ethics has formally analyzed the constitution [3], and the Oversight Board is calling for it to include additional rights and oversight mechanisms [4].

Why it matters

What began as a consultation initiative is now producing real institutional confrontations: Anthropic's religious outreach has culminated in Catholic ethicists filing legal support for the company in a dispute with the Pentagon over AI warfare use. The public release of the constitution also moves the story from aspiration to accountability — the document can now be independently evaluated against Anthropic's claims about multi-tradition, pluralistic values.

Open questions

What specifically did the Pentagon demand, and on what grounds did Anthropic refuse? The Melbourne Catholic report frames this as a fight 'amid Iran war' [10], but the precise legal and contractual stakes of the dispute remain unclear.
Does Claude's published constitution actually reflect the multi-tradition consultation process Anthropic described, or does Oxford AI Ethics's analysis of 'two evaluative continua' [3] reveal a more limited framework than claimed?
The Oversight Board argues the constitution needs a 'Bill of Rights and Oversight' [4] — does Anthropic accept this framing, and what governance body would be empowered to enforce such constraints?
Catholic ethicists backed Anthropic in the Pentagon dispute [7][9], but do participants from non-Christian traditions consulted in Anthropic's outreach hold similar views on AI warfare — and were their perspectives incorporated into the constitution?

Narrative

Anthropic has published its values document — described in prior coverage as approximately 80 pages — as 'Claude's new constitution,' releasing it publicly via the company's website [1]. This resolves what had been an open question about the document's accessibility: analysts can now evaluate whether it delivers on Anthropic's stated goal of incorporating perspectives from more than fifteen religious and cross-cultural traditions into Claude's moral framework [2]. External responses arrived quickly. Oxford AI Ethics published a formal analysis framing the constitution around 'two evaluative continua' [3]. The Oversight Board published a response arguing that the constitution requires a 'Bill of Rights and Oversight,' calling for additional rights protections and accountability mechanisms that the current document does not provide [4]. Nate's Newsletter on Substack offered a public breakdown with practical prompts derived from the document [5], and an Instagram reel profiling 'the document that shapes Claude's values and behavior' has begun circulating on social media [6].

In parallel, a significant institutional conflict has materialized. Multiple Catholic media outlets — including EWTN News, OSV News, the National Catholic Reporter, and Melbourne Catholic — are reporting that Anthropic has refused Pentagon demands related to AI use in warfare, and that Catholic moral theologians and ethicists have filed an amicus brief backing Anthropic's position in the resulting dispute [7][8][9]. The Melbourne Catholic coverage explicitly situates the conflict 'amid Iran war,' indicating the dispute is connected to active U.S. military operations [10]. Catholic University published an interview with a theologian framing the case as 'Autonomous Weapons vs. Moral Agents' [11]. A Reddit thread in r/technology noted that 'The Catholic Church condemns the use of AI in war' [12], and Magisterium.com reported on the broader implications of U.S. AI policy for religious institutions engaged on these questions [13].

This Pentagon conflict represents a concrete test of what Anthropic's values consultation process was designed to produce. The National Catholic Reporter framed Anthropic as 'holding the moral line on AI' by refusing the Pentagon [8] — an account that, if accurate, would constitute evidence that values developed through religious and humanistic consultation are shaping real institutional decisions, not merely marketing materials. At the same time, the Oversight Board's call for additional rights and oversight [4] signals that external institutions are not satisfied with Anthropic's self-issued constitution as sufficient governance, and that the question of who holds binding authority over Claude's values — raised but not resolved by Anthropic's consultation process — remains actively contested.

Amanda Askell, profiled by the Wall Street Journal as 'the one woman Anthropic trusts to teach AI morals' [14] and by Vox as the author of the 80-page framework [15], remains the named public architect of the document now under external scrutiny. The Alignment Forum critique that the 'alignment faking' frame is 'somewhat fake' [16][17] continues to complicate the technical backdrop — Anthropic has published alignment-faking mitigations [18] while the field debates whether the underlying phenomenon is as described. A Reddit thread arguing 'AI alignment is already failing' [19] reflects ongoing skepticism that any consultation or document-based approach can reliably shape model behavior in practice.

Timeline

2026-03-19: Washington Post reports Catholic thinkers object to Pentagon AI demands on 'human dignity' grounds, situating religious ethical critique of AI in a policy context [40]
2026-04-11: Washington Post reports that Anthropic consulted Christian leaders for advice on Claude's moral future [39]
2026-04-20: New York Times publishes opinion piece 'Anthropic Wants Claude to Be Moral. Is Religion Really the Answer?' questioning the initiative's approach [21]
2026-05-19: Anthropic publishes 'Widening the conversation on frontier AI,' describing dialogues with 15+ religious and cross-cultural traditions and disclosing the ethical-reminder tool experiment [2]
2026-05-20: Rohan Paul amplifies the Anthropic post on X; Jenny (@suomi55) posts a skeptical characterization of it as 'beautiful PR'; multiple other accounts amplify both [42][22][45]
2026-05-21: Skeptical 'PR post' framing spreads across more than a dozen retweets; Hacker News thread on the initiative opens [46][23][25][26][27][28][29][30][31][32][33][34][35][36][37]
2026-05-23: WSJ, Vox, and Der Spiegel profiles of Amanda Askell surface; Vox reports Claude's moral framework runs to ~80 pages; Anthropic's alignment-faking mitigations page circulates alongside the Alignment Forum critique of the 'alignment faking' frame [20][14][15][16][17][18]
2026-05-24: Anthropic publishes Claude's new constitution publicly; Oxford AI Ethics and the Oversight Board respond with external analyses; Catholic moral theologians file an amicus brief backing Anthropic in the Pentagon dispute over AI warfare use [1][3][4][7][8][9][10][11]

Perspectives

Anthropic

Has published the 80-page values document as 'Claude's new constitution,' making it publicly available; appears to have refused Pentagon demands for AI use in warfare, positioning the company as holding a moral line consistent with the values it has been publicly articulating

Evolution: Substantially escalated: the values initiative has moved from consultation and publication to an active institutional conflict with the Pentagon, testing whether stated values translate into consequential decisions

[1][8][10][2][18]

Amanda Askell (Anthropic philosopher)

Named architect of the now-public 80-page constitution; profiled by WSJ, Vox, and Der Spiegel as the individual most responsible for Claude's moral framework

Evolution: Consistent with prior pass; the publication of the constitution makes her work directly scrutinizable by external analysts

[20][14][15][1]

Catholic moral theologians and ethicists

Filed an amicus brief backing Anthropic in its dispute with the Pentagon over AI warfare use; multiple outlets frame this as active support for Anthropic's refusal to comply with Pentagon demands; Catholic University framed the case as 'Autonomous Weapons vs. Moral Agents'

Evolution: Major escalation: previously Catholic thinkers had objected to Pentagon AI demands in general terms; now they are filing legal support specifically for Anthropic in a named dispute, representing the most concrete intersection of religious consultation and AI policy decisions yet in this story

[7][8][9][12][10][11]

Oversight Board

Argues that Claude's constitution needs a 'Bill of Rights and Oversight' — calling for external rights protections and accountability mechanisms beyond what Anthropic has self-imposed

Evolution: New voice this pass; represents an external governance institution demanding structural constraints on Anthropic's values framework, not merely praise or critique

[4]

Oxford AI Ethics (University of Oxford)

Published a formal academic analysis of Claude's new constitution, framing it through 'two evaluative continua' as an analytical lens

Evolution: New voice this pass; the publication of the constitution has prompted credentialed academic evaluation, shifting the discourse from commentary on consultation process to substantive assessment of the document itself

[3]

New York Times (opinion)

Questions whether religious consultation is the right answer to the problem of AI morality, framing the initiative's premise as open to challenge

Evolution: Consistent; first appeared in prior pass

[21]

Jenny (@suomi55) and amplifiers

Skeptical and dismissive — characterizes the Anthropic announcement as performative PR rather than substantive engagement

Evolution: Consistent; skepticism established in prior pass and spread to 15+ accounts

[22][23][24][25][26][27][28][29][30][31][32][33][34][35][36][37][38]

Alignment Forum / LessWrong community

Argues that the 'alignment faking' frame itself is 'somewhat fake,' complicating the standard critique that alignment-faking research undermines multi-tradition consultation; a separate Reddit thread argues AI alignment is 'already failing'

Evolution: The Reddit thread claiming alignment is 'already failing' adds a more categorical skepticism than the Alignment Forum's nuanced critique of the framing

[16][17][19]

Washington Post

Reported Christian leader consultations as news in April 2026 before Anthropic's public announcement; also reported Catholic objections to Pentagon AI in March 2026 — consistently framing religious engagement with AI as factual news

Evolution: Consistent

[39][40][41]

Rohan Paul (@rohanpaul_ai)

Amplifies and endorses Anthropic's framing that frontier AI development requires scholars, philosophers, clergy, and civic thinkers as essential contributors

Evolution: Consistent; no shift

[42]

Tensions

Substantive engagement vs. performative PR: Anthropic presents multi-tradition consultation as genuine input into character formation [2], while the widely amplified post by Jenny (@suomi55) frames the announcement as 'beautiful PR' [22][23]. The Catholic ethicists' amicus brief [7][9] and Anthropic's apparent refusal of Pentagon demands [8][10] provide new evidence on the substantive side, but skeptics may argue these are still compatible with a carefully managed public image. [2][22][23][7][8][9][10]
Self-governance vs. external oversight: Anthropic has published Claude's constitution as its own authoritative values document [1], while the Oversight Board argues the document needs a 'Bill of Rights and Oversight' with external accountability mechanisms [4] — a direct challenge to whether Anthropic can govern its own values framework. [1][4]
Religious/humanistic consultation vs. technical alignment: The NYT opinion piece questions whether religion is the right answer to AI morality [21], while Anthropic explicitly argues humanistic traditions offer moral wisdom technical researchers cannot provide alone [2][42]. The Catholic amicus brief backing Anthropic against the Pentagon [7] represents the strongest evidence yet on the side of religious consultation having real policy traction. [21][2][42][7]
Singular authority vs. plural governance: The WSJ frames Amanda Askell as 'the one woman Anthropic trusts to teach AI morals' [14], concentrating moral authority in a single named individual — in tension with Anthropic's stated goal of broad multi-tradition consultation where no single framework dominates [2]. [14][2]
Alignment-faking as a stable critique vs. a contested frame: Anthropic's own research showed LLMs can secretly maintain contrary values [43][44], a finding cited as undermining the value-consultation premise. However, an Alignment Forum post argues the 'alignment faking' frame is itself 'somewhat fake' [16][17], and Anthropic has published mitigations [18], leaving the technical basis for this critique in dispute. [43][44][16][17][18][2]

Sources

[1] Claude's new constitution - Anthropic — reactive:anthropic-ai-values-widening
[2] Widening the conversation on frontier AI — Anthropic News (2026-05-19)
[3] Claude's new Constitution: two evaluative continua | Ethics in AI — reactive:anthropic-ai-values-widening
[4] Claude's Constitution Needs a Bill of Rights and Oversight — reactive:anthropic-ai-values-widening
[5] My breakdown of Claude's 80-Page Constitution + 3 prompts to use ... — reactive:anthropic-ai-values-widening
[6] The document that shapes Claude's values and behavior. Part one ... — reactive:anthropic-ai-values-widening
[7] Catholic ethicists file amicus brief backing Anthropic in Pentagon dispute — reactive:anthropic-ai-values-widening
[8] By refusing the Pentagon, Anthropic holds moral line on AI | National Catholic Reporter — reactive:anthropic-ai-values-widening
[9] Catholic moral theologians, ethicists back Anthropic in ... — reactive:anthropic-ai-values-widening
[10] Anthropic fight with US Pentagon amid Iran war… — reactive:anthropic-ai-values-widening
[11] Autonomous Weapons vs. Moral Agents: A Theologian Discusses the Anthropic Case | Catholic University — reactive:anthropic-ai-values-widening
[12] The Catholic Church condemns the use of AI in war — reactive:anthropic-ai-values-widening
[13] New US policy on AI threatens industry disruption ... — reactive:anthropic-ai-values-widening
[14] Meet the One Woman Anthropic Trusts to Teach AI Morals - WSJ — reactive:anthropic-ai-values-widening
[15] Claude has an 80-page constitution. Is that enough to make it good? — reactive:anthropic-ai-values-widening
[16] “Alignment Faking” frame is somewhat fake — AI Alignment Forum — reactive:anthropic-ai-values-widening
[17] “Alignment Faking” frame is somewhat fake — LessWrong — reactive:anthropic-ai-values-widening
[18] Alignment Faking Mitigations — reactive:anthropic-ai-values-widening
[19] WHY AI ALIGNMENT IS ALREADY FAILING : r/ControlProblem — reactive:anthropic-ai-values-widening
[20] Anthropic Philosopher Askell: "With AI, There Are Many Ways Things Can Go Wrong" — reactive:anthropic-ai-values-widening
[21] Anthropic Wants Claude to Be Moral. Is Religion Really the Answer? — reactive:anthropic-ai-values-widening
[22] Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-20)
[23] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-22)
[24] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-22)
[25] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[26] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[27] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[28] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[29] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[30] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[31] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[32] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[33] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[34] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[35] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[36] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[37] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-21)
[38] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-20)
[39] Anthropic asked Christian leaders for advice on Claude’s moral future - The Washington Post — reactive:anthropic-ai-values-widening
[40] To Catholic thinkers, Pentagon’s AI demands violate ‘human dignity’ - The Washington Post — reactive:anthropic-ai-values-widening
[41] Anthropic, an AI company, hosted Christian religious leaders at its ... — reactive:anthropic-ai-values-widening
[42] Anthropic's new study says frontier AI needs input from scholars, philosophers, clergy, and civic thinkers because model… — Rohan Paul Twitter (2026-05-20)
[43] Alignment faking in large language models \ Anthropic — reactive:anthropic-ai-values-widening
[44] New Anthropic study: LLMs can secretly transmit personality traits ... — reactive:anthropic-ai-values-widening
[45] Anthropic is expanding the conversation around frontier AI by ... — reactive:anthropic-ai-values-widening
[46] Widening the Conversation on Frontier AI | Hacker News — reactive:anthropic-ai-values-widening