Anthropic's Push to Broaden AI Values Input · history

Version 9

2026-05-27 03:06 UTC · 164 items

What

Anthropic co-founder Chris Olah traveled to the Vatican to help launch Pope Leo XIV's first encyclical, 'Magnifica Humanitas' [25], openly admitting at the event that every frontier AI lab, Anthropic included, 'operates inside a set of incentives and constraints that can sometimes conflict with doing the right thing' [26]. The encyclical, covered by Forbes, Time, and Politico, calls for robust AI regulation [18], and Pope Leo simultaneously announced the formation of a Vatican AI commission [22]. Anthropic continues to face a standing federal ban after refusing Pentagon demands to disable its AI safeguards [5], while Claude's constitution has been dismissed by an Oversight Board member as 'about vibes, not rights' [13].

Why it matters

An AI company co-founder standing at the Vatican to help launch a papal document calling for binding regulation—while publicly admitting his own lab's structural conflicts of interest—is qualitatively different from publishing an internal ethics code. The move collapses the distance between Anthropic's voluntary ethics framework and the Vatican's call for enforceable governance, raising the question of whether Anthropic's consultation strategy was always leading toward an endorsement of external regulatory constraints or whether it remains a sophisticated legitimacy exercise.

Open questions

Does Anthropic's active role co-launching 'Magnifica Humanitas' signal that the company supports binding AI regulation—and if so, will it lobby for specific enforcement mechanisms [26][25]?
What mandate, composition, and authority will Pope Leo's new Vatican AI commission hold, and will it coordinate with existing regulatory bodies [22]?
Will Olah's Vatican admission that frontier labs face structural conflicts of interest [26] change how regulators, investors, or courts assess Anthropic's self-governance claims—especially given the standing federal ban [5]?
With Anthropic's co-founder now publicly aligned with calls for external moral scrutiny, does the Oversight Board's 'vibes not rights' critique [13] become Anthropic's own position—and if so, what enforceable framework does Anthropic actually support?

Narrative

Anthropic's conflict with the U.S. government began when the Pentagon demanded the company disable its AI safeguards as a condition for military use. CEO Dario Amodei refused [1], and the Trump administration responded by banning Anthropic from federal use [2] and branding it a supply chain risk. A federal judge temporarily blocked that designation [3][4], but a federal appeals court overturned the reprieve on April 8, 2026, leaving the ban intact [5][6]. OpenAI moved within hours of the original ban announcement to claim the Pentagon AI contract [7][8], creating a direct market demonstration that values-based refusals carry quantifiable competitive costs when a less restrictive competitor is available.

Against this legal backdrop, Anthropic pursued a formal values framework for its AI systems. The company disclosed consultations with more than fifteen religious and cross-cultural traditions [9] and released Claude's constitution, a document of roughly 80 pages [12] whose architect, Amanda Askell, was profiled by WSJ, Vox, and Der Spiegel [11][10][12]. The institutional response was divided: the Oversight Board's Suzanne N. dismissed the document as 'a constitution that is about vibes, not rights' [13], and multiple sources confirmed that OpenAI is consulting the same religious communities in parallel [14][15], situating Anthropic's program within an industry-wide pattern rather than a distinctive ethical strategy.

Pope Leo XIV had previously warned that children and adolescents are vulnerable to AI manipulation [16]; on May 25, 2026, he published his first encyclical, 'Magnifica Humanitas,' a 42,300-word document calling for robust AI regulation and elevating AI ethics to a religious imperative [17][18][19]. The document received coverage from Forbes, Time, PBS, and the Washington Post [20][21]. Pope Leo simultaneously announced the formation of a Vatican AI commission [22], and the University of Notre Dame issued a formal statement in response [23]. The Rome Call for AI Ethics, a Vatican-led interfaith document with Muslim, Jewish, and evangelical signatories [24], forms the broader institutional context in which the encyclical sits.

The most consequential development is that Anthropic co-founder Chris Olah traveled to the Vatican to help launch the encyclical [25], speaking with unusual candor for a company executive. Olah acknowledged that 'every frontier AI lab—including Anthropic—operates inside a set of incentives and constraints that can sometimes conflict with doing the right thing' [26], and disclosed that Anthropic's interpretability research has found internal states in AI models that 'functionally mirror joy, satisfaction, fear, grief, and unease' [26][27]. He framed religious communities and civil society as necessary external critics 'whose values cannot be bent by market and competitive pressures' [26]—a position that aligns Anthropic's leadership with the encyclical's regulatory call rather than treating it as an external challenge to manage. Whether this represents a genuine move toward endorsing enforceable constraints, or an extension of a consultation-as-legitimacy strategy, is now the story's central unresolved question.

Timeline

2025-11: Pope Leo XIV warns that children and adolescents are vulnerable to AI manipulation and states the real danger of AI is not technology itself [16][34]
2026-02-27: Trump administration bans Anthropic from federal use; within hours, OpenAI announces a Pentagon AI deal that fills the resulting contract gap [7][8][2]
2026-03-16: Catholic moral theologians file amicus brief (case 26-1049) backing Anthropic's refusal to comply with Pentagon demands on human dignity grounds [30]
2026-03-26: Federal judge temporarily blocks the Pentagon from branding Anthropic a supply chain risk and halts the Trump administration's federal ban [3][4][35]
2026-04-08: Federal appeals court overturns the temporary block, leaving the Pentagon's blacklisting of Anthropic intact [5][6][36]
2026-04-11: Washington Post reports Anthropic consulted Christian leaders for advice on Claude's moral future [37]
2026-04-20: New York Times publishes opinion piece questioning whether religion is the right answer to AI morality [33]
2026-05-19: Anthropic publishes 'Widening the conversation on frontier AI,' describing dialogues with 15+ religious and cross-cultural traditions [9]
2026-05-21: Skeptical 'beautiful PR' framing spreads across social media; Hacker News thread on the initiative opens [31][32][38]
2026-05-23: WSJ, Vox, and Der Spiegel profile Amanda Askell as Claude's moral framework architect; alignment-faking research and Alignment Forum critique circulate [11][10][12][39][40]
2026-05-24: Anthropic publishes Claude's constitution; Oversight Board member calls it 'about vibes, not rights'; multiple sources confirm OpenAI is also consulting religious leaders in parallel [28][13][29][14][15]
2026-05-24: BBC reports Anthropic CEO explicitly rejected Pentagon demands; Seattle Times frames tech's religious consultations as an industry-wide pattern [1][41]
2026-05-25: Pope Leo XIV's first encyclical 'Magnifica Humanitas' (42,300 words) published, calling for robust AI regulation and covered by Forbes, Time, PBS, and the Washington Post [17][42][18][19][20][21]
2026-05-25: Pope Leo announces the formation of a Vatican AI commission alongside the encyclical [22]
2026-05-25: Anthropic co-founder Chris Olah speaks at the Vatican for the encyclical launch, admitting frontier labs face structural conflicts of interest and inviting external moral scrutiny [26][43][25]

Perspectives

Anthropic / Chris Olah

Co-founder Chris Olah traveled to the Vatican to help launch 'Magnifica Humanitas' [25], publicly admitting that Anthropic, like all frontier labs, operates inside incentives that can conflict with doing the right thing [26]; the company simultaneously maintains a values-based refusal of Pentagon demands [1] while its published constitution for Claude faces an enforceability critique [13]

Evolution: Major shift in visible posture: from publishing an internal ethics code and consulting religious communities, to an Anthropic co-founder standing at the Vatican endorsing external moral scrutiny and admitting structural conflicts of interest—while the underlying legal and competitive situation (federal ban, OpenAI contract capture) remains unchanged

[26][25][1][13][28][7][5]

Trump administration / U.S. Department of Defense

Banned Anthropic from federal use after the company refused to disable AI safeguards [2]; branded it a supply chain risk; prevailed in federal appeals court [5][6], with OpenAI now in place as a replacement vendor [7]

Evolution: Consistent; ban stands and the DoD has a replacement vendor in OpenAI [7]

[7][2][5][6][1]

OpenAI

Claimed the Pentagon AI contract within hours of the Trump ban on Anthropic [7][8]; also independently consulting Hindu, Sikh, and Christian religious leaders [14][15], positioning OpenAI as both the direct market beneficiary of Anthropic's refusal and a parallel ethics-consultation actor

Evolution: Consistent; the convergence of both companies on religious consultation—and now Anthropic's Vatican co-launch role—strengthens the 'industry credential' interpretation of the practice rather than distinguishing either company

[7][8][14][15]

Pope Leo XIV / Vatican

Published 'Magnifica Humanitas' calling for robust AI regulation [18] and formed a Vatican AI commission [22]; collaborated with Anthropic's co-founder on the encyclical's public launch [25]

Evolution: Decisively escalated from informal papal warnings [16] to formal magisterial teaching demanding binding regulation; Anthropic's standing with the Vatican has shifted from subject of critique to active co-presenter

[17][18][19][22][25][16]

Oversight Board

Member Suzanne N. characterized Claude's constitution as 'a constitution that is about vibes, not rights' [13][29]—a direct indictment of enforceability rather than a call for iterative improvement

Evolution: Consistent and unaddressed by Anthropic; Olah's Vatican remarks that labs need external critics whose values 'cannot be bent by market and competitive pressures' [26] arguably concede the structural point without answering the specific rights-vs-vibes critique

[13][29][26]

Catholic moral theologians and ethicists

Filed a formal amicus brief (case 26-1049) backing Anthropic's refusal on human dignity grounds [30]; multiple Catholic outlets framed Anthropic as holding the moral line on AI

Evolution: Consistent in support of Anthropic's refusal; their legal intervention did not succeed [5][6]; Notre Dame issued a formal statement on the encyclical [23], and the Vatican's regulatory demand now represents a different institutional track from their earlier legal support for Anthropic's autonomy

[30][5][6][23]

Amanda Askell (Anthropic philosopher)

Named architect of Claude's constitution, profiled as the individual most responsible for Claude's moral framework [11][10][12]

Evolution: The Oversight Board's 'vibes not rights' verdict [13] challenged the document's enforceability; Olah's Vatican remarks now signal Anthropic's leadership agrees external constraints are necessary [26], placing new pressure on whether the document she authored is a step toward or a substitute for binding governance

[11][10][12][13][26]

Skeptics and critics

Characterize Anthropic's consultation program as performative PR rather than substantive ethics engagement [31][32]; NYT questions whether religion is the right answer to AI morality [33]

Evolution: Reinforced by OpenAI's parallel consultations [14][15]; Olah's Vatican co-launch role [25] can be read as either genuine ethical commitment or an escalation of the same legitimacy strategy, depending on whether Anthropic follows the encyclical's regulatory call with concrete policy action

[31][32][33][14][15][25]

Tensions

Voluntary ethical codes vs. binding regulation: Claude's constitution represents Anthropic's code-based approach [28], an Oversight Board member calls it 'vibes not rights' [13], and 'Magnifica Humanitas' demands enforceable regulation [18]; Olah's Vatican remarks now partially concede these tensions rather than rebut them [26] [28][13][18][26]
Anthropic's values-based refusal vs. OpenAI's compliance: Anthropic refused Pentagon demands and lost the contract [1][7]; OpenAI filled the gap within hours [8]—a direct market demonstration that ethical self-restriction carries quantifiable competitive costs [7][1][8]
Differentiated ethics strategy vs. convergent industry practice: Anthropic presents multi-tradition consultation as genuine values formation [9], but OpenAI consults the same communities in parallel [14][15], and both companies' executives now appear at high-profile religious forums, supporting the 'industry credential' interpretation [9][14][15][25]
Substantive ethical engagement vs. performative legitimacy signaling: CEO-level refusal [1] and a formal amicus brief [30] suggest genuine commitment, while Olah's Vatican appearance [25] can be read as either the strongest signal yet of authentic alignment with religious authority or a high-profile escalation of the same legitimacy strategy [1][30][25][26]
Values-based corporate autonomy vs. federal judicial and executive authority: Anthropic's CEO refused Pentagon demands [1] and Catholic theologians backed the refusal legally [30], but the appeals court ruled against Anthropic [5][6], showing that a voluntary ethics framework does not shield a company from adverse government action [1][30][5][6]

Sources

[1] Anthropic boss rejects Pentagon demand to drop AI safeguards — reactive:anthropic-ai-values-widening
[2] Trump orders US agencies to stop using Anthropic technology in ... — reactive:anthropic-ai-values-widening
[3] Federal judge temporarily blocks the Pentagon from branding AI firm ... — reactive:anthropic-ai-values-widening
[4] Judge blocks Pentagon from labeling Anthropic AI a "supply chain risk" and halts Trump's ban on federal use — reactive:anthropic-ai-values-widening
[5] US court declines to block Pentagon's Anthropic blacklisting for now — reactive:anthropic-partnerships-expansion
[6] Anthropic loses appeals court bid to temporarily block DOD ruling — reactive:anthropic-ai-values-widening
[7] OpenAI announces Pentagon deal after Trump bans Anthropic - NPR — reactive:openai-advanced-account-security
[8] San Francisco-based OpenAI strikes deal with Pentagon hours after President Donald Trump's administration bans Anthropic - ABC7 New York — reactive:anthropic-ai-values-widening
[9] Widening the conversation on frontier AI — Anthropic News (2026-05-19)
[10] Meet the One Woman Anthropic Trusts to Teach AI Morals - WSJ — reactive:anthropic-ai-values-widening
[11] Anthropic Philosopher Askell: "With AI, There Are Many Ways Things Can Go Wrong" — reactive:anthropic-ai-values-widening
[12] Claude has an 80-page constitution. Is that enough to make it good? — reactive:anthropic-ai-values-widening
[13] Oversight Board - Facebook — reactive:anthropic-ai-values-widening
[14] BREAKING: OpenAI & Anthropic Consult Hindu, Sikh & Christian Leaders to Build AGI's Moral Compass — reactive:anthropic-ai-values-widening
[15] Anthropic and OpenAI sit down with religious leaders to seek ethical advice — reactive:anthropic-ai-values-widening
[16] Pope Leo XIV: Children and adolescents are vulnerable to AI manipulation - Vatican News — reactive:anthropic-ai-values-widening
[17] Pope elevates AI ethics to a religious imperative with first encyclical — reactive:anthropic-ai-values-widening
[18] Pope calls for robust regulation of AI in manifesto that ponders the future of humanity — reactive:anthropic-ai-values-widening
[19] Encyclical Letter of His Holiness Leo XIV Magnifica Humanitas (15 ... — reactive:anthropic-ai-values-widening
[20] Pope Leo's First Encyclical Calls For Safeguarding Humans From Impact Of AI - Forbes — reactive:anthropic-ai-values-widening
[21] Pope Leo Uses First Major Papal Text to Warn About Dangers of AI — reactive:anthropic-ai-values-widening
[22] Pope Leo launches AI commission — reactive:anthropic-ai-values-widening
[23] University of Notre Dame — reactive:anthropic-ai-values-widening
[24] [PDF] Rome Call for AI Ethics - National Association of Evangelicals — reactive:anthropic-ai-values-widening
[25] Why is AI company Anthropic helping launch Pope Leo ... — reactive:anthropic-ai-values-widening
[26] Anthropic co-founder Chris Olah's remarks on Pope Leo XIV's encyclical "Magnifica humanitas" — Anthropic News (2026-05-25)
[27] "There is a "real possibility that AI will displace human labor at a very large scale.... We find internal states that f… — Rohan Paul Twitter (2026-05-25)
[28] Claude's new constitution - Anthropic — reactive:anthropic-ai-values-widening
[29] “A constitution that is about vibes, not rights.” Oversight Board ... — reactive:anthropic-ai-values-widening
[30] [PDF] 26-1049 IN THE UNITED STATES COURT OF APPEALS FOR THE ... — reactive:anthropic-ai-values-widening
[31] Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-20)
[32] RT @suomi55: Anthropic just dropped another beautiful PR post: — reactive:anthropic-ai-values-widening (2026-05-22)
[33] Anthropic Wants Claude to Be Moral. Is Religion Really the Answer? — reactive:anthropic-ai-values-widening
[34] Pope Leo XIV warns that the real danger of AI is not technology itself — reactive:anthropic-ai-values-widening
[35] Federal judge temporarily blocks the Pentagon from branding AI firm Anthropic a supply chain risk | Federal News Network — reactive:anthropic-ai-values-widening
[36] Appeals Court Keeps Anthropic Pentagon Case Blacklist Intact — reactive:anthropic-partnerships-expansion
[37] Anthropic asked Christian leaders for advice on Claude’s moral future - The Washington Post — reactive:anthropic-ai-values-widening
[38] Widening the Conversation on Frontier AI | Hacker News — reactive:anthropic-ai-values-widening
[39] “Alignment Faking” frame is somewhat fake — AI Alignment Forum — reactive:anthropic-ai-values-widening
[40] Alignment Faking Mitigations — reactive:anthropic-ai-values-widening
[41] Tech is turning increasingly to religion in a quest to create ethical AI | The Seattle Times — reactive:anthropic-ai-values-widening
[42] Pope Leo Warns of Risks From A.I. in 42,300-Word Encyclical — reactive:anthropic-ai-values-widening
[43] Few things Anthropic’s co-founder Chris Olah told the Vatican today. — Rohan Paul Twitter (2026-05-25)