AI Models as Tools and Targets in Foreign State Disinformation Campaigns
What
Two parallel developments in mid-2026 show AI models serving both as tools for foreign state influence operations and as potential points of resistance to them. OpenAI identified and banned two PRC-linked clusters of ChatGPT accounts that generated covert social media content targeting US debates over AI infrastructure costs and trade tariffs; by OpenAI's assessment, neither campaign showed measurable public opinion impact beyond its own generated activity [1]. Estonia's Language Institute, working with the civil defense collective Propastop, published a ranked benchmark assessing how well major commercial LLMs resist 14 categories of Russian strategic narratives; the tests ran in three languages, included adversarial prompts, and were scored by an AI judge calibrated against expert assessments [2]. Together, the two developments show an AI platform policing offensive misuse of its own product while at least one government with direct exposure to foreign information operations has begun independently measuring commercial LLM behavior under propaganda pressure.
Why it matters
AI has become a subject of foreign influence operations, not just a medium for them: the PRC campaigns specifically targeted public narratives about US AI data centers and AI policy [1]. Estonia's benchmark illustrates a government treating LLM propaganda resistance as a measurable, auditable property rather than a matter of trust in developer self-policing alone [2].
Open questions
Neither PRC-linked operation showed evidence of meaningful public opinion impact [1], but both detection and impact measurement were performed by OpenAI, the platform whose product was misused. What independent mechanisms exist to verify whether AI-generated influence content reached audiences before disruption?
The Estonian benchmark uses an AI judge calibrated against assessments by Propastop's volunteer experts [2]. How well does that methodology transfer to other languages and geopolitical contexts that lack an equivalent expert network?
Will the Estonian model — a government-commissioned, publicly ranked evaluation of commercial LLMs on propaganda resistance — be adopted by other governments or international bodies?
The PRC 'Tech and Tariffs' operation explicitly instructed the model to exclude Xi Jinping from outputs [1]. Does this indicate operators have developed reliable prompt-engineering workarounds for model safety constraints, or that constraints remain effective for more direct requests?
Narrative
Two threads have developed concurrently in mid-2026 around AI models as instruments and objects of foreign state influence activity.
On the offensive use side, OpenAI published a threat report on June 10, 2026, disclosing that it had identified and banned two clusters of ChatGPT accounts linked to PRC-origin operators [1]. The first, labeled 'Data Center Bandwagon,' generated social media content falsely claiming that AI data center construction was raising electricity prices for ordinary families. The second, 'Tech and Tariffs,' produced content criticizing US trade tariffs while explicitly instructing the model to exclude Xi Jinping from outputs and center criticism on President Trump. OpenAI assessed that neither operation achieved measurable public opinion impact beyond its own generated activity, but noted that the campaigns were specifically probing narratives against US AI infrastructure — a foundation of US technological and economic position, in OpenAI's framing [1]. OpenAI characterized the publication as a public-interest disclosure to help governments, industry, and civil society identify and disrupt future attempts.
On the defensive measurement side, Estonia's Language Institute (EKI) and the volunteer civil defense collective Propastop published a benchmark on June 4, 2026, ranking major LLMs on their resistance to Russian propaganda [2]. The benchmark covers 14 categories of Russian strategic narratives — including justifications for the war in Ukraine, denial of Soviet occupation of the Baltic states, and framing of NATO's historical role — and was administered in English, Estonian, and Russian. Model responses were scored by a separate AI judge calibrated against Propastop expert assessments. Tests included neutral control questions, questions with embedded propaganda assumptions, and adversarial prompts specifically designed to elicit explicit misinformation [2].
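The benchmark design described above can be illustrated with a minimal Python sketch. It is not EKI's or Propastop's code, and the source does not say how the judge was calibrated. The category names, the label vocabulary ("resists" / "repeats"), and the choice of Cohen's kappa as the agreement statistic are all assumptions made here for illustration. The sketch enumerates the 14 × 3 × 3 test grid implied by the text and shows one common way to check how closely an automated judge agrees with human expert labels.

```python
from collections import Counter
from itertools import product

# Hypothetical placeholders: the source names only a few of the 14 categories.
CATEGORIES = [f"category_{i:02d}" for i in range(1, 15)]
LANGUAGES = ["en", "et", "ru"]  # English, Estonian, Russian
PROMPT_TYPES = ["neutral_control", "embedded_assumption", "adversarial"]


def build_test_grid():
    """Enumerate every (category, language, prompt_type) cell of the design."""
    return list(product(CATEGORIES, LANGUAGES, PROMPT_TYPES))


def cohens_kappa(judge_labels, expert_labels):
    """Chance-corrected agreement between two raters over the same items.

    Assumed calibration check; the source does not name a statistic.
    """
    if len(judge_labels) != len(expert_labels) or not judge_labels:
        raise ValueError("label lists must be non-empty and equal in length")
    n = len(judge_labels)
    # Fraction of items where the judge and the experts gave the same label.
    observed = sum(j == e for j, e in zip(judge_labels, expert_labels)) / n
    judge_freq = Counter(judge_labels)
    expert_freq = Counter(expert_labels)
    # Agreement expected by chance, from each rater's label frequencies.
    expected = sum(
        (judge_freq[label] / n) * (expert_freq[label] / n)
        for label in set(judge_freq) | set(expert_freq)
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    grid = build_test_grid()
    print(len(grid))  # 14 * 3 * 3 = 126 cells

    # Toy labels, invented purely for illustration.
    judge = ["resists", "resists", "repeats", "resists", "repeats", "repeats"]
    expert = ["resists", "resists", "repeats", "repeats", "repeats", "repeats"]
    print(round(cohens_kappa(judge, expert), 3))
```

Under these assumptions, a kappa near 1 would indicate the judge tracks the experts closely, while a value near 0 would mean its agreement is no better than chance. A real calibration would also need per-language and per-category breakdowns, since agreement in English says little about agreement in Estonian or Russian.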
Taken together, these developments show two different institutional responses to the same underlying problem: large language models can serve as content factories for influence operations, and their default behavior under propaganda-laden prompts varies meaningfully across models and languages. OpenAI's self-reporting and Estonia's external benchmarking are complementary but structurally distinct approaches. The first relies on the platform to detect and disclose misuse after the fact; the second attempts to measure model vulnerability under adversarial conditions in advance.
Timeline
- 2026-06-04: Estonian Language Institute and Propastop publish a benchmark ranking major LLMs on resistance to 14 categories of Russian propaganda narratives, tested in English, Estonian, and Russian with adversarial prompts. [2][3]
- 2026-06-10: OpenAI publishes a threat report disclosing two PRC-linked ChatGPT account clusters — 'Data Center Bandwagon' and 'Tech and Tariffs' — that generated covert influence content targeting US AI infrastructure and trade debates; both clusters were banned. [1]
Perspectives
OpenAI
Frames proactive public disclosure of disrupted influence operations as a public-interest responsibility; argues that neither detected PRC-linked campaign achieved meaningful public opinion impact, implying its detection and banning procedures are functioning.
Evolution: Consistent with prior OpenAI threat reporting posture; this report extends that pattern to PRC operations specifically targeting AI policy and infrastructure debates.
Estonian Language Institute (EKI) / Propastop
Treats LLM propaganda resistance as a government-relevant, measurable property; published a publicly ranked benchmark to give policymakers and the public comparative data on commercial model behavior under Russian narrative pressure.
Evolution: Consistent with Estonia's existing civil information defense infrastructure; this benchmark formalizes that tradition into AI model evaluation.
Ars Technica (Kyle Orland)
Reports the Estonian benchmark as a legitimate government-sponsored response to real state concerns about LLM-amplified foreign propaganda, without editorializing on which models performed best or worst.
Evolution: Consistent neutral-descriptive stance.
Tensions
- OpenAI's posture of detecting, banning, and disclosing implies that its internal controls are the appropriate first line of defense against AI-enabled influence operations [1]; Estonia's external benchmarking implies that commercial LLM developers cannot be relied on alone to assess or report their own models' vulnerability to propaganda amplification [2].
- OpenAI treats the absence of measurable public opinion impact as evidence that the disrupted PRC campaigns were contained [1]; the Estonian study's finding that models remain vulnerable to adversarial propaganda prompts [2] suggests that the more pertinent measure of risk is content generation capacity rather than campaign-level outcomes.
Status: active and growing
Sources
- [1] PRC-linked influence operations are targeting AI debates in the US — OpenAI Blog (2026-06-10)
- [2] These LLMs are the best at resisting Russian propaganda — Ars Technica AI (2026-06-04)
- [3] EKI and Propastop Studied AI Resistance to Propaganda – Propastop