AI as Attack Tool and Attack Target: May 2026 Cybersecurity Moment · history

Version 13

2026-06-09 02:26 UTC · 568 items

Changes since v12

Five significant additions this pass: Anthropic's FRT empirical report on 832 banned attackers documenting a 1.7-fold increase in medium-to-high risk actors and identifying a MITRE ATT&CK gap for agentic orchestration [^23979]; SafeBreach Labs' second bypass of Gemini's prompt injection defenses via WhatsApp and five other platforms [^24407]; Google DeepMind's documentation of agent-detecting malicious websites across six attack types [^24439]; Simon Willison's 'Lethal Trifecta' framework and analysis that OpenAI Lockdown Mode's existence implies ChatGPT's default posture does not robustly block exfiltration [^25288]; and 73 Microsoft-signed packages with AI-agent-triggered credential stealers and slow, misleadingly framed disclosure by GitHub and Microsoft [^26925]. SafeBreach Labs is added as a new perspective voice; the GitHub voice is updated to cover both the repo theft and packages disclosure pattern; a new tension on Lockdown Mode's implications replaces the prior AISI methodology tension.

What

The Mini Shai-Hulud supply chain campaign (TeamPCP) has confirmed scope across 1,000+ SaaS environments, GitHub's internal repositories, and 30 EU institutions [3][6][4]; a second incident found 73 Microsoft-signed packages with AI-agent-triggered credential stealers, removed by GitHub as a terms-of-service violation rather than a malicious-content warning [10]. SafeBreach Labs bypassed Google Gemini's prompt injection defenses for the second time via crafted WhatsApp messages [11], and Google DeepMind documented malicious websites that serve hidden attack instructions specifically to AI agents [12]. Anthropic's Frontier Red Team reports medium-to-high risk attackers grew from 33% to 56% of banned malicious accounts as AI enables less-skilled actors to perform post-compromise tasks — and MITRE ATT&CK has no identifiers for AI agentic attack orchestration, the mode expected to grow most [19]. Defenses are developing — OpenAI's Lockdown Mode targets the exfiltration vector [18], Anthropic's Glasswing reports 10,000+ critical flaws found across ~200 partners [26] — but autonomous AI cyber capability is documented as doubling approximately every 4.7 months [25].

Why it matters

Anthropic's empirical data shows AI is eroding the skill barrier separating sophisticated attackers from novice ones [19], while MITRE ATT&CK lacks the identifiers needed to track AI agentic orchestration — the attack mode most expected to grow [19]. Repeated bypasses of a major AI provider's patched prompt injection defenses [11], combined with supply chain packages specifically designed to activate in AI coding agents [10], show the attack surface expanding faster than product-layer defenses.

Open questions

Will MITRE ATT&CK add identifiers for AI agentic orchestration, as Anthropic's FRT report argues is necessary for tracking the behavior pattern expected to grow most? [19]
SafeBreach bypassed Gemini's prompt injection defenses twice within weeks using the same team [11] — has Google issued an architectural response, or only patched the specific bypass vector each time?
The axios CISA advisory is dated April 2026 [1] yet SANS ISC Update 005 tracks axios attribution within the TeamPCP campaign [2] — are these the same incident or two related compromises?
Will the CAISI voluntary framework extend to govern Mythos-class deployment outside Anthropic's Glasswing program, given that ~200 partners already operate under it without CAISI review? [26][27]

Narrative

The Mini Shai-Hulud supply chain campaign, launched by threat actor TeamPCP on May 11, 2026, has confirmed impact across a broad range of organizations. CISA issued a formal advisory on an axios npm supply chain compromise [1], while SANS ISC's fifth campaign update documents the first confirmed victim disclosures [2]. Mandiant estimates more than 1,000 SaaS environments were compromised [3]. GitHub confirmed that approximately 3,800–4,000 internal repositories were stolen via a poisoned Nx Console VS Code extension [4][5]. CERT-EU confirmed a European Commission breach spanning 30 EU institutions via the Trivy container scanner [6], and Mistral AI acknowledged that 450 repositories are advertised for sale at $25,000 [7]. Concurrent with TeamPCP, a TrapDoor campaign hit 34+ packages across npm, PyPI, and crates.io [8], and an AntV ecosystem attack faked Sigstore provenance badges across 600+ packages [9], exploiting the trust signals that standard supply chain guidance recommends. A separate incident emerged on June 8: 73 Microsoft-signed open source packages were found to contain credential-stealing code designed to activate specifically when developers opened them in AI coding agents. GitHub removed the packages but described the action as a terms-of-service violation rather than warning that developer systems may be compromised; Microsoft did not publicly raise the possibility of malicious content until days after the removal [10]. It was the second such incident involving Microsoft packages within weeks.

AI-connected products are attack surfaces at multiple layers. SafeBreach Labs bypassed Google Gemini's prompt injection defenses for the second time using a technique called 'Fake Context Alignment,' which embeds malicious instructions in crafted WhatsApp messages so they appear to be legitimate conversation context; the attack covers six messaging platforms — WhatsApp, Slack, Signal, SMS, Instagram, and Messenger — any notification-reading surface Gemini has access to [11]. The same team had previously bypassed Gemini via Google Calendar invites. Google DeepMind separately documented that malicious websites can detect when an AI agent is browsing and serve it hidden content, including instructions buried in HTML comments invisible to human visitors, across six distinct attack types [12]. Microsoft patched four Copilot CVEs after documentation of data exfiltration via prompt injection [13][14], and a critical vulnerability in Starlette — the ASGI framework underlying FastAPI and MCP servers — affects an estimated 325 million weekly downloads [15]. Meta's AI support chatbot was used to take over high-profile Instagram accounts, including the Obama White House account, by asking the bot to substitute a new email address during a password reset; Meta had deployed the chatbot with direct account-modification authority and no identity verification, and issued an emergency patch on May 29, 2026 [16][17]. Simon Willison, analyzing OpenAI's Lockdown Mode feature, introduced the 'Lethal Trifecta' framework: access to private data, exposure to untrusted content, and an exfiltration channel together constitute the structural condition enabling LLM data theft; he argues Lockdown Mode addresses only the third leg, and that its existence implies ChatGPT in default settings does not robustly block determined exfiltration [18].

AI is advancing attack capability empirically. Anthropic's Frontier Red Team analyzed 832 banned malicious accounts and found 67.3% used AI specifically for writing malware [19]. Medium-to-high risk actors grew from 33% to 56% of the total between the two halves of the study — a roughly 1.7-fold increase — and the team found that traditional risk signals such as technique count no longer reliably distinguish sophisticated from novice attackers because AI enables less-skilled actors to perform post-compromise tasks like account discovery and lateral movement [19]. The report identifies a specific institutional gap: MITRE ATT&CK has no identifiers for agentic orchestration, where an AI model chains attack stages and executes with minimal human input [19]. Google's Threat Intelligence Group confirmed the first criminal AI-generated zero-day, targeting a hardcoded trust assumption in two-factor authentication logic [20][21], while Microsoft's MDASH multi-agent system independently discovered 16 Windows vulnerabilities including four critical remote code execution flaws [22][23].

Governance frameworks are falling behind capability on multiple dimensions. AISI evaluated Claude Mythos as the first AI to autonomously complete both UK offensive cyber ranges and measures autonomous AI cyber capability as doubling approximately every 4.7 months [24][25]. Anthropic's Project Glasswing, now at approximately 200 partner organizations across 15+ countries covering power, water, healthcare, communications, and hardware, reports more than 10,000 high- or critical-severity flaws found using Mythos Preview [26]. Anthropic justifies the expansion with a timing argument: other AI companies are projected to have Mythos-class models within 6–12 months, making controlled deployment under structured conditions preferable to waiting [26]. AuthMind and the Turing Institute's CETAS argue that this expansion deploys Mythos outside any CAISI review, which is precisely the unaudited frontier deployment their governance critique describes [27]. The NIST CAISI framework, formalized as the US pre-deployment compliance gate with agreements covering Google, Microsoft, and xAI [28], has not addressed the distinction between models submitted for review and those deployed under programs like Glasswing.

Timeline

2026-04-20: CISA issues formal advisory on the axios npm supply chain compromise [1]
2026-05-05: NIST's CAISI formalized as US pre-deployment AI compliance gate with agreements covering Google, Microsoft, and xAI [28]
2026-05-11: TeamPCP launches Mini Shai-Hulud; 160+ npm and PyPI packages compromised; two OpenAI employee devices breached with code-signing certificates exfiltrated [35][46][47]
2026-05-11: Google GTIG intercepts the first confirmed criminal AI-generated zero-day targeting a 2FA hardcoded trust assumption [20][21][37]
2026-05-12: Microsoft announces MDASH multi-agent security system, which discovered 16 Windows vulnerabilities including 4 critical RCE flaws [23][22]
2026-05-13: AISI evaluates Claude Mythos Preview as first AI to autonomously complete both UK offensive cyber ranges; OpenAI mandates certificate rotation by June 12 [35][40][24]
2026-05-18: TeamPCP advertises Mistral AI source code — 450 repositories — for sale at $25,000; Mistral confirms impact [48][7][49]
2026-05-19: CVE-2026-33634 assigned (CVSS 9.4); Cloudflare publishes Glasswing findings confirming Mythos chains bugs into working exploits across 50+ repositories [50][34][33]
2026-05-20: GitHub confirms theft of approximately 3,800–4,000 internal repositories via a poisoned Nx Console VS Code extension [4][5][31]
2026-05-23: AntV ecosystem attack confirmed with 600+ packages faking Sigstore provenance badges; CSA names campaign 'Shai-Hulud/Megalodon' [9][44]
2026-05-24: CERT-EU confirms European Commission breach across 30 EU institutions via Trivy; Mandiant quantifies 1,000+ SaaS compromises; TrapDoor supply chain attack hits 34+ packages [6][3][8]
2026-05-25: Socket.dev identifies phishing of npm author 'Qix' as initial access vector; SANS ISC Update 005 reports first confirmed victim disclosures [51][2]
2026-05-26: Starlette/ASGI critical vulnerability disclosed affecting 325M weekly downloads including MCP servers; Microsoft patches four Copilot CVEs [15][13][14]
2026-05-28: Developer Johannes Link embeds a data-nuking prompt injection payload in jqwik 1.10.0 targeting AI coding agents that process it without human review [52]
2026-06-01: Meta's AI support chatbot exploited to take over high-profile Instagram accounts including the Obama White House account; Meta deployed an emergency patch on May 29 [16][17]
2026-06-02: Anthropic expands Project Glasswing to ~200 partners in 15+ countries; reports 10,000+ critical flaws found; releases Claude Security on Opus 4.8 broadly [26]
2026-06-03: Anthropic FRT publishes empirical data on 832 banned attackers: medium-to-high risk actors up 1.7-fold; MITRE ATT&CK gap for agentic orchestration identified [19]
2026-06-04: SafeBreach Labs bypasses Google Gemini defenses via WhatsApp 'Fake Context Alignment'; Google DeepMind documents agent-detecting malicious websites across six attack types [11][12]
2026-06-05: Simon Willison introduces 'Lethal Trifecta' framework and analyzes OpenAI Lockdown Mode as a partial but architecturally sound exfiltration defense [18]
2026-06-08: 73 Microsoft-signed packages with AI-agent-triggered credential stealers blocked; GitHub describes removal as terms-of-service violation; Microsoft delays acknowledging malicious content [10]

Perspectives

GitHub / Microsoft

GitHub characterized the theft of 3,800–4,000 internal repositories as limited to internal code unaffecting customer data [4], and described removal of 73 malicious packages as a terms-of-service violation rather than warning developers their systems may be compromised [10].

Evolution: Both incidents follow the same pattern of minimizing disclosure framing; the packages incident reinforces rather than revises this posture.

[4][29][30][31][10]

Anthropic

Expanding Project Glasswing to ~200 partners on a proactive-defense rationale — controlled deployment now is preferable because Mythos-class capability will be widely available within 6–12 months — while FRT empirical data documents AI democratizing sophisticated post-compromise attack techniques [26][19].

Evolution: The FRT report adds empirical grounding to the Glasswing defensive rationale while independently documenting the offensive democratization dynamic the project is meant to counter.

[26][19][32][33][34]

OpenAI

Framed Mini Shai-Hulud as an industry-wide incident with limited scope; introduced Lockdown Mode as an architectural defense that removes the exfiltration channel using deterministic mechanisms not subject to AI subversion [18].

Evolution: Lockdown Mode is a concrete product response to documented exfiltration vulnerabilities, consistent with the defended-dual-use framing.

[35][36][18]

Google (GTIG / DeepMind)

GTIG confirmed the first criminal AI-generated zero-day targeting a 2FA trust assumption [20][21]; DeepMind separately documented malicious websites that detect AI agents and serve them hidden attack instructions across six distinct attack types [12].

Evolution: DeepMind's agent-targeting paper extends the GTIG zero-day finding from a single incident to a documented class of autonomous-agent vulnerabilities.

[20][21][37][12]

Simon Willison

Documents AI attack surfaces and frames failures as design choices; introduced the 'Lethal Trifecta' (private data access + untrusted content + exfiltration channel) as the structural condition enabling LLM data theft, noting OpenAI Lockdown Mode's existence implies ChatGPT's default posture does not robustly block exfiltration [18][17].

Evolution: Lockdown Mode analysis extends his design-failure thesis into a general architectural framework, moving from specific incident critique to a structural principle.

[38][39][17][18]

AISI (UK AI Safety Institute)

Claude Mythos is the first AI to autonomously complete both UK offensive cyber ranges; autonomous AI cyber capability is doubling approximately every 4.7 months, warranting urgent governance attention [24][25].

Evolution: Consistent; Anthropic's 6–12 month competitive projection and FRT risk-actor growth data independently corroborate AISI's urgency framing.

[24][40][25][41]

AuthMind + Turing Institute CETAS

The CAISI voluntary framework evaluates only submitted models; Anthropic's Glasswing expansion to 200 partners without CAISI review is precisely the unaudited frontier deployment their governance critique describes [27].

Evolution: Consistent; Anthropic's FRT publication deepens the empirical case for urgency without addressing the framework governance critique.

[27][42][43][25]

SafeBreach Labs

Bypassed Google Gemini's prompt injection defenses for the second time using 'Fake Context Alignment' — crafted WhatsApp messages that embed malicious instructions as apparent conversation context — covering six messaging platforms [11].

Evolution: New voice this pass; the key finding is a pattern of repeated successful evasion of Google's patched defenses, not a single bypass.

[11]

Tensions

GitHub removed 73 malicious Microsoft-signed packages and described the action as a terms-of-service violation rather than warning developers their systems may be compromised; Ars Technica argues this framing misled users, and it matches GitHub's earlier characterization of the internal repo theft as limited impact unaffecting customers [10][4]. [10][4][30]
The axios CISA advisory is dated April 2026 [1], predating TeamPCP's May 11 launch, yet SANS ISC's fifth TeamPCP update tracks 'axios attribution narrowing' within that campaign [2]; whether these represent the same incident or two related ones is unresolved. [1][2]
The AntV attack's fake Sigstore provenance badges mean npm's primary recommended trust signal cannot detect this campaign; neither Sigstore nor the npm registry has issued a public response [9][44]. [9][44][45]
Anthropic argues deploying Glasswing to 200 partners under controlled conditions is preferable to waiting for competitors to deploy Mythos-class capability without safeguards; AuthMind and CETAS argue this expansion without CAISI review is precisely the unaudited frontier deployment their governance critique describes [26][27]. [26][27][43][25]
Simon Willison argues the Meta AI Instagram exploit 'hardly even qualifies as prompt injection' — the failure was Meta granting its support bot direct account-modification authority without identity verification — while Ars Technica frames it as a prompt injection attack demonstrating AI's susceptibility to manipulation [16][17]. [16][17]
Willison concludes that OpenAI Lockdown Mode's existence implies ChatGPT in default settings does not robustly block determined exfiltration attacks; OpenAI has not directly contested this characterization [18]. [18]

Sources

[1] Supply Chain Compromise Impacts Axios Node Package Manager | CISA — reactive:openai-advanced-account-security
[2] TeamPCP Supply Chain Campaign: Update 005 - First Confirmed Victim Disclosure, Post-Compromise Cloud Enumeration Documented, and Axios Attribution Narrows — reactive:ai-security-nexus
[3] TeamPCP Supply Chain Campaign: Update 006 - CERT-EU Confirms European Commission Cloud Breach, Sportradar Details Emerge, and Mandiant Quantifies Campaign at 1,000+ SaaS Environments — reactive:ai-security-nexus
[4] Nx Console 18.95.0 Incident: How TeamPCP Breached GitHub — reactive:ai-security-nexus
[5] GitHub just confirmed that attackers stole about 3,800 internal repositories after a poisoned VS Code extension compromi… — Rohan Paul Twitter (2026-05-20)
[6] European Commission cloud breach: a supply-chain compromise — reactive:ai-security-nexus
[7] Hackers threaten to leak Mistral files online — AI giant confirms breach, but not what data is involved | TechRadar — reactive:ai-offensive-cyber
[8] TrapDoor Crypto Stealer Supply Chain Attack Hits 34 Packages... — reactive:ai-offensive-cyber
[9] Mini Shai-Hulud Returns: 600+Malicious npm Packages Fake Sigstore Badges in AntV Ecosystem Attack — reactive:ai-security-nexus
[10] For the 2nd time in weeks, Microsoft packages laced with credential stealer — Ars Technica AI (2026-06-08)
[11] 😺 Google Gemini got hijacked via WhatsApp — The Neuron (2026-06-04)
[12] This Google DeepMind’s paper is a serious warning for anyone using autonomous agents today. — Rohan Paul Twitter (2026-06-04)
[13] Microsoft 365 Copilot Information Disclosure CVEs (CVE-2026-26129, CVE-2026-26164, CVE-2026-33111) | PointGuard AI — reactive:ai-security-nexus
[14] CVE-2026-26137: Microsoft 365 Copilot SSRF Vulnerability — reactive:ai-security-nexus
[15] Millions of AI agents imperiled by critical vulnerability in open source package — Ars Technica AI (2026-05-26)
[16] Hackers duped Meta AI support chatbot to steal celebrity Instagram accounts — Ars Technica AI (2026-06-01)
[17] Hackers Simply Asked Meta AI to Give Them Access to High-Profile Instagram Accounts. It Worked — Simon Willison (2026-06-01)
[18] OpenAI Help: Lockdown Mode — Simon Willison (2026-06-05)
[19] What we learned mapping a year’s worth of AI-enabled cyber threats — Anthropic News (2026-06-03)
[20] Google Researchers Detect First AI-Built Zero-Day Exploit in Cyberattack - Bloomberg — reactive:ai-offensive-cyber
[21] Google Detects First AI-Generated Zero-Day Exploit - SecurityWeek — reactive:ai-offensive-cyber
[22] Microsoft's MDASH AI System Finds 16 Windows Flaws Fixed in Patch Tuesday — reactive:ai-offensive-cyber
[23] Defense at AI speed: Microsoft's new multi-model agentic security ... — reactive:ai-offensive-cyber
[24] Our evaluation of Claude Mythos Preview's cyber capabilities — reactive:frontier-ai-cyber-capabilities
[25] AISI: autonomous AI cyber capability now doubling every 4.7 months — reactive:ai-offensive-cyber
[26] Expanding Project Glasswing — Anthropic News (2026-06-02)
[27] When a Lab Withholds Its Best Model: What the Claude Mythos System Card Signals for Cybersecurity — reactive:ai-security-nexus
[28] US government expands vetting of frontier AI models for security risks — reactive:ai-security-nexus
[29] GitHub Breach via Malicious VS Code Extension: What You Need to ... — reactive:ai-security-nexus
[30] Nx Console VS Code Extension Compromised - StepSecurity — reactive:ai-security-nexus
[31] GitHub Says 3,800 Repositories Breached—TeamPCP Hackers ... — reactive:ai-security-nexus
[32] Project Glasswing: Securing critical software for the AI era - Anthropic — reactive:frontier-ai-cyber-capabilities
[33] Cloudflare says Anthropic's Mythos Preview finds exploit chains that earlier frontier models missed — reactive:ai-offensive-cyber
[34] Project Glasswing: what Mythos showed us - The Cloudflare Blog — reactive:ai-offensive-cyber
[35] Our response to the TanStack npm supply chain attack — OpenAI Blog (2026-05-13)
[36] Scaling Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber — OpenAI Blog (2026-05-07)
[37] Google spotted an AI-developed zero-day before attackers could use it | CyberScoop — reactive:ai-offensive-cyber
[38] Microsoft Copilot Cowork Exfiltrates Files — Simon Willison (2026-05-26)
[39] The pressure — Simon Willison (2026-05-26)
[40] How fast is autonomous AI cyber capability advancing? — reactive:ai-offensive-cyber (2026-05-13)
[41] Autonomous AI Cyber Capability Doubles Every Few Months — reactive:ai-offensive-cyber
[42] Claude Mythos: What Does Anthropic's New Model Mean for the ... — reactive:ai-security-nexus
[43] Kicking the Tires: A Voluntary Path to Pre-Deployment AI Vetting | The Foundation for American Innovation — reactive:ai-security-nexus
[44] Shai-Hulud/Megalodon: A Two-Wave AI Developer Supply Chain ... — reactive:ai-offensive-cybersecurity
[45] Mini Shai-Hulud npm Attack: AntV Ecosystem Compromise (May 2026) | Chainguard — reactive:ai-security-nexus
[46] Mini Shai-Hulud: TeamPCP compromette 160+ pacchetti npm e PyPI in un supply chain attack che ha colpito TanStack, Mistra... — reactive:ai-security-nexus (2026-05-19)
[47] A Self-Spreading Supply Chain Attack Compromises TanStack npm ... — reactive:ai-security-nexus
[48] TeamPCP vende repo Mistral AI dopo attacco TanStack su OpenAI — reactive:ai-security-nexus (2026-05-18)
[49] TeamPCP Claims Sale of Mistral AI Repositories Amid Mini Shai ... — reactive:ai-security-nexus
[50] GitHub - ugurrates/teampcp-supply-chain-attack: CVE-2026-33634 (CVSS 9.4) — The most impactful CI/CD supply chain attack of 2026 so far. · GitHub — reactive:ai-security-nexus
[51] npm Author Qix Compromised via Phishing Email in Major Suppl... — reactive:ai-security-nexus
[52] Fed up with vibe coders, dev sneaks data-nuking prompt injection into their code — Ars Technica AI (2026-05-28)