AI as Attack Tool and Attack Target: May 2026 Cybersecurity Moment · history

Version 14

2026-06-10 08:12 UTC · 585 items

What

The Mini Shai-Hulud supply chain campaign (TeamPCP) confirmed scope across 1,000+ SaaS environments, GitHub's internal repositories, and 30 EU institutions [1][4][2]; a parallel incident found 73 Microsoft-signed packages with AI-agent-triggered credential stealers removed by GitHub as a terms-of-service violation, with slow disclosure of possible system compromise [7]. SafeBreach has now documented three distinct successful bypasses of Google Gemini's prompt injection defenses — via voice/audio injection [8], via Google Calendar invites [9], and via WhatsApp 'Fake Context Alignment' across six messaging platforms [10]. Anthropic's Frontier Red Team reports medium-to-high risk attackers grew 1.7-fold among 832 banned accounts, while MITRE ATT&CK lacks identifiers for agentic orchestration; the adjacent MITRE ATLAS framework documents 16 AI-specific tactics and 84 techniques [17][18]. Autonomous AI cyber capability is doubling approximately every 4.7 months [22], and governance frameworks have not kept pace.

Why it matters

AI is simultaneously widening the attacker pool — lowering the skill floor for post-compromise work [17] — and expanding the attack surface through supply chain packages that activate in AI coding agents [7] and AI assistants that execute attacker instructions embedded in ordinary notifications [10]. Three repeated bypasses of the same product's patched defenses by the same research team [8][9][10] indicate the structural problem remains unaddressed, not just the specific vectors.

Open questions

SafeBreach bypassed Gemini's defenses at least three times using three different techniques [8][9][10] — has Google issued an architectural response, or only patched each vector individually?
MITRE ATLAS documents AI-specific threats with 16 tactics and 84 techniques [18], but whether it covers autonomous agentic orchestration — the mode Anthropic FRT specifically flags as absent from ATT&CK [17] — is unresolved.
Will the CAISI voluntary framework extend to govern Mythos-class deployment outside Anthropic's Glasswing program, given that ~200 partners already operate under it without CAISI review? [24][25]
The axios CISA advisory is dated April 2026 [26] yet SANS ISC tracks axios attribution within the TeamPCP campaign [27] — whether these represent the same incident or two related compromises remains unresolved.

Narrative

The Mini Shai-Hulud campaign, launched by threat actor TeamPCP on May 11, 2026, confirmed impact across more than 1,000 SaaS environments [1], approximately 3,800–4,000 GitHub internal repositories stolen via a poisoned VS Code extension [2][3], and 30 EU institutions breached via the Trivy container scanner [4]. Concurrent campaigns hit additional trust surfaces: an AntV ecosystem attack faked Sigstore provenance badges across 600+ packages [5], TrapDoor compromised 34+ packages across npm, PyPI, and crates.io [6], and 73 Microsoft-signed packages were found to contain credential-stealing code that activated specifically when developers opened them in AI coding agents [7]. GitHub removed the packages citing a terms-of-service violation rather than alerting developers that their systems may have been compromised; Microsoft did not publicly raise the possibility of malicious content until days after the removal [7].

AI-connected products are attack surfaces at multiple layers. SafeBreach documented at least three successful bypasses of Google Gemini's prompt injection defenses: via voice and audio prompt injection [8], via Google Calendar invites [9], and via crafted WhatsApp messages using 'Fake Context Alignment,' a technique that embeds malicious instructions as apparent conversation context and covers six messaging platforms including Slack, Signal, and Instagram [10]. Google DeepMind separately documented malicious websites that detect when an AI agent is browsing and serve it hidden attack instructions across six distinct attack types [11]. Microsoft patched four Copilot CVEs following documented data exfiltration via prompt injection [12][13]. Meta's AI support chatbot was exploited to take over high-profile Instagram accounts by asking it to substitute a new email during a password reset; Meta had deployed the bot with direct account-modification authority and no identity verification before issuing an emergency patch [14][15]. Simon Willison's 'Lethal Trifecta' framework — private data access, exposure to untrusted content, and an exfiltration channel — describes the structural condition enabling LLM data theft; he argues OpenAI's Lockdown Mode addresses only the exfiltration leg, and its existence implies ChatGPT's default configuration does not robustly block determined exfiltration [16].

AI is advancing attack capability on a documented empirical basis. Anthropic's Frontier Red Team analyzed 832 banned malicious accounts over one year and found 67.3% used AI for writing malware, while medium-to-high risk actors grew from 33% to 56% of the total — a roughly 1.7-fold increase — with traditional risk signals such as technique count no longer reliably separating sophisticated from novice attackers [17]. The report identifies a specific gap: MITRE ATT&CK has no identifiers for agentic orchestration, where an AI model chains attack stages and executes with minimal human input [17]; the related MITRE ATLAS framework documents 16 tactics and 84 techniques specifically for AI-system threats [18], though whether it addresses autonomous multi-stage agentic attacks is unresolved. Google's Threat Intelligence Group confirmed the first criminal AI-generated zero-day targeting a hardcoded 2FA trust assumption [19][20], and AISI measured autonomous AI cyber capability as doubling approximately every 4.7 months, with Claude Mythos as the first AI to complete both UK offensive cyber ranges autonomously [21][22].

Governance is running behind capability. NIST's CAISI framework, formalized as the US pre-deployment compliance gate with agreements covering Google, Microsoft, and xAI [23], evaluates submitted models but does not cover deployment programs. Anthropic's Project Glasswing — now at approximately 200 partners in 15+ countries across power, water, healthcare, communications, and hardware — reports 10,000+ high- or critical-severity flaws found using Mythos Preview [24], and Anthropic argues controlled deployment under structured conditions is preferable to waiting for competitors to deploy Mythos-class capability without safeguards [24]. AuthMind and the Turing Institute's CETAS counter that Glasswing's expansion without CAISI review is precisely the unaudited frontier deployment their governance critique describes [25].

Timeline

2026-04-20: CISA issues formal advisory on the axios npm supply chain compromise [26]
2026-05-05: NIST's CAISI formalized as US pre-deployment AI compliance gate with agreements covering Google, Microsoft, and xAI [23]
2026-05-11: TeamPCP launches Mini Shai-Hulud; 160+ npm and PyPI packages compromised; two OpenAI employee devices breached with code-signing certificates exfiltrated [35][45][46]
2026-05-11: Google GTIG intercepts the first confirmed criminal AI-generated zero-day targeting a 2FA hardcoded trust assumption [19][20][34]
2026-05-12: Microsoft announces MDASH multi-agent security system, which discovered 16 Windows vulnerabilities including 4 critical RCE flaws [47][48]
2026-05-13: AISI evaluates Claude Mythos Preview as first AI to autonomously complete both UK offensive cyber ranges; OpenAI mandates certificate rotation by June 12 [35][39][21]
2026-05-18: TeamPCP advertises Mistral AI source code — 450 repositories — for sale at $25,000; Mistral confirms impact [49][50][51]
2026-05-20: GitHub confirms theft of approximately 3,800–4,000 internal repositories via a poisoned Nx Console VS Code extension [2][3][33]
2026-05-23: AntV ecosystem attack confirmed with 600+ packages faking Sigstore provenance badges; CSA names campaign 'Shai-Hulud/Megalodon' [5][43]
2026-05-24: CERT-EU confirms European Commission breach across 30 EU institutions via Trivy; Mandiant quantifies 1,000+ SaaS compromises; TrapDoor supply chain attack hits 34+ packages [4][1][6]
2026-05-25: Socket.dev identifies phishing of npm author 'Qix' as initial access vector; SANS ISC Update 005 reports first confirmed victim disclosures [52][27]
2026-05-26: Starlette/ASGI critical vulnerability disclosed affecting 325M weekly downloads including MCP servers; Microsoft patches four Copilot CVEs [53][12][13]
2026-05-28: Developer embeds a data-nuking prompt injection payload in jqwik 1.10.0 targeting AI coding agents that process it without human review [54]
2026-05-29: Meta issues emergency patch after its AI support chatbot is exploited to take over high-profile Instagram accounts including the Obama White House account [14][15]
2026-06-02: Anthropic expands Project Glasswing to ~200 partners in 15+ countries; reports 10,000+ critical flaws found; releases Claude Security on Opus 4.8 [24]
2026-06-03: Anthropic FRT publishes empirical data on 832 banned attackers: medium-to-high risk actors up 1.7-fold; MITRE ATT&CK gap for agentic orchestration identified [17]
2026-06-04: SafeBreach Labs bypasses Google Gemini defenses via WhatsApp 'Fake Context Alignment' (third documented bypass); Google DeepMind documents agent-detecting malicious websites across six attack types [10][11][8][9]
2026-06-05: Simon Willison introduces 'Lethal Trifecta' framework and analyzes OpenAI Lockdown Mode as addressing only the exfiltration leg of the structural condition enabling LLM data theft [16]
2026-06-08: 73 Microsoft-signed packages with AI-agent-triggered credential stealers blocked; GitHub describes removal as terms-of-service violation; Microsoft delays acknowledging malicious content [7]
2026-06-09: Community analysis of Anthropic FRT report confirms AI is pushing attacks into later kill-chain stages previously requiring skilled operators [55][56]

Perspectives

SafeBreach Labs

Has documented at least three successful bypasses of Google Gemini's prompt injection defenses using three distinct techniques: voice/audio injection [8], Google Calendar invites [9], and WhatsApp 'Fake Context Alignment' across six messaging platforms [10].

Evolution: Three bypasses now documented rather than two; the pattern of repeated evasion of patched defenses is more established this pass.

[8][9][10]

Anthropic

Expanding Project Glasswing to ~200 partners on a proactive-defense rationale — controlled deployment now is preferable because Mythos-class capability will be widely available within 6–12 months — while FRT empirical data documents AI democratizing sophisticated post-compromise attack techniques [24][17].

Evolution: FRT report adds empirical grounding to the Glasswing defensive rationale while independently documenting the offensive democratization dynamic the project is meant to counter.

[24][17][28][29][30]

GitHub / Microsoft

GitHub characterized the theft of 3,800–4,000 internal repositories as limited to internal code unaffecting customer data [2], and described removal of 73 malicious packages as a terms-of-service violation rather than warning developers their systems may be compromised [7].

Evolution: Both incidents follow the same pattern of minimizing disclosure framing; the packages incident reinforces rather than revises this posture.

[2][31][32][33][7]

Google (GTIG / DeepMind)

GTIG confirmed the first criminal AI-generated zero-day targeting a 2FA trust assumption [19][20]; DeepMind documented malicious websites that detect AI agents and serve them hidden attack instructions across six attack types [11].

Evolution: DeepMind's agent-targeting paper extends the single GTIG zero-day finding to a documented class of autonomous-agent vulnerabilities.

[19][20][34][11]

OpenAI

Framed Mini Shai-Hulud as an industry-wide incident with limited scope; introduced Lockdown Mode as an architectural defense removing the exfiltration channel via deterministic mechanisms not subject to AI subversion [16].

Evolution: Lockdown Mode is a concrete product response to documented exfiltration vulnerabilities, consistent with a defended-dual-use posture.

[35][36][16]

Simon Willison

Introduced the 'Lethal Trifecta' (private data access + untrusted content + exfiltration channel) as the structural condition enabling LLM data theft; argues Lockdown Mode addresses only the third leg, and its existence implies ChatGPT's default posture does not robustly block exfiltration [16][15].

Evolution: Lockdown Mode analysis extends his design-failure thesis into a general architectural principle, moving from specific incident critique to a structural framework.

[37][38][15][16]

AISI (UK AI Safety Institute)

Claude Mythos is the first AI to autonomously complete both UK offensive cyber ranges; autonomous AI cyber capability is doubling approximately every 4.7 months, warranting urgent governance attention [21][22].

Evolution: Consistent; Anthropic's 6–12 month competitive projection and FRT risk-actor growth data independently corroborate AISI's urgency framing.

[21][39][22][40]

AuthMind + Turing Institute CETAS

The CAISI voluntary framework evaluates only submitted models; Anthropic's Glasswing expansion to 200 partners without CAISI review is precisely the unaudited frontier deployment their governance critique describes [25].

Evolution: Consistent; Anthropic's FRT publication deepens the empirical case for urgency without addressing the framework governance critique.

[25][41][42][22]

Tensions

GitHub removed 73 malicious Microsoft-signed packages and described the action as a terms-of-service violation rather than warning developers their systems may be compromised; Ars Technica argues this framing misled users, and it matches GitHub's earlier characterization of the internal repo theft as limited impact [7][2]. [7][2][32]
Anthropic argues deploying Glasswing to 200 partners under controlled conditions is preferable to waiting for competitors to deploy Mythos-class capability without safeguards; AuthMind and CETAS argue this expansion without CAISI review is precisely the unaudited frontier deployment their governance critique describes [24][25]. [24][25][42][22]
The axios CISA advisory is dated April 2026 [26], predating TeamPCP's May 11 launch, yet SANS ISC's fifth TeamPCP update tracks axios attribution within that campaign [27]; whether these represent the same incident or two related ones is unresolved. [26][27]
Willison concludes that OpenAI Lockdown Mode's existence implies ChatGPT in default settings does not robustly block determined exfiltration; OpenAI has not directly contested this characterization [16]. [16]
Willison argues the Meta AI Instagram exploit 'hardly even qualifies as prompt injection' — the failure was Meta granting its support bot direct account-modification authority without identity verification — while Ars Technica frames it as a prompt injection attack demonstrating AI's susceptibility to manipulation [14][15]. [14][15]
The AntV attack's fake Sigstore provenance badges mean npm's primary recommended trust signal cannot detect this campaign; neither Sigstore nor the npm registry has issued a public response [5][43]. [5][43][44]

Sources

[1] TeamPCP Supply Chain Campaign: Update 006 - CERT-EU Confirms European Commission Cloud Breach, Sportradar Details Emerge, and Mandiant Quantifies Campaign at 1,000+ SaaS Environments — reactive:ai-security-nexus
[2] Nx Console 18.95.0 Incident: How TeamPCP Breached GitHub — reactive:ai-security-nexus
[3] GitHub just confirmed that attackers stole about 3,800 internal repositories after a poisoned VS Code extension compromi… — Rohan Paul Twitter (2026-05-20)
[4] European Commission cloud breach: a supply-chain compromise — reactive:ai-security-nexus
[5] Mini Shai-Hulud Returns: 600+Malicious npm Packages Fake Sigstore Badges in AntV Ecosystem Attack — reactive:ai-security-nexus
[6] TrapDoor Crypto Stealer Supply Chain Attack Hits 34 Packages... — reactive:ai-offensive-cyber
[7] For the 2nd time in weeks, Microsoft packages laced with credential stealer — Ars Technica AI (2026-06-08)
[8] Exploiting Gemini via Prompt Injection | SafeBreach Original Research — reactive:ai-security-nexus
[9] Invitation Is All You Need: Hacking Gemini | SafeBreach — reactive:ai-security-nexus
[10] 😺 Google Gemini got hijacked via WhatsApp — The Neuron (2026-06-04)
[11] This Google DeepMind’s paper is a serious warning for anyone using autonomous agents today. — Rohan Paul Twitter (2026-06-04)
[12] Microsoft 365 Copilot Information Disclosure CVEs (CVE-2026-26129, CVE-2026-26164, CVE-2026-33111) | PointGuard AI — reactive:ai-security-nexus
[13] CVE-2026-26137: Microsoft 365 Copilot SSRF Vulnerability — reactive:ai-security-nexus
[14] Hackers duped Meta AI support chatbot to steal celebrity Instagram accounts — Ars Technica AI (2026-06-01)
[15] Hackers Simply Asked Meta AI to Give Them Access to High-Profile Instagram Accounts. It Worked — Simon Willison (2026-06-01)
[16] OpenAI Help: Lockdown Mode — Simon Willison (2026-06-05)
[17] What we learned mapping a year’s worth of AI-enabled cyber threats — Anthropic News (2026-06-03)
[18] MITRE ATLAS: AI security framework with 16 tactics and 84 techniques — reactive:ai-security-nexus
[19] Google Researchers Detect First AI-Built Zero-Day Exploit in Cyberattack - Bloomberg — reactive:ai-offensive-cyber
[20] Google Detects First AI-Generated Zero-Day Exploit - SecurityWeek — reactive:ai-offensive-cyber
[21] Our evaluation of Claude Mythos Preview's cyber capabilities — reactive:frontier-ai-cyber-capabilities
[22] AISI: autonomous AI cyber capability now doubling every 4.7 months — reactive:ai-offensive-cyber
[23] US government expands vetting of frontier AI models for security risks — reactive:ai-security-nexus
[24] Expanding Project Glasswing — Anthropic News (2026-06-02)
[25] When a Lab Withholds Its Best Model: What the Claude Mythos System Card Signals for Cybersecurity — reactive:ai-security-nexus
[26] Supply Chain Compromise Impacts Axios Node Package Manager | CISA — reactive:openai-advanced-account-security
[27] TeamPCP Supply Chain Campaign: Update 005 - First Confirmed Victim Disclosure, Post-Compromise Cloud Enumeration Documented, and Axios Attribution Narrows — reactive:ai-security-nexus
[28] Project Glasswing: Securing critical software for the AI era - Anthropic — reactive:frontier-ai-cyber-capabilities
[29] Cloudflare says Anthropic's Mythos Preview finds exploit chains that earlier frontier models missed — reactive:ai-offensive-cyber
[30] Project Glasswing: what Mythos showed us - The Cloudflare Blog — reactive:ai-offensive-cyber
[31] GitHub Breach via Malicious VS Code Extension: What You Need to ... — reactive:ai-security-nexus
[32] Nx Console VS Code Extension Compromised - StepSecurity — reactive:ai-security-nexus
[33] GitHub Says 3,800 Repositories Breached—TeamPCP Hackers ... — reactive:ai-security-nexus
[34] Google spotted an AI-developed zero-day before attackers could use it | CyberScoop — reactive:ai-offensive-cyber
[35] Our response to the TanStack npm supply chain attack — OpenAI Blog (2026-05-13)
[36] Scaling Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber — OpenAI Blog (2026-05-07)
[37] Microsoft Copilot Cowork Exfiltrates Files — Simon Willison (2026-05-26)
[38] The pressure — Simon Willison (2026-05-26)
[39] How fast is autonomous AI cyber capability advancing? — reactive:ai-offensive-cyber (2026-05-13)
[40] Autonomous AI Cyber Capability Doubles Every Few Months — reactive:ai-offensive-cyber
[41] Claude Mythos: What Does Anthropic's New Model Mean for the ... — reactive:ai-security-nexus
[42] Kicking the Tires: A Voluntary Path to Pre-Deployment AI Vetting | The Foundation for American Innovation — reactive:ai-security-nexus
[43] Shai-Hulud/Megalodon: A Two-Wave AI Developer Supply Chain ... — reactive:ai-offensive-cybersecurity
[44] Mini Shai-Hulud npm Attack: AntV Ecosystem Compromise (May 2026) | Chainguard — reactive:ai-security-nexus
[45] Mini Shai-Hulud: TeamPCP compromette 160+ pacchetti npm e PyPI in un supply chain attack che ha colpito TanStack, Mistra... — reactive:ai-security-nexus (2026-05-19)
[46] A Self-Spreading Supply Chain Attack Compromises TanStack npm ... — reactive:ai-security-nexus
[47] Defense at AI speed: Microsoft's new multi-model agentic security ... — reactive:ai-offensive-cyber
[48] Microsoft's MDASH AI System Finds 16 Windows Flaws Fixed in Patch Tuesday — reactive:ai-offensive-cyber
[49] TeamPCP vende repo Mistral AI dopo attacco TanStack su OpenAI — reactive:ai-security-nexus (2026-05-18)
[50] Hackers threaten to leak Mistral files online — AI giant confirms breach, but not what data is involved | TechRadar — reactive:ai-offensive-cyber
[51] TeamPCP Claims Sale of Mistral AI Repositories Amid Mini Shai ... — reactive:ai-security-nexus
[52] npm Author Qix Compromised via Phishing Email in Major Suppl... — reactive:ai-security-nexus
[53] Millions of AI agents imperiled by critical vulnerability in open source package — Ars Technica AI (2026-05-26)
[54] Fed up with vibe coders, dev sneaks data-nuking prompt injection into their code — Ars Technica AI (2026-05-28)
[55] Highlights from Anthropic Red Team Report on “how threat actors are weaponizing AI to conduct cyber operations, mapped o... — reactive:ai-security-nexus (2026-06-09)
[56] Anthropic mapped 832 banned accounts (Mar 2025-Mar 2026) to MITRE ATT&CK. Key signal: AI is pushing attacks into lat... — reactive:ai-security-nexus (2026-06-03)