AI as Attack Tool and Attack Target: May 2026 Cybersecurity Moment

Version 16

2026-06-13 19:03 UTC · 595 items

What

A connected set of campaigns and findings shows that AI is simultaneously being weaponized by attackers, exploited as a new attack surface, and outpacing governance. The Mini Shai-Hulud supply chain campaign (TeamPCP) compromised 1,000+ SaaS environments, GitHub's internal repositories, and 30 EU institutions [1][4][2]. Google sued the Chinese Outsider Enterprise network for using Gemini to build phishing sites at scale: the group offered nearly 300 scam templates to customers without technical skills, generating 9,000 fake websites and 2.5 million fraudulent texts [17]. SafeBreach documented three distinct bypasses of Google Gemini's prompt injection defenses [8][9][10], and Anthropic's Frontier Red Team found that the share of medium-to-high risk actors among 832 banned accounts grew roughly 1.7-fold [18]. The Cloud Security Alliance confirmed neither MITRE ATT&CK nor MITRE ATLAS fully covers autonomous agentic orchestration [24].

Why it matters

AI is expanding both sides of the threat equation. It lowers the technical floor for attackers (phishing-as-a-service customers who lack independent skills [17]; AI-assisted malware writing in 67.3% of banned accounts [18]) while opening new attack surfaces through AI assistants that execute attacker instructions and AI coding agents that activate malicious packages. Both of the primary threat-modeling frameworks defenders rely on have documented blind spots for the fastest-growing attack mode.

Open questions

SafeBreach bypassed Gemini's defenses using three distinct techniques [8][9][10] — has Google issued an architectural response, or only patched each vector individually?
The CSA finds both MITRE ATT&CK and MITRE ATLAS lack coverage for autonomous agentic orchestration [24]. Is MITRE actively working to address this in either framework, and on what timeline?
Will the CAISI voluntary framework extend to govern Mythos-class deployment outside Anthropic's Glasswing program, given that ~200 partners already operate under it without CAISI review? [20][23]
The CISA axios advisory is dated April 2026 [25] yet SANS ISC tracks axios attribution within the TeamPCP campaign [26] — whether these represent the same incident or two related compromises remains unresolved.

Narrative

The Mini Shai-Hulud campaign, launched by threat actor TeamPCP on May 11, 2026, has confirmed impact across more than 1,000 SaaS environments [1], the theft of approximately 3,800–4,000 GitHub internal repositories via a poisoned VS Code extension [2][3], and breaches at 30 EU institutions via the Trivy container scanner [4]. Concurrent campaigns hit additional trust surfaces: an AntV ecosystem attack faked Sigstore provenance badges across 600+ packages [5], TrapDoor compromised 34+ packages across npm, PyPI, and crates.io [6], and 73 Microsoft-signed packages were found to contain credential-stealing code that activated specifically when developers opened them in AI coding agents [7]. GitHub removed the packages citing a terms-of-service violation rather than alerting developers that their systems may have been compromised; Microsoft did not publicly raise the possibility of malicious content until days after the removal [7].

AI-connected products are attack surfaces at multiple layers. SafeBreach documented at least three successful bypasses of Google Gemini's prompt injection defenses: via voice and audio prompt injection [8], via Google Calendar invites [9], and via crafted WhatsApp messages using 'Fake Context Alignment,' a technique that covers six messaging platforms including Slack, Signal, and Instagram [10]. Google DeepMind separately documented malicious websites that detect when an AI agent is browsing and serve it hidden attack instructions across six distinct attack types [11]. Microsoft patched four Copilot CVEs following documented data exfiltration via prompt injection [12][13]. Meta's AI support chatbot was exploited to take over high-profile Instagram accounts: attackers asked it to substitute a new email address during a password reset. Meta had deployed the bot with direct account-modification authority and no identity verification, and only afterwards issued an emergency patch [14][15]. Simon Willison's 'Lethal Trifecta' framework (private data access, exposure to untrusted content, and an exfiltration channel) describes the structural condition enabling LLM data theft; he argues OpenAI's Lockdown Mode addresses only the exfiltration leg, implying ChatGPT's default configuration does not robustly block determined exfiltration [16].
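The 'Lethal Trifecta' condition can be read as a simple conjunction: data theft becomes structurally possible only when all three legs are present at once. The sketch below is illustrative only. The `AgentCapabilities` class, its field names, and the two example deployments are hypothetical and are not taken from Willison's write-up or from any vendor's configuration. The second example assumes a mode that fully removes the outbound channel, which is a simplification rather than a claim about how Lockdown Mode behaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical summary of what an LLM deployment is able to do."""
    reads_private_data: bool       # e.g. mail, files, account records
    sees_untrusted_content: bool   # e.g. web pages, inbound messages
    can_send_data_out: bool        # e.g. outbound requests, rendered links


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """Return True only when all three legs are present together."""
    return (
        caps.reads_private_data
        and caps.sees_untrusted_content
        and caps.can_send_data_out
    )


# Two hypothetical deployments: one with every leg, one with the
# exfiltration leg assumed to be fully removed.
default_assistant = AgentCapabilities(True, True, True)
no_outbound_channel = AgentCapabilities(True, True, False)

print(has_lethal_trifecta(default_assistant))    # True
print(has_lethal_trifecta(no_outbound_channel))  # False
```

Removing any single leg breaks the conjunction, so a mitigation aimed only at the exfiltration leg is sufficient only if it holds reliably.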

AI is also being directly weaponized to run attacks at scale with reduced skill requirements. Google sued the Chinese Outsider Enterprise network, which operated via Telegram offering phishing-as-a-service toolkits that included instructions for using Gemini to build fraudulent websites impersonating Google, YouTube, and government agencies [17]. The group offered nearly 300 scam templates to customers lacking the technical skills to run phishing campaigns independently, enabling more than 2.5 million fraudulent texts to Android users and generating 9,000 fake websites and 1 million tracked URLs [17]. This operational case matches what Anthropic's Frontier Red Team found in aggregate data: among 832 banned malicious accounts, 67.3% used AI for writing malware, while medium-to-high risk actors grew from 33% to 56% — a roughly 1.7-fold increase — with traditional signals such as technique count no longer reliably separating sophisticated from novice attackers [18].

Governance is running behind capability. NIST's CAISI framework, formalized as the US pre-deployment compliance gate with agreements covering Google, Microsoft, and xAI [19], evaluates submitted models but does not cover deployment programs. Anthropic's Project Glasswing, now at approximately 200 partners in 15+ countries across power, water, healthcare, communications, and hardware, reports 10,000+ high- or critical-severity flaws found using Mythos Preview [20], and Anthropic argues controlled deployment under structured conditions is preferable to waiting for competitors to deploy Mythos-class capability without safeguards [20]. AISI measured autonomous AI cyber capability as doubling approximately every 4.7 months, with Claude Mythos Preview as the first AI to complete both UK offensive cyber ranges autonomously [21][22]. AuthMind and the Turing Institute's CETAS counter that Glasswing's expansion without CAISI review is precisely the unaudited frontier deployment their governance critique describes [23].
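To make the 4.7-month doubling figure concrete, the arithmetic below converts it into a growth factor over a chosen horizon. This is a sketch under a strong assumption: that the measured trend continues as steady exponential growth, which the AISI figure by itself does not guarantee. The function name and the 6- and 12-month horizons are illustrative choices, not values from the AISI report.

```python
def capability_multiplier(months: float, doubling_months: float = 4.7) -> float:
    """Growth factor after `months`, assuming steady exponential doubling."""
    return 2 ** (months / doubling_months)


# Illustrative horizons only; the extrapolation is an assumption.
for horizon in (6, 12):
    print(f"{horizon} months: x{capability_multiplier(horizon):.1f}")
# 6 months: x2.4
# 12 months: x5.9
```

Under that assumption, a governance process that takes a year to adapt would be reviewing systems roughly six times more capable, on this measure, than those it started with.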

Timeline

2026-04-20: CISA issues formal advisory on the axios npm supply chain compromise [25]
2026-05-05: NIST's CAISI formalized as US pre-deployment AI compliance gate with agreements covering Google, Microsoft, and xAI [19]
2026-05-11: TeamPCP launches Mini Shai-Hulud; 160+ npm and PyPI packages compromised; two OpenAI employee devices breached with code-signing certificates exfiltrated [45][46][47]
2026-05-11: Google GTIG intercepts the first confirmed criminal AI-generated zero-day, targeting a hardcoded trust assumption in 2FA [34][35][36]
2026-05-12: Microsoft announces MDASH multi-agent security system, which discovered 16 Windows vulnerabilities including 4 critical RCE flaws [48][49]
2026-05-13: AISI evaluates Claude Mythos Preview as first AI to autonomously complete both UK offensive cyber ranges; OpenAI mandates certificate rotation by June 12 [45][39][21]
2026-05-18: TeamPCP advertises Mistral AI source code — 450 repositories — for sale at $25,000; Mistral confirms impact [50][51][52]
2026-05-20: GitHub confirms theft of approximately 3,800–4,000 internal repositories via a poisoned Nx Console VS Code extension [2][3][33]
2026-05-23: AntV ecosystem attack confirmed with 600+ packages faking Sigstore provenance badges; CSA names campaign 'Shai-Hulud/Megalodon' [5][43]
2026-05-24: CERT-EU confirms European Commission breach across 30 EU institutions via Trivy; Mandiant quantifies 1,000+ SaaS compromises; TrapDoor supply chain attack hits 34+ packages [4][1][6]
2026-05-25: Socket.dev identifies phishing of npm author 'Qix' as initial access vector; SANS ISC Update 005 reports first confirmed victim disclosures [53][26]
2026-05-26: Starlette/ASGI critical vulnerability disclosed affecting 325M weekly downloads including MCP servers; Microsoft patches four Copilot CVEs [54][12][13]
2026-05-29: Meta issues emergency patch after its AI support chatbot is exploited to take over high-profile Instagram accounts including the Obama White House account [14][15]
2026-06-02: Anthropic expands Project Glasswing to ~200 partners in 15+ countries; reports 10,000+ high- or critical-severity flaws found; releases Claude Security on Opus 4.8 [20]
2026-06-03: Anthropic FRT publishes empirical data on 832 banned attackers: medium-to-high risk actors up 1.7-fold; MITRE ATT&CK gap for agentic orchestration identified [18]
2026-06-04: SafeBreach bypasses Google Gemini defenses via WhatsApp 'Fake Context Alignment' (third documented bypass); Google DeepMind documents agent-detecting malicious websites across six attack types [10][11][8][9]
2026-06-05: Simon Willison introduces 'Lethal Trifecta' framework and analyzes OpenAI Lockdown Mode as addressing only the exfiltration leg of the structural condition enabling LLM data theft [16]
2026-06-08: 73 Microsoft-signed packages with AI-agent-triggered credential stealers blocked; GitHub attributes the removal to a terms-of-service violation; Microsoft delays acknowledging malicious content [7]
2026-06-11: CSA gap analysis confirms neither MITRE ATT&CK nor MITRE ATLAS fully covers autonomous agentic orchestration [24]
2026-06-12: Google sues Chinese Outsider Enterprise network for using Gemini to run phishing-as-a-service at scale: ~300 scam templates, 2.5M fraudulent texts, 9,000 fake websites, 1M tracked URLs [17]

Perspectives

SafeBreach Labs

Has documented at least three successful bypasses of Google Gemini's prompt injection defenses, each using a distinct technique: voice/audio injection [8], Google Calendar invites [9], and WhatsApp 'Fake Context Alignment' across six messaging platforms [10].

Evolution: Three bypasses documented; the pattern of repeated evasion of patched defenses is consistent across all three findings.

[8][9][10][27]

Anthropic

Expanding Project Glasswing to ~200 partners on a proactive-defense rationale — controlled deployment now is preferable because Mythos-class capability will be widely available within 6–12 months — while FRT empirical data documents AI democratizing sophisticated post-compromise attack techniques [20][18].

Evolution: FRT report adds empirical grounding to the Glasswing defensive rationale while independently documenting the offensive democratization dynamic the project is meant to counter.

[20][18][28][29][30]

GitHub / Microsoft

GitHub characterized the theft of 3,800–4,000 internal repositories as limited to internal code, with no customer data affected [2], and attributed the removal of 73 malicious packages to a terms-of-service violation rather than warning developers their systems may be compromised [7].

Evolution: Both incidents follow the same pattern of disclosure framing that minimizes impact; the packages incident reinforces rather than revises this posture.

[2][31][32][33][7]

Google (GTIG / DeepMind / Legal)

GTIG confirmed the first criminal AI-generated zero-day [34][35]; DeepMind documented malicious websites targeting AI agents across six attack types [11]; Google Legal sued Outsider Enterprise for using Gemini to deliver phishing-as-a-service to non-technical customers at scale [17].

Evolution: The Outsider Enterprise lawsuit adds a legal enforcement dimension to Google's previously research-only posture, while simultaneously illustrating that Google's own model was weaponized against its users.

[34][35][36][11][17]

Cloud Security Alliance

Published a formal gap analysis concluding that both MITRE ATT&CK and MITRE ATLAS lack adequate coverage for autonomous agentic orchestration, confirming the agentic blind spot is not remedied by using ATLAS alongside ATT&CK [24].

Evolution: Consistent; their finding extends the ATT&CK agentic gap Anthropic FRT identified to cover ATLAS as well.

[24]

Simon Willison

Introduced the 'Lethal Trifecta' (private data access + untrusted content + exfiltration channel) as the structural condition enabling LLM data theft; argues Lockdown Mode addresses only the third leg, implying ChatGPT's default posture does not robustly block exfiltration [16][15].

Evolution: Lockdown Mode analysis extends his design-failure thesis into a general architectural principle, moving from specific incident critique to a structural framework.

[37][38][15][16]

AISI (UK AI Safety Institute)

Claude Mythos Preview is the first AI to autonomously complete both UK offensive cyber ranges; autonomous AI cyber capability is doubling approximately every 4.7 months, warranting urgent governance attention [21][22].

Evolution: Consistent; Anthropic's 6–12 month competitive projection and FRT risk-actor growth data independently corroborate AISI's urgency framing.

[21][39][22][40]

AuthMind + Turing Institute CETAS

The CAISI voluntary framework evaluates only submitted models; Anthropic's Glasswing expansion to 200 partners without CAISI review is precisely the unaudited frontier deployment their governance critique describes [23].

Evolution: Consistent; Anthropic's FRT publication deepens the empirical case for urgency without addressing the framework governance critique.

[23][41][42][22]

Tensions

GitHub removed 73 malicious Microsoft-signed packages and attributed the removal to a terms-of-service violation rather than warning developers their systems may be compromised; Ars Technica argues this framing misled users, matching GitHub's earlier characterization of the internal repo theft as having limited impact [7][2]. [7][2][32]
Anthropic argues deploying Glasswing to 200 partners under controlled conditions is preferable to waiting for competitors to deploy Mythos-class capability without safeguards; AuthMind and CETAS argue this expansion without CAISI review is precisely the unaudited frontier deployment their governance critique describes [20][23]. [20][23][42][22]
Anthropic FRT identified MITRE ATT&CK's lack of agentic orchestration identifiers as a gap [18]; the CSA gap analysis confirms MITRE ATLAS also fails to cover this attack mode [24], but MITRE has not publicly responded to or acknowledged the gap. [18][24]
Willison concludes that OpenAI Lockdown Mode's existence implies ChatGPT in default settings does not robustly block determined exfiltration; OpenAI has not directly contested this characterization [16]. [16]
Willison argues the Meta AI Instagram exploit 'hardly even qualifies as prompt injection' — the failure was Meta granting its support bot direct account-modification authority without identity verification — while Ars Technica frames it as a prompt injection attack demonstrating AI's susceptibility to manipulation [14][15]. [14][15]
The AntV attack's fake Sigstore provenance badges mean npm's primary recommended trust signal cannot detect this campaign; neither Sigstore nor the npm registry has issued a public response [5][43]. [5][43][44]

Sources

[1] TeamPCP Supply Chain Campaign: Update 006 - CERT-EU Confirms European Commission Cloud Breach, Sportradar Details Emerge, and Mandiant Quantifies Campaign at 1,000+ SaaS Environments — reactive:ai-security-nexus
[2] Nx Console 18.95.0 Incident: How TeamPCP Breached GitHub — reactive:ai-security-nexus
[3] GitHub just confirmed that attackers stole about 3,800 internal repositories after a poisoned VS Code extension compromi… — Rohan Paul Twitter (2026-05-20)
[4] European Commission cloud breach: a supply-chain compromise — reactive:ai-security-nexus
[5] Mini Shai-Hulud Returns: 600+ Malicious npm Packages Fake Sigstore Badges in AntV Ecosystem Attack — reactive:ai-security-nexus
[6] TrapDoor Crypto Stealer Supply Chain Attack Hits 34 Packages... — reactive:ai-offensive-cyber
[7] For the 2nd time in weeks, Microsoft packages laced with credential stealer — Ars Technica AI (2026-06-08)
[8] Exploiting Gemini via Prompt Injection | SafeBreach Original Research — reactive:ai-security-nexus
[9] Invitation Is All You Need: Hacking Gemini | SafeBreach — reactive:ai-security-nexus
[10] 😺 Google Gemini got hijacked via WhatsApp — The Neuron (2026-06-04)
[11] This Google DeepMind’s paper is a serious warning for anyone using autonomous agents today. — Rohan Paul Twitter (2026-06-04)
[12] Microsoft 365 Copilot Information Disclosure CVEs (CVE-2026-26129, CVE-2026-26164, CVE-2026-33111) | PointGuard AI — reactive:ai-security-nexus
[13] CVE-2026-26137: Microsoft 365 Copilot SSRF Vulnerability — reactive:ai-security-nexus
[14] Hackers duped Meta AI support chatbot to steal celebrity Instagram accounts — Ars Technica AI (2026-06-01)
[15] Hackers Simply Asked Meta AI to Give Them Access to High-Profile Instagram Accounts. It Worked — Simon Willison (2026-06-01)
[16] OpenAI Help: Lockdown Mode — Simon Willison (2026-06-05)
[17] Google sues Chinese cybercrime network that used Gemini to automate scams — Ars Technica AI (2026-06-12)
[18] What we learned mapping a year’s worth of AI-enabled cyber threats — Anthropic News (2026-06-03)
[19] US government expands vetting of frontier AI models for security risks — reactive:ai-security-nexus
[20] Expanding Project Glasswing — Anthropic News (2026-06-02)
[21] Our evaluation of Claude Mythos Preview's cyber capabilities — reactive:frontier-ai-cyber-capabilities
[22] AISI: autonomous AI cyber capability now doubling every 4.7 months — reactive:ai-offensive-cyber
[23] When a Lab Withholds Its Best Model: What the Claude Mythos System Card Signals for Cybersecurity — reactive:ai-security-nexus
[24] MITRE ATT&CK and ATLAS Agentic Gap Analysis - Lab Space — reactive:ai-security-nexus
[25] Supply Chain Compromise Impacts Axios Node Package Manager | CISA — reactive:openai-advanced-account-security
[26] TeamPCP Supply Chain Campaign: Update 005 - First Confirmed Victim Disclosure, Post-Compromise Cloud Enumeration Documented, and Axios Attribution Narrows — reactive:ai-security-nexus
[27] Exploiting Gemini via Prompt Injection | Or Yair - LinkedIn — reactive:ai-security-nexus
[28] Project Glasswing: Securing critical software for the AI era - Anthropic — reactive:frontier-ai-cyber-capabilities
[29] Cloudflare says Anthropic's Mythos Preview finds exploit chains that earlier frontier models missed — reactive:ai-offensive-cyber
[30] Project Glasswing: what Mythos showed us - The Cloudflare Blog — reactive:ai-offensive-cyber
[31] GitHub Breach via Malicious VS Code Extension: What You Need to ... — reactive:ai-security-nexus
[32] Nx Console VS Code Extension Compromised - StepSecurity — reactive:ai-security-nexus
[33] GitHub Says 3,800 Repositories Breached—TeamPCP Hackers ... — reactive:ai-security-nexus
[34] Google Researchers Detect First AI-Built Zero-Day Exploit in Cyberattack - Bloomberg — reactive:ai-offensive-cyber
[35] Google Detects First AI-Generated Zero-Day Exploit - SecurityWeek — reactive:ai-offensive-cyber
[36] Google spotted an AI-developed zero-day before attackers could use it | CyberScoop — reactive:ai-offensive-cyber
[37] Microsoft Copilot Cowork Exfiltrates Files — Simon Willison (2026-05-26)
[38] The pressure — Simon Willison (2026-05-26)
[39] How fast is autonomous AI cyber capability advancing? — reactive:ai-offensive-cyber (2026-05-13)
[40] Autonomous AI Cyber Capability Doubles Every Few Months — reactive:ai-offensive-cyber
[41] Claude Mythos: What Does Anthropic's New Model Mean for the ... — reactive:ai-security-nexus
[42] Kicking the Tires: A Voluntary Path to Pre-Deployment AI Vetting | The Foundation for American Innovation — reactive:ai-security-nexus
[43] Shai-Hulud/Megalodon: A Two-Wave AI Developer Supply Chain ... — reactive:ai-offensive-cybersecurity
[44] Mini Shai-Hulud npm Attack: AntV Ecosystem Compromise (May 2026) | Chainguard — reactive:ai-security-nexus
[45] Our response to the TanStack npm supply chain attack — OpenAI Blog (2026-05-13)
[46] Mini Shai-Hulud: TeamPCP compromette 160+ pacchetti npm e PyPI in un supply chain attack che ha colpito TanStack, Mistra... — reactive:ai-security-nexus (2026-05-19)
[47] A Self-Spreading Supply Chain Attack Compromises TanStack npm ... — reactive:ai-security-nexus
[48] Defense at AI speed: Microsoft's new multi-model agentic security ... — reactive:ai-offensive-cyber
[49] Microsoft's MDASH AI System Finds 16 Windows Flaws Fixed in Patch Tuesday — reactive:ai-offensive-cyber
[50] TeamPCP vende repo Mistral AI dopo attacco TanStack su OpenAI — reactive:ai-security-nexus (2026-05-18)
[51] Hackers threaten to leak Mistral files online — AI giant confirms breach, but not what data is involved | TechRadar — reactive:ai-offensive-cyber
[52] TeamPCP Claims Sale of Mistral AI Repositories Amid Mini Shai ... — reactive:ai-security-nexus
[53] npm Author Qix Compromised via Phishing Email in Major Suppl... — reactive:ai-security-nexus
[54] Millions of AI agents imperiled by critical vulnerability in open source package — Ars Technica AI (2026-05-26)