AI as Attack Tool and Attack Target: May 2026 Cybersecurity Moment

cooling · v24 · 2026-07-02 · 651 items · history

What's new in v24

Brave documented indirect prompt injection in Perplexity Comet [14], giving the AI browser attack class — previously a research demonstration against unnamed systems — a specific named commercial product. Industry security teams at NVIDIA [15], Cisco [16], and HiddenLayer [17] each published analyses characterizing prompt injection bypass as a structural problem, suggesting the critique is moving from research frontier into mainstream enterprise security. No new governance developments, supply chain events, or shifts in the Glasswing/Daybreak picture this pass.

What

AI operates simultaneously as an offensive tool and a production attack surface. Two frontier cyber programs — Anthropic's Glasswing (~200 partners, 15+ countries) and OpenAI's Daybreak (GPT-5.5-Cyber at 85.6% on CyberGym, seven government agreements) — are deployed under self-certified access controls with no common external audit standard [18][20]. Brave has documented indirect prompt injection in Perplexity Comet specifically [14], moving the AI browser attack class from a June 30 research demonstration [13] to a named commercial product. The Five Eyes alliance warned that AI capable of severe attacks on governments could arrive within months [24][23], consistent with autonomous AI cyber capability doubling every 4.7 months [25].

Why it matters

Prompt injection in AI browsers has moved from a research demonstration against unnamed systems to a documented vulnerability in a specific commercial product, extending the structural guardrail critique to shipped software. Two high-capability cyber AI programs operating at scale without common governance, against measured capability growth and official intelligence warnings, leave the question of what happens when those programs cross current safety thresholds entirely unaddressed.

Open questions

Brave documented indirect prompt injection in Perplexity Comet [14] and researchers showed AI browsers can be tricked into accepting a false reality disabling guardrails [13] — does this attack class extend to agentic desktop AI systems broadly, or is the attack surface bounded to browser-specific context handling?
The Five Eyes warned severe AI-enabled attacks could arrive 'within months' [24][23] — do Glasswing or Daybreak deployment terms include circuit-breakers if capability thresholds are crossed during their operating windows [25]?
Willison's experiment showed 0 successes in 6,000 injection attempts via anti-injection training [27], while role confusion research identifies style-based privilege parsing as the structural cause of defense failure [26] — does training-based resistance address the structural issue or only raise the difficulty bar?
Neither Glasswing nor Daybreak is subject to CAISI review [20][18][22] — is any common governance standard being developed, and will the Five Eyes advisory accelerate that work?

Narrative

The Mini Shai-Hulud supply chain campaign, launched by threat actor TeamPCP on May 11, 2026, compromised more than 1,000 SaaS environments [1], stole approximately 3,800–4,000 GitHub internal repositories via a poisoned VS Code extension [2][3], and breached 30 EU institutions via the Trivy container scanner [4]. Concurrent campaigns hit adjacent developer trust surfaces: an AntV ecosystem attack faked Sigstore provenance badges across 600+ packages [5], TrapDoor compromised 34+ packages across npm, PyPI, and crates.io [6], and 73 Microsoft-signed packages contained credential-stealing code that activated when developers opened them in AI coding agents [7]. GitHub described the packages' removal as a terms-of-service violation; Ars Technica argued that framing misled users about potential system compromise.

AI-connected products are under attack at multiple layers. A maximum-critical vulnerability in Microsoft 365 Copilot allowed attackers to steal 2FA codes and sensitive email data by embedding instructions in third-party content [8]. SafeBreach documented three bypasses of Google Gemini's defenses via voice injection, Calendar invites, and WhatsApp Fake Context Alignment [9][10][11]. Google sued the Chinese Outsider Enterprise network for phishing-as-a-service at $88/week — 290+ pre-built templates and instructions for using Gemini to build fake websites — with the FBI linking the network to 3.87 million stolen credit card numbers and an estimated $1.9 billion in losses [12]. Researchers reported June 30 that AI browsers can be tricked into accepting a false reality where safety guardrails no longer apply, enabling extraction of credentials from built-in password managers or code from private repositories; Ars Technica argued that current guardrail approaches treat symptoms rather than the underlying flaw of blurring passive web browsing with active LLM instruction execution [13]. Brave subsequently documented indirect prompt injection in Perplexity Comet specifically [14], moving the AI browser attack class from a research demonstration against unnamed systems to a named commercial product. Industry security teams at NVIDIA, Cisco, and HiddenLayer have each published analyses characterizing prompt injection as a structural problem that guardrails alone cannot solve [15][16][17], suggesting this view is broadening from research community to mainstream enterprise security discourse.

Two competing frontier programs are deployed for defense with contested governance. Anthropic's Project Glasswing operates with approximately 200 partners in 15+ countries across critical infrastructure sectors [18]; OpenAI's Daybreak, launched June 22, includes GPT-5.5-Cyber at 85.6% on CyberGym — the highest single-model score measured — and Codex Security, which has scanned 30 million commits across 30,000+ codebases [19][20]. OpenAI signed Trusted Access for Cyber agreements with seven governments including ENISA and restricts GPT-5.5-Cyber to verified defenders [20]. Neither program is subject to CAISI review, and the Cloud Security Alliance confirmed that both MITRE ATT&CK and MITRE ATLAS lack coverage for autonomous agentic orchestration [21]. Governance critics argue two programs competing on capability while self-certifying access controls creates the gap CAISI was designed to prevent [22].

On June 23, the Five Eyes alliance issued an advisory confirmed via the official NSA press release [23], warning that AI models capable of severe attacks on governments and businesses could arrive within months [24] — consistent with AISI's earlier measurement of autonomous AI cyber capability doubling approximately every 4.7 months [25]. The prompt injection picture is contested at the structural level: Ars Technica and role confusion researchers argue current LLMs have no fundamental fix, with style-based privilege parsing identified as the underlying cause [8][26]. Against this, Simon Willison reported June 26 that 6,000 injection attempts by approximately 2,000 participants failed to extract secrets from a Claude Opus 4.6 instance, crediting investment by frontier labs in anti-injection training; Willison simultaneously cautioned that the result provides no formal guarantee and that deploying systems where injection could cause irreversible damage remains inadvisable [27].

Timeline

2026-05-05: NIST's CAISI formalized as US pre-deployment AI compliance gate with agreements covering Google, Microsoft, and xAI [44]
2026-05-11: TeamPCP launches Mini Shai-Hulud; 160+ npm and PyPI packages compromised; two OpenAI employee devices breached with code-signing certificates exfiltrated [45][46][47]
2026-05-11: Google GTIG intercepts the first confirmed criminal AI-generated zero-day targeting a hardcoded 2FA trust assumption [39][48][49]
2026-05-13: AISI evaluates Claude Mythos Preview as first AI to autonomously complete both UK offensive cyber ranges; autonomous AI cyber capability measured as doubling every 4.7 months [45][50][25]
2026-05-18: TeamPCP advertises Mistral AI source code — 450 repositories — for sale at $25,000; Mistral confirms impact [51][52][53]
2026-05-20: GitHub confirms theft of approximately 3,800–4,000 internal repositories via a poisoned Nx Console VS Code extension [2][3][54]
2026-05-24: CERT-EU confirms European Commission breach across 30 EU institutions via Trivy; Mandiant quantifies 1,000+ SaaS compromises; TrapDoor hits 34+ packages; AntV ecosystem fakes Sigstore badges across 600+ packages [4][1][6][5]
2026-06-02: Anthropic expands Project Glasswing to ~200 partners in 15+ countries; reports 10,000+ critical flaws found; releases Claude Security on Opus 4.8 [18]
2026-06-03: Anthropic FRT publishes empirical data on 832 banned attackers: medium-to-high risk actors up 1.7-fold; AI use in post-compromise phases rose 8.9% [32][33]
2026-06-04: SafeBreach documents third Gemini bypass via WhatsApp 'Fake Context Alignment'; Google DeepMind documents agents detecting malicious websites across six attack types [11][40][9][10]
2026-06-08: 73 Microsoft-signed packages with AI-agent-triggered credential stealers blocked; GitHub describes removal as a terms-of-service violation [7]
2026-06-12: Google sues Chinese Outsider Enterprise for using Gemini to run phishing-as-a-service: $88/week, 290+ templates, $1.9B in estimated losses, 3.87M stolen credit card numbers per FBI [41][55][12]
2026-06-16: Ars Technica reports maximum-critical M365 Copilot vulnerability allowed 2FA code theft via prompt injection; researchers characterize prompt injection as structurally unfixable in current LLMs [8]
2026-06-22: OpenAI launches Daybreak: GPT-5.5-Cyber achieves 85.6% on CyberGym (highest single-model score); Codex Security has scanned 30M+ commits; Trusted Access for Cyber signed with 7 governments including ENISA [19][20][35][36][37]
2026-06-22: Role confusion research: LLMs parse privilege from text style rather than prompt position; destyling drops injection success from 61% to 10% [26]
2026-06-23: Five Eyes alliance issues joint advisory — confirmed via official NSA statement — warning that AI models capable of severe attacks on governments and businesses could arrive within months [24][23][56][57][58]
2026-06-26: Willison reports 6,000 prompt injection attempts across ~2,000 participants failed against a Claude Opus 4.6 instance; credits anti-injection training while cautioning against treating the result as a production security guarantee [27]
2026-06-30: Researchers demonstrate 'false reality' attack on AI browsers enabling credential and code theft by disabling guardrails; Ars Technica argues the architectural flaw cannot be resolved through reactive guardrail additions [13]
2026-07-01: Brave documents indirect prompt injection in Perplexity Comet, moving the AI browser attack class from research demonstration to a named commercial product [14]

Perspectives

Five Eyes Alliance

Issued a joint advisory, confirmed on NSA.gov, that AI models capable of severe attacks on governments and businesses could arrive within months, characterizing AI as making devastating cyberattacks far easier for malicious actors in the near term.

Evolution: Consistent; broad mainstream coverage followed without adding new claims.

[24][23][28][29][30][31]

Anthropic

Expanding Project Glasswing to ~200 partners under a proactive-defense rationale; FRT empirical data documents AI democratizing sophisticated post-compromise techniques.

Evolution: Consistent.

[18][32][33][34]

OpenAI

Launched Daybreak with GPT-5.5-Cyber (85.6% CyberGym, highest single-model score) and Codex Security; argues AI has shifted the bottleneck from discovering to patching vulnerabilities; restricts GPT-5.5-Cyber to verified defenders under Trusted Access for Cyber agreements with seven governments.

Evolution: Consistent since Daybreak's launch.

[19][20][35][36][37]

Simon Willison

His June 26 experiment — 6,000 failed injection attempts across ~2,000 participants — provides empirical evidence that anti-injection training works in controlled settings, while he advises against deploying systems where injection could cause irreversible damage.

Evolution: His experiment partially qualifies his prior 'perpetual whack-a-mole' framing: training-based resistance is measurably effective in controlled settings, but he stops short of endorsing current architectures as structurally fixed.

[38][26][27]

SafeBreach Labs

Has documented three successful bypasses of Google Gemini's prompt injection defenses: voice/audio injection, Google Calendar invites, and WhatsApp 'Fake Context Alignment' across six messaging platforms.

Evolution: Consistent; the pattern of repeated circumvention of individually patched defenses independently supports the view that per-vector guardrails are insufficient.

[9][10][11]

GitHub / Microsoft

Characterized theft of ~3,800 internal repositories as limited impact, described removal of 73 malicious packages as a terms-of-service violation, and patched a maximum-critical Copilot vulnerability enabling 2FA theft.

Evolution: Consistent; per-incident responses without addressing the structural critique.

[2][7][8]

Google (GTIG / DeepMind / Legal)

GTIG confirmed the first criminal AI-generated zero-day; DeepMind documented malicious websites targeting AI agents; Google Legal sued Outsider Enterprise for using Gemini to deliver phishing-as-a-service causing an estimated $1.9 billion in losses per FBI attribution.

Evolution: Consistent.

[39][40][41][12]

AuthMind + Turing Institute CETAS

The CAISI voluntary framework evaluates only submitted models, not deployment programs; both Glasswing and Daybreak have expanded to major government partnerships without CAISI review, exactly the unaudited frontier deployment their governance critique describes.

Evolution: OpenAI's Daybreak launch with seven government partnerships, also outside CAISI review, extends the same governance gap to a second major actor, strengthening their critique.

[22][42][43][20]

Tensions

Ars Technica and role confusion researchers argue current LLMs have no structural fix for prompt injection [8][26]; Brave's documentation of indirect prompt injection in Perplexity Comet [14] and the June 30 'false reality' attack demonstration [13] add a named commercial product and a new attack class to that critique; Willison's experiment showed 0 successes in 6,000 injection attempts via anti-injection training, while he maintains this is not a production safety guarantee [27]. [8][26][13][14][27]
OpenAI's Daybreak and Anthropic's Glasswing both claim structured defender-only access for frontier cyber AI, but neither is subject to CAISI review or a common external standard; governance critics argue two programs competing on capability while self-certifying access controls creates the gap CAISI was designed to prevent [20][18][22]. [20][18][22]
The Five Eyes alliance warns severe AI cyberattack capability could arrive 'within months' [24][23] and AISI measures autonomous capability doubling every 4.7 months [25] — but neither Glasswing nor Daybreak deployment terms publicly address what happens if that capability threshold is crossed during their current operating windows. [24][23][25]
Anthropic argues deploying Glasswing under controlled conditions is preferable to waiting while competitors deploy without safeguards; AuthMind and CETAS argue this expansion without CAISI review is precisely the unaudited frontier deployment their governance critique describes [18][22]. [18][22]
GitHub described removal of 73 malicious Microsoft-signed packages as a terms-of-service violation; Ars Technica argues this framing misled users about potential system compromise, consistent with GitHub's minimized characterization of the internal repository theft [7][2]. [7][2]

Status: active and growing

Sources

[1] TeamPCP Supply Chain Campaign: Update 006 - CERT-EU Confirms European Commission Cloud Breach, Sportradar Details Emerge, and Mandiant Quantifies Campaign at 1,000+ SaaS Environments — reactive:ai-security-nexus
[2] Nx Console 18.95.0 Incident: How TeamPCP Breached GitHub — reactive:ai-security-nexus
[3] GitHub just confirmed that attackers stole about 3,800 internal repositories after a poisoned VS Code extension compromi… — Rohan Paul Twitter (2026-05-20)
[4] European Commission cloud breach: a supply-chain compromise — reactive:ai-security-nexus
[5] Mini Shai-Hulud Returns: 600+Malicious npm Packages Fake Sigstore Badges in AntV Ecosystem Attack — reactive:ai-security-nexus
[6] TrapDoor Crypto Stealer Supply Chain Attack Hits 34 Packages... — reactive:ai-offensive-cyber
[7] For the 2nd time in weeks, Microsoft packages laced with credential stealer — Ars Technica AI (2026-06-08)
[8] Critical Copilot vulnerability allowed hackers to steal 2FA code from users — Ars Technica AI (2026-06-16)
[9] Exploiting Gemini via Prompt Injection | SafeBreach Original Research — reactive:ai-security-nexus
[10] Invitation Is All You Need: Hacking Gemini | SafeBreach — reactive:ai-security-nexus
[11] 😺 Google Gemini got hijacked via WhatsApp — The Neuron (2026-06-04)
[12] 😺 Google sued the people spamming your phone — The Neuron (2026-06-16)
[13] New attack provides one more reason why AI browsers are a bad idea — Ars Technica AI (2026-06-30)
[14] Agentic Browser Security: Indirect Prompt Injection in Perplexity Comet | Brave — reactive:ai-security-nexus
[15] Securing Agentic AI: How Semantic Prompt Injections Bypass AI Guardrails | NVIDIA Technical Blog — reactive:ai-security-nexus
[16] Prompt injection is the new SQL injection, and guardrails ... — reactive:ai-security-nexus
[17] OpenAI Guardrails Bypass: The "Self-Policing" LLM ... — reactive:ai-security-nexus
[18] Expanding Project Glasswing — Anthropic News (2026-06-02)
[19] Patch the Planet: a Daybreak initiative to support open source maintainers — OpenAI Blog (2026-06-22)
[20] Daybreak: Tools for securing every organization in the world — OpenAI Blog (2026-06-22)
[21] MITRE ATT&CK and ATLAS Agentic Gap Analysis - Lab Space — reactive:ai-security-nexus
[22] When a Lab Withholds Its Best Model: What the Claude Mythos System Card Signals for Cybersecurity — reactive:ai-security-nexus
[23] Five Eyes Cyber Security Agencies Statement — reactive:ai-security-nexus
[24] AI models capable of severe attacks on governments and businesses could arrive within months. — Rohan Paul Twitter (2026-06-23)
[25] Our evaluation of Claude Mythos Preview's cyber capabilities — reactive:frontier-ai-cyber-capabilities
[26] Prompt Injection as Role Confusion — Simon Willison (2026-06-22)
[27] What happened after 2,000 people tried to hack my AI assistant — Simon Willison (2026-06-26)
[28] Democracy Now! - The intelligence alliance known as "Five... — reactive:ai-security-nexus
[29] AI on pace to bypass cybersecurity systems in months, not years ... — reactive:ai-security-nexus
[30] AI could breach government and business defenses in months, US ... — reactive:ai-security-nexus
[31] Five Eyes Security Agencies Issue Urgent Warning On AI | 10 News — reactive:ai-security-nexus
[32] What we learned mapping a year’s worth of AI-enabled cyber threats — Anthropic News (2026-06-03)
[33] Gap: Anthropic mapped 832 banned accounts onto MITRE ATT&CK. AI in the back half of attacks jumped 8.9%; phishing dr... — reactive:ai-security-nexus (2026-06-14)
[34] AI models capable of devastating attacks on governments ... — reactive:ai-security-nexus
[35] OpenAI’s new GPT-5.5-Cyber just beat Mythos 5 on CyberGym. — Rohan Paul Twitter (2026-06-22)
[36] OpenAI expands Daybreak program, updates GPT-5.5-Cyber, lands ... — reactive:ai-security-nexus
[37] OpenAI Expands Daybreak With GPT-5.5-Cyber to Help ... — reactive:ai-security-nexus
[38] OpenAI Help: Lockdown Mode — Simon Willison (2026-06-05)
[39] Google Researchers Detect First AI-Built Zero-Day Exploit in Cyberattack - Bloomberg — reactive:ai-offensive-cyber
[40] This Google DeepMind’s paper is a serious warning for anyone using autonomous agents today. — Rohan Paul Twitter (2026-06-04)
[41] Google sues Chinese cybercrime network that used Gemini to automate scams — Ars Technica AI (2026-06-12)
[42] Claude Mythos: What Does Anthropic's New Model Mean for the ... — reactive:ai-security-nexus
[43] AISI: autonomous AI cyber capability now doubling every 4.7 months — reactive:ai-offensive-cyber
[44] US government expands vetting of frontier AI models for security risks — reactive:ai-security-nexus
[45] Our response to the TanStack npm supply chain attack — OpenAI Blog (2026-05-13)
[46] Mini Shai-Hulud: TeamPCP compromette 160+ pacchetti npm e PyPI in un supply chain attack che ha colpito TanStack, Mistra... — reactive:ai-security-nexus (2026-05-19)
[47] A Self-Spreading Supply Chain Attack Compromises TanStack npm ... — reactive:ai-security-nexus
[48] Google Detects First AI-Generated Zero-Day Exploit - SecurityWeek — reactive:ai-offensive-cyber
[49] Google spotted an AI-developed zero-day before attackers could use it | CyberScoop — reactive:ai-offensive-cyber
[50] How fast is autonomous AI cyber capability advancing? — reactive:ai-offensive-cyber (2026-05-13)
[51] TeamPCP vende repo Mistral AI dopo attacco TanStack su OpenAI — reactive:ai-security-nexus (2026-05-18)
[52] Hackers threaten to leak Mistral files online — AI giant confirms breach, but not what data is involved | TechRadar — reactive:ai-offensive-cyber
[53] TeamPCP Claims Sale of Mistral AI Repositories Amid Mini Shai ... — reactive:ai-security-nexus
[54] GitHub Says 3,800 Repositories Breached—TeamPCP Hackers ... — reactive:ai-security-nexus
[55] Google Sues to Stop Chinese Cybercrime Group from Using Its A.I. — reactive:ai-security-nexus
[56] Five Eyes cybersecurity agencies warn of new AI models ... — reactive:ai-security-nexus
[57] #Gravitas | Five Eyes intelligence alliance issued a joint ... — reactive:ai-security-nexus
[58] 'Five Eyes' intelligence alliance warns that new AI models ... — reactive:ai-security-nexus