Frontier AI Offensive Cybersecurity Benchmarks: GPT-5.5 vs. Claude Mythos

Version 3

2026-05-01 20:19 UTC · 110 items

Narrative

The benchmark data anchoring this story has become considerably more specific since the last synthesis pass. AISI's official post confirmed that GPT-5.5 achieved a 71.4% pass rate on its narrow cyber task suite.[1][2] That figure puts a concrete number on what had previously been described only qualitatively as 'comparable to Mythos.' Two new sources, however, go beyond the AISI tie framing. VentureBeat reports that GPT-5.5 'narrowly beats' Anthropic's Claude Mythos Preview on Terminal Bench 2.0,[3] and Reddit's r/singularity notes that GPT-5.5 'slightly outperformed Mythos on a multi-step cyber-attack' scenario.[4] These findings do not necessarily contradict AISI's top-line 'statistical tie,' because Terminal Bench 2.0 and AISI's narrow tasks are distinct evaluation frameworks. Taken together, though, they suggest the tie is a conservative aggregate that may mask GPT-5.5 edges on at least some task types. That nuance is likely to re-energize the ranking debate. The Information has affirmed AISI's 'comparable' framing.[5] Reddit's r/AIGuild cites AISI calling GPT-5.5 'one of the strongest cyber models it has tested,'[6] which may be a slightly stronger characterization than prior summaries. Spanish- and Portuguese-language social media accounts are now spreading the AISI findings independently,[7][8] which indicates the story is reaching audiences well beyond English-language outlets.

Institutional engagement has expanded substantially on two fronts. The Cloud Security Alliance has moved well beyond its research note from the prior cycle. It has published a full PDF guidance document titled 'The AI Vulnerability Storm: Building a Mythos-ready Security Program.'[9][10] At least two versioned drafts circulated in April 2026, including one whose filename carries an April 13 date, which signals active iteration toward actionable enterprise guidance. A LinkedIn post explicitly flags the document as a draft on 'Mythos-Class Capability,'[11] and the CSA's prior work on agentic risk assessment tools provides institutional grounding for the effort.[12]

More significant as a counter-signal is the entry of the Center for Strategic and International Studies (CSIS). Its Strategic Technologies Blog has published 'Beyond Autonomous Attacks: The Reality of AI-Enabled Cyber Threats,'[13] a piece whose title positions it as a corrective to the dominant alarmed framing. CSIS is the first major DC think tank in this thread to advance a skeptical counter-narrative. Its entry lends institutional weight to dissenting views that were previously limited to Tessl's observation that GPT-5.5 is weak at using provided context.

OECD.AI has also formally catalogued the frontier AI cyber capability jump as an incident in its registry (dated April 20, 2026),[14] granting the episode quasi-official status in international AI policy tracking. Dark Reading, a major cybersecurity trade publication, has joined with coverage headlined 'Anthropic's Mythos Has Landed: Here's What Comes Next.'[15] The piece typifies a shift now visible across the specialized security press, from initial alarm toward questions of operational response.

The OpenAI program architecture is now better documented. Sam Altman's original X post announcing the GPT-5.5-Cyber rollout is confirmed in the evidence base,[16] and OpenAI has published a formal pilot request portal.[17] Axios published coverage on April 14, 16 days before the April 30 AISI benchmarking.[18] Its timing suggests OpenAI may have put the Trusted Access framework in place before the public comparison to Mythos emerged. If so, the governance posture was proactive rather than a reaction to the benchmark controversy. Regional coverage frames the program's ambition as deploying OpenAI's 'most powerful model at all levels of government to fight hackers.'[19] That is a notably broader scope than the critical-infrastructure framing in OpenAI's own materials. Ilya Kabanov's LinkedIn post on the launch has drawn 39 comments,[20] a sign of engagement from security professionals. A significant naming discrepancy has also surfaced. SecureWorld refers to a 'GPT-5.4-Cyber' launch alongside the Trusted Access program expansion,[21] while all other coverage refers to GPT-5.5-Cyber. It remains unresolved whether the program had earlier model versions or whether this is a reporting error.

The overall arc has shifted from shock to structured engagement. The dominant question is no longer 'has this threshold been crossed?' — that is now consensus across security firms, standards bodies, government agencies, trade press, and international policy bodies — but rather how defenders, enterprises, and governments should respond. The entry of CSIS as a skeptical counterpoint, the CSA's escalation from commentary to iterative guidance, OECD.AI's formal incident cataloguing, and the 'what comes next' framing in trade press all indicate the story is entering a more mature, response-oriented phase. The unresolved central tension — whether OpenAI's tiered Trusted Access program provides meaningful governance when the general GPT-5.5 (rated Mythos-class by AISI) remains broadly available — now has more institutional actors weighing in, but no resolution.

Timeline

2026-04-01: UK AISI publishes its evaluation of Claude Mythos Preview's cyber capabilities, the reference point for its later GPT-5.5 comparison [23]
2026-04-01: Anthropic publishes Claude Mythos Preview alignment risk report; CrowdStrike named as founding security partner [46][47]
2026-04-13: Cloud Security Alliance circulates early draft of 'The AI Vulnerability Storm: Building a Mythos-ready Security Program' PDF guidance document (version date embedded in filename) [10]
2026-04-14: Axios reports OpenAI is rolling out tiered access to advanced AI cyber models, suggesting the Trusted Access framework was established before the April 30 GPT-5.5 public benchmarking [18]
2026-04-15: IBM announces new autonomous security measures to help enterprises confront agentic AI-driven attacks [65][66]
2026-04-20: OECD.AI formally catalogues the frontier AI cyber capability jump as an incident in its international AI incident registry [14]
2026-04-24: Early social media debate emerges over whether Mythos or GPT-5.5 leads on a cyber benchmark, ahead of AISI's formal GPT-5.5 evaluation. Some posts, including a Chinese-language one, suggest Mythos won [67]
2026-04-30: UK AISI publishes formal evaluation of GPT-5.5 cyber capabilities, finding it comparable to Claude Mythos Preview; AISI's official X post confirms 71.4% pass rate on narrow cyber tasks [22][24][25][2][1][5]
2026-04-30: VentureBeat reports GPT-5.5 'narrowly beats' Anthropic's Claude Mythos Preview on Terminal Bench 2.0, adding granular texture to AISI's top-line 'comparable' framing; Reddit r/singularity similarly notes GPT-5.5 slightly outperformed Mythos on a multi-step cyber-attack scenario [3][4]
2026-04-30: OpenAI officially introduces GPT-5.5 and simultaneously launches 'Trusted Access for Cyber' with a formal pilot request portal; Sam Altman promotes the rollout via X post; SecureWorld refers to the restricted variant as 'GPT-5.4-Cyber' in a naming discrepancy with all other coverage [26][27][28][29][31][16][32][33][17][21][36][34][35]
2026-04-30: XBOW publishes 'GPT-5.5: Mythos-Like Hacking, Open To All,' highlighting public accessibility of GPT-5.5 vs. gated Mythos; framing rapidly adopted by secondary tech media [39][40][41][68]
2026-04-30: Cloud Security Alliance publishes updated version of full PDF guidance document 'The AI Vulnerability Storm: Building a Mythos-ready Security Program' [9][11]
2026-04-30: CSIS publishes 'Beyond Autonomous Attacks: The Reality of AI-Enabled Cyber Threats,' the first skeptical counter-framing from a major DC think tank; Dark Reading's 'Here's What Comes Next' coverage of Mythos appears in specialized security trade press; Hacker News thread on Mythos cybersecurity capabilities opens [13][15][69][70]
2026-04-30: OpenAI announces expansion of Trusted Access for Cyber with additional tiers; Ilya Kabanov's LinkedIn post on the launch draws 39 professional comments [30][20]
2026-05-01: Story spreads to Spanish and Portuguese social media; The Agent Times frames frontier LLMs as enabling both industrialized cyberattacks and advanced defensive operations; BSCN and other accounts amplify the AISI 'GPT-5.5 matches Mythos' finding internationally [7][8][71][53][54][55][52]

Perspectives

UK AI Security Institute (AISI)

Neutral independent evaluator: GPT-5.5 is comparable to Claude Mythos Preview on cybersecurity benchmarks, with a confirmed 71.4% pass rate on narrow cyber tasks; GPT-5.5 described as 'one of the strongest cyber models' AISI has tested; both models represent a new capability tier far ahead of prior-generation models

Evolution: Now quantitatively anchored: the 71.4% pass rate figure is newly public, providing a concrete number to what was previously a qualitative 'comparable' characterization; the 'one of the strongest' framing in community summaries is slightly stronger than AISI's prior language

[22][23][24][25][2][1][6]

OpenAI

Proactively defensive with product differentiation: has formalized a multi-tiered 'Trusted Access for Cyber' program with a dedicated pilot request portal, and a distinct GPT-5.5-Cyber model variant for government and critical infrastructure defenders; Sam Altman personally announced the rollout via X post; program scope described by third-party coverage as extending to 'all levels of government to fight hackers'

Evolution: Sam Altman's original X post is now confirmed in the evidence base; Axios coverage dated April 14 suggests the governance framework preceded the public benchmarking controversy; program scope is framed more expansively by third-party coverage than OpenAI's own materials; SecureWorld's 'GPT-5.4-Cyber' reference raises unresolved program version history questions

[26][27][28][29][30][31][16][32][17][33][19][21][34][35][36][37][38][20][18]

XBOW (security firm)

Alarmed but framing as democratization: GPT-5.5 brings Mythos-class offensive hacking capability to the general public, removing the gating Anthropic uses for Mythos

Evolution: Framing remains dominant in secondary amplification; no new XBOW statements in this cycle, but the thesis continues to propagate through aggregators

[39][40][41]

Cloud Security Alliance

Formally engaged and producing actionable enterprise guidance: has escalated from a research note to a full iterative PDF guidance document — 'The AI Vulnerability Storm: Building a Mythos-ready Security Program' — with multiple versioned drafts in April 2026, aimed at helping enterprises build 'Mythos-ready' security programs

Evolution: Major escalation from last cycle: moved from commentary (research note on the autonomous offensive threshold) to comprehensive multi-versioned enterprise guidance, with active iteration suggesting ongoing development rather than a one-time publication

[42][9][10][11][12][43]

CSIS (Center for Strategic and International Studies)

Skeptical counter-framing: published 'Beyond Autonomous Attacks: The Reality of AI-Enabled Cyber Threats,' positioning the piece as a corrective to the dominant alarmed narratives about AI-autonomous cyberattacks

Evolution: New voice in this thread; the first major DC think-tank to provide institutional weight to skeptical views, challenging the consensus framing that AISI benchmark scores straightforwardly imply operational threat capabilities

[13]

OECD.AI

International policy recognition: has formally catalogued the frontier AI cyber capability development as an AI incident in its registry, granting it official status in international AI policy tracking

Evolution: New voice; represents the first cataloguing of this episode by an international multilateral policy body, raising the regulatory stakes beyond individual national agency advisories

[14]

Anthropic

Cautious-defensive: Mythos remains gated; risk report published; Project Glasswing frames offensive capability as dual-use for defenders; CrowdStrike partnership signals enterprise security positioning

Evolution: Consistent with prior Mythos approach — no new announcements in this cycle

[44][45][46][47]

National cybersecurity agencies (UK NCSC, ASD, CSE Canada, CSA Singapore)

Defensive warning posture: multiple agencies issuing advisories and guidance on frontier AI cyber threats, urging defenders to prepare

Evolution: Coordinated response continuing; no new agency voices added but existing advisories remain the policy baseline

[48][49][50][51]

VentureBeat and specialized security trade press

More granular than the AISI top-line: VentureBeat specifically reports GPT-5.5 'narrowly beats' Mythos on Terminal Bench 2.0; Dark Reading shifts to 'what comes next' posture for security practitioners; The Information affirms AISI's comparable framing

Evolution: VentureBeat is the first mainstream outlet to explicitly claim a GPT-5.5 performance edge, through its 'narrowly beats' framing; Dark Reading's entry signals that the story has matured into a response-and-implications phase in the specialized security press

[3][5][15][52]

Social media commentators (multilingual)

Amplification has gone international: Spanish and Portuguese accounts are independently reproducing the AISI comparable finding; English-language accounts continue debating tie vs. slight GPT-5.5 edge; Reddit r/singularity cites slight GPT-5.5 outperformance

Evolution: Story has spread to Spanish- and Portuguese-language audiences, indicating broader global reach; ranking debate persists despite the AISI 'tie' framing

[4][7][8][53][54][55][1][56][57]

Tensions

AISI 'statistical tie' top-line vs. sub-benchmark GPT-5.5 edges: AISI calls the models comparable on narrow cyber tasks (71.4% pass rate), but VentureBeat reports a narrow GPT-5.5 win on Terminal Bench 2.0 and Reddit users note a slight outperformance on multi-step attack scenarios — these are distinct benchmarks, but together they suggest the tie framing may be a conservative aggregate masking task-specific GPT-5.5 advantages [3][4][2][1][24][5][6]
GPT-5.5 vs. GPT-5.5-Cyber product distinction complicates the democratization debate: OpenAI's tiered 'Trusted Access for Cyber' program restricts only the -Cyber variant, while the general GPT-5.5 (which AISI found to be Mythos-class) remains broadly available — making the governance question more complex than a simple 'gated vs. open' binary [27][28][58][30][41][39][17][33]
GPT-5.4-Cyber vs. GPT-5.5-Cyber naming discrepancy: SecureWorld refers to a 'GPT-5.4-Cyber' launch alongside the Trusted Access program expansion, while all other coverage refers to GPT-5.5-Cyber — raising unresolved questions about whether the Trusted Access program has prior model versions, whether the tiered access concept predates GPT-5.5, or whether SecureWorld made a reporting error [21][34][16][35][18]
Whether benchmark performance translates to real-world offensive uplift: CSIS's 'Beyond Autonomous Attacks: The Reality of AI-Enabled Cyber Threats' explicitly frames itself as a corrective to overstated autonomous-attack narratives; Tessl previously flagged GPT-5.5's weakness in using provided context; these dissenting signals have not entered the dominant discourse, which treats AISI benchmark scores as proxies for operational threat capability [13][59][60][61][62]
Regulatory and governance gap: OECD.AI has now formally catalogued this as an international AI incident, national agencies continue issuing advisories, and CSA is producing iterative enterprise guidance — but no coordinated international access-control framework exists; Anthropic's voluntary gating of Mythos contrasts with OpenAI's tiered-but-partially-open release posture, and the appropriate policy response remains unresolved [14][48][49][50][51][63][42][9][27]
Program scope ambiguity: OpenAI's own materials frame GPT-5.5-Cyber as for 'critical infrastructure defenders' and government partners, but third-party coverage describes the ambition as deploying the model 'at all levels of government to fight hackers' — a significantly broader scope with different eligibility and governance implications [19][27][34][18][64]

Sources

[1] GPT-5.5 hit parity with Claude Mythos on offensive cyber evals. UK AI Security Institute confirmed 71.4% pass rate on mu... — reactive:frontier-ai-cyber-capabilities (2026-05-01)
[2] On our narrow cyber tasks, GPT-5.5 achieved a — reactive:frontier-ai-cyber-capabilities
[3] OpenAI's GPT-5.5 is here, and it's no potato - VentureBeat — reactive:frontier-ai-cyber-capabilities
[4] GPT5.5 slightly outperformed Mythos on a multi-step cyber-attack ... — reactive:frontier-ai-cyber-capabilities
[5] UK Group Says OpenAI's GPT-5.5 is Comparable to Anthropic ... — reactive:frontier-ai-cyber-capabilities
[6] UK AISI Says GPT-5.5 Is One of the Strongest Cyber Models It Has ... — reactive:frontier-ai-cyber-capabilities
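[7] GPT-5.5 agora resolve simulações de ataques de rede autonomamente [Portuguese: "GPT-5.5 now autonomously solves network attack simulations"] — reactive:frontier-ai-cyber-capabilities (2026-05-01)
[8] 🔍🚨 Evaluación del UK AI Security Institute revela que GPT-5.5 iguala a Claude Mythos en capacidades cibernéticas. [Spanish: "UK AI Security Institute evaluation reveals that GPT-5.5 matches Claude Mythos in cyber capabilities."] — reactive:frontier-ai-cyber-capabilities (2026-05-01)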
[9] [PDF] The “AI Vulnerability Storm”: Building a “Mythos-ready” Security Program — reactive:frontier-ai-cyber-capabilities
[10] [PDF] The “AI Vulnerability Storm”: Building a “Mythos-ready” Security ... — reactive:frontier-ai-cyber-capabilities
[11] Cloud Security Alliance Draft Paper on Mythos-Class Capability ... — reactive:frontier-ai-cyber-capabilities
[12] Cloud Security Alliance Introduces New Tool for Assessing | CSA — reactive:frontier-ai-cyber-capabilities
[13] Beyond Autonomous Attacks: The Reality of AI-Enabled Cyber Threats | Strategic Technologies Blog | CSIS — reactive:frontier-ai-cyber-capabilities
[14] Frontier AI Models Accelerate Cyberattack Capabilities - OECD.AI — reactive:frontier-ai-cyber-capabilities
[15] Anthropic's Mythos Has Landed: Here's What Comes Next ... — reactive:frontier-ai-cyber-capabilities
[16] we're starting rollout of GPT-5.5-Cyber, a frontier cybersecurity ... — reactive:frontier-ai-cyber-capabilities
[17] Request OpenAI Pilot: Trusted Access For Cyber — reactive:openai-advanced-account-security
[18] OpenAI rolls out tiered access to advanced AI cyber models - Axios — reactive:frontier-ai-cyber-capabilities
[19] OpenAI wants to put its most powerful model at all levels of government to fight hackers | Business | kten.com — reactive:frontier-ai-cyber-capabilities
[20] Introducing Trusted Access for Cyber | Ilya Kabanov | 39 comments — reactive:frontier-ai-cyber-capabilities
[21] OpenAI Launches GPT-5.4-Cyber, Expands Trusted Access Program as AI Defense Race Heats Up — reactive:frontier-ai-cyber-capabilities
[22] Our evaluation of OpenAI's GPT-5.5 cyber capabilities | AISI Work — reactive:frontier-ai-cyber-capabilities
[23] Our evaluation of Claude Mythos Preview's cyber capabilities — reactive:frontier-ai-cyber-capabilities
[24] Our evaluation of OpenAI's GPT-5.5 cyber capabilities — Simon Willison (2026-04-30)
[25] Read our full evaluation: — reactive:frontier-ai-cyber-capabilities
[26] Introducing GPT-5.5 - OpenAI — reactive:frontier-ai-cyber-capabilities
[27] Introducing Trusted Access for Cyber | OpenAI — reactive:frontier-ai-cyber-capabilities
[28] OpenAI Expands Trusted Access Program With GPT-5.5-Cyber - Dataconomy — reactive:frontier-ai-cyber-capabilities
[29] OpenAI’s Sam Altman says GPT-5.5-Cyber to launch for cyber defenders with focus on trusted government access | Today News — reactive:frontier-ai-cyber-capabilities
[30] We're expanding Trusted Access for Cyber with additional tiers for ... — reactive:frontier-ai-cyber-capabilities
[31] Accelerating the cyber defense ecosystem that protects us all - OpenAI — reactive:openai-advanced-account-security
[32] Sam Altman announced GPT-5.5-Cyber on April 30, 2026 — a frontier cybersecurity model deploying to vetted defenders with... — reactive:frontier-ai-cyber-capabilities (2026-04-30)
[33] Trusted access for the next era of cyber defense - OpenAI — reactive:openai-advanced-account-security
[34] OpenAI prepares GPT-5.5-Cyber for trusted security researchers - Techzine Global — reactive:frontier-ai-cyber-capabilities
[35] OpenAI to roll out GPT-5.5-Cyber with restricted access: Sam Altman — reactive:frontier-ai-cyber-capabilities
[36] Sam Altman reveals GPT-5.5-Cyber model launch with new AI defence strategy — reactive:frontier-ai-cyber-capabilities
[37] OpenAI will roll out GPT-5.5-Cyber to critical cyber defenders, CEO ... — reactive:frontier-ai-cyber-capabilities
[38] Jonathan R.'s Post - LinkedIn — reactive:frontier-ai-cyber-capabilities
[39] XBOW - GPT-5.5: Mythos-Like Hacking, Open To All — reactive:frontier-ai-cyber-capabilities
[40] “Mythos-like hacking, open to all”: Industry reacts to OpenAI's GPT 5.5 — reactive:frontier-ai-cyber-capabilities
[41] GPT-5.5 Brings Mythos-Like Hacking to the Masses | Awesome Agents — reactive:frontier-ai-cyber-capabilities
[42] Claude Mythos and the AI Autonomous Offensive Threshold — reactive:frontier-ai-cyber-capabilities
[43] Cloud Security Alliance launches AI risk initiative — reactive:frontier-ai-cyber-capabilities
[44] Assessing Claude Mythos Preview's cybersecurity capabilities — reactive:frontier-ai-cyber-capabilities
[45] Project Glasswing: Securing critical software for the AI era - Anthropic — reactive:frontier-ai-cyber-capabilities
[46] [PDF] Alignment Risk Update: Claude Mythos Preview - Anthropic — reactive:frontier-ai-cyber-capabilities
[47] Anthropic Claude Mythos Preview - CrowdStrike — reactive:frontier-ai-cyber-capabilities
[48] Why cyber defenders need to be ready for frontier AI | National Cyber Security Centre — reactive:frontier-ai-cyber-capabilities
[49] Frontier AI models and their impact on cyber security | Cyber.gov.au — reactive:frontier-ai-cyber-capabilities
[50] Frontier artificial intelligence - Canadian Centre for Cyber Security — reactive:frontier-ai-cyber-capabilities
[51] Advisory on Risks associated with Frontier AI Models | Cyber Security Agency of Singapore — reactive:frontier-ai-cyber-capabilities
[52] AI models are starting to cross a new line in cybersecurity. UK AISI ... — reactive:frontier-ai-cyber-capabilities
[53] UK AISI: GPT-5.5 MATCHES MYTHOS ON CYBER TASKS — reactive:frontier-ai-cyber-capabilities (2026-05-01)
[54] → UK AI Security Institute found GPT-5.5 can autonomously solve complex cyber attack scenarios — reactive:frontier-ai-cyber-capabilities (2026-05-01)
[55] Big change in the high-stakes AI race: GPT-5.5 is now almost even with Claude Mythos Preview in cyber-attack simulations... — reactive:frontier-ai-cyber-capabilities (2026-05-01)
[56] GPT-5.5 just matched Claude Mythos on the same cyber benchmark .... two models, two companies, weeks apart. — reactive:frontier-ai-cyber-capabilities (2026-05-01)
[57] GPT-5.5 is on par with Claude Mythos — reactive:frontier-ai-cyber-capabilities
[58] OpenAI's new security model (GPT-5.5-Cyber) is for 'critical ... - Reddit — reactive:frontier-ai-cyber-capabilities
[59] Anthropic's Mythos Claims Questioned by Cybersecurity Insider — reactive:frontier-ai-cyber-capabilities
[60] What is Mythos and why are experts worried about Anthropic's AI ... — reactive:frontier-ai-cyber-capabilities
[61] This is just one eval, but it's an important one — reactive:frontier-ai-cyber-capabilities
[62] GPT-5.5 is OpenAI's best model. It's also the worst at using ... - Tessl — reactive:frontier-ai-cyber-capabilities
[63] OpenAI's new security model is for 'critical cyber defenders' only — reactive:frontier-ai-cyber-capabilities
[64] Sam Altman teases GPT-5.5 Cyber rollout as OpenAI doubles down ... — reactive:frontier-ai-cyber-capabilities
[65] IBM Announces New Cybersecurity Measures to Help Enterprises ... — reactive:frontier-ai-cyber-capabilities
[66] IBM Introduces Autonomous Security to Counter Frontier AI-Driven Cyber Threats — reactive:frontier-ai-cyber-capabilities
[67] 从这张Benchmark看，不是 GPT-5.5 赢了。 [Chinese: "Judging from this benchmark, it wasn't GPT-5.5 that won."] — reactive:frontier-ai-cyber-capabilities (2026-04-24)
[68] AISI Evaluates GPT-5.5 Cybersecurity Performance Against Advanced Tasks | Let's Data Science — reactive:frontier-ai-cyber-capabilities
[69] Assessing Claude Mythos Preview's cybersecurity capabilities — reactive:frontier-ai-cyber-capabilities
[70] Anthropic's Mythos AI Model Raises Cybersecurity Alarms : r/Agent_AI — reactive:frontier-ai-cyber-capabilities
[71] Frontier agentic LLMs now enable both industrialized cyberattacks and advanced defensive operations, with Anthropic's Pr... — reactive:frontier-ai-cyber-capabilities (2026-05-01)