Frontier AI Safety Evaluation: Scheming Research and Evaluation Standards
Synthesis history
6 versions, newest first.
-
Version 6 2026-06-04 02:13 UTC · 52 items
ARC's white-box estimation challenge [^23366] is the substantive new development: it introduces a technical argument that black-box behavioral sampling is structurally insufficient to detect control-undermining behavior…
-
Version 5 2026-06-01 18:33 UTC · 45 items
The most significant new development is that OpenAI's playbook reportedly claims AI capabilities may not be fully evaluable by third parties [^23226], sharpening the structural conflict-of-interest critiq…
-
Version 4 2026-06-01 08:11 UTC · 34 items
Emergence AI's comparative simulation study adds striking empirical evidence that frontier model alignment quality varies dramatically across labs — not just across conditions — with Claude producing zero crimes and Gro…
-
Version 3 2026-05-31 08:10 UTC · 29 items
GovAI enters as a new academic-policy voice with a framework for rigorous third-party frontier AI auditing, reinforcing AVERI's practitioner position with institutional research weight. A technical cryptographic verific…
-
Version 2 2026-05-30 18:44 UTC · 19 items
Three substantive new voices entered this cycle: Anthropic (via 'Teaching Claude Why' and the Feb 2026 Risk Report), AVERI as an independent auditing organization, and EU regulators via AI Act compliance analysis. The c…
-
Version 1 2026-05-30 02:05 UTC · 2 items
Two frontier AI labs published complementary contributions to AI safety evaluation on the same day, May 29, 2026. OpenAI released a methodological playbook intended to standardize how third parties evaluate frontier mod…