The Information Machine

Frontier AI Safety Evaluation: Scheming Research and Evaluation Standards

Synthesis history

6 versions, newest first.

  1. Version 6 2026-06-04 02:13 UTC · 52 items

    ARC's white-box estimation challenge [^23366] is the substantive new development: it introduces a technical argument that black-box behavioral sampling is structurally insufficient to detect control-undermining behavior…

  2. Version 5 2026-06-01 18:33 UTC · 45 items

    The most significant new development: OpenAI's playbook reportedly claims that AI capabilities may not be fully evaluable by third parties [^23226], sharpening the structural conflict-of-interest critiq…

  3. Version 4 2026-06-01 08:11 UTC · 34 items

    Emergence AI's comparative simulation study adds striking empirical evidence that frontier model alignment quality varies dramatically across labs, not just across conditions, with Claude producing zero crimes and Gro…

  4. Version 3 2026-05-31 08:10 UTC · 29 items

    GovAI enters as a new academic-policy voice with a framework for rigorous third-party frontier AI auditing, reinforcing AVERI's practitioner position with institutional research weight. A technical cryptographic verific…

  5. Version 2 2026-05-30 18:44 UTC · 19 items

    Three substantive new voices entered this cycle: Anthropic (via 'Teaching Claude Why' and the Feb 2026 Risk Report), AVERI as an independent auditing organization, and EU regulators via AI Act compliance analysis. The c…

  6. Version 1 2026-05-30 02:05 UTC · 2 items

    Two frontier AI labs published complementary contributions to AI safety evaluation on the same day, May 29, 2026. OpenAI released a methodological playbook intended to standardize how third parties evaluate frontier mod…