New paper from Cambridge Univ+NVIDIA and other top labs teaches AI agents and AI judges to improve together, so neither…

Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-29

A Cambridge University and NVIDIA paper introduces the Red Queen Gödel Machine, a co-evolutionary training framework where AI agents and their evaluators improve together, outperforming fixed-evaluator baselines on coding and paper-writing tasks while using fewer tokens.

Extraction

Topics: self-improving-ai, ai-evaluation, co-evolutionary-training, llm-agents, reinforcement-learning

Claims

Most self-improving AI systems train against fixed benchmarks or evaluators, causing scores to become stale or gameable over time.
The Red Queen Gödel Machine allows evaluators to co-evolve with agents by updating only at stable handoff points, ensuring each training phase has a consistent judge.
On coding tasks, the co-evolutionary system beats the prior best self-improving coding agent while using 1.35× to 1.72× fewer tokens, because a cheap code reviewer adds useful feedback.
On paper-writing tasks, the co-evolved writer achieves approximately 1.86× higher acceptance rates from a reviewer panel than a fixed-evaluator baseline.
Advanced AI systems may require co-evolving evaluators rather than static benchmarks to sustain a meaningful training signal as capabilities grow.
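
The handoff idea in the claims above (the evaluator changes only between phases, so each phase is judged consistently) can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the paper's algorithm: the `Evaluator` class, the `co_evolve` function, the scalar "skill" and "strictness" values, and the update rule are all hypothetical stand-ins chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluator:
    """Toy judge (hypothetical). Frozen, so it cannot change mid-phase."""
    strictness: float

    def score(self, skill: float) -> float:
        # A stricter judge gives a lower score to the same skill level.
        return skill - self.strictness


def co_evolve(phases: int, steps_per_phase: int, seed: int = 0):
    """Alternate agent improvement with evaluator updates at handoff points."""
    rng = random.Random(seed)
    agent_skill = 0.0
    evaluator = Evaluator(strictness=0.0)
    history = []  # (phase, strictness used in that phase, skill at phase end)

    for phase in range(phases):
        judge = evaluator  # the same judge is used for the whole phase
        for _ in range(steps_per_phase):
            # Assumption: a candidate is a random perturbation of the agent.
            candidate = agent_skill + rng.uniform(-0.1, 0.3)
            # Keep the candidate only if the fixed judge scores it higher.
            if judge.score(candidate) > judge.score(agent_skill):
                agent_skill = candidate
        history.append((phase, judge.strictness, agent_skill))
        # Handoff point: only now is the evaluator replaced by a stricter one.
        evaluator = Evaluator(strictness=agent_skill)

    return history


if __name__ == "__main__":
    for phase, strictness, skill in co_evolve(phases=3, steps_per_phase=20):
        print(f"phase={phase} judge_strictness={strictness:.2f} agent_skill={skill:.2f}")
```

In this toy version the judge's strictness in each phase equals the agent's skill at the end of the previous phase, which is one simple way to keep the judge's pressure rising with the agent. How the paper actually updates its evaluators is not described in this summary.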

Key quotes

The big point is that stronger AI systems may need stronger judges growing with them, because fixed tests can stop giving useful pressure.

Moves self-improving AI away from fixed benchmarks and toward a loop where the thing doing the judging can also get better.

On coding, the system beats the earlier best self-improving coding agent while using 1.35× to 1.72× fewer tokens, because a cheap code reviewer adds useful feedback.