The Information Machine

Strong AI agents still struggle with long research work because they often fail to keep testing and improving.

Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-08

A multi-institution paper from Stanford, MIT, NVIDIA, and Google finds that top AI research agents succeed primarily through persistent iterative testing rather than superior reasoning, revealing a key limitation of current agentic systems.

Extraction

Topics: ai-agents, research-automation, agentic-ai, llm-benchmarks, scientific-ai

Claims

  • Current AI agents struggle to sustain performance on long-horizon research tasks.
  • Top-performing research agents succeed more by persistence and repeated testing than by raw reasoning ability.
  • Agents that fail to keep iterating and improving underperform those that do, regardless of underlying model strength.
  • This persistence gap represents a critical and underappreciated limitation of state-of-the-art research agents.

Key quotes

"today's strongest research agents win less by brilliance than by refusing to stop testing"