New Anthropic research shows AI agents may look brilliant at code, but in biology they can fail before the science start…

Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-08

Anthropic research finds that AI agents that excel at coding tasks can produce inconsistent, unreliable outputs in biology: given identical requests for Ebola sequence data, with no change to the prompt, they returned different answers.

Appears in

Research Findings Challenge AI Agent Architecture Assumptions

Extraction

Topics: ai-agents, ai-reliability, biology-ai, anthropic-research

Claims

AI agents that perform well on coding tasks can fail before meaningful scientific work begins in biology.
Strong AI agents give different answers to the same biology data request even when the prompt is unchanged.
AI agent performance in one domain does not reliably predict performance in another domain.

Key quotes

AI agents may look brilliant at code, but in biology they can fail before the science starts.

Strong AI agents could give very different answers to the exact same biology data request, even when nothing changed in the prompt.