Super important paper from Univ of Texas.

Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-05-28

A University of Texas paper finds that deployed AI agents become less reliable over time, not because the model changes, but because their context drifts after deployment as old chats are summarized and tool state evolves.

Extraction

Topics: ai-agents, agent-reliability, deployment-evaluation-gap

Claims

AI agents can become less reliable after deployment even when the underlying model does not change.
Agents are typically evaluated in a fresh state, but in production they keep changing as they summarize and accumulate old chats.
The mismatch between evaluation conditions and real deployment conditions is a primary cause of reliability degradation.
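
The claims above can be illustrated with a toy sketch. This is not the paper's method or code. It assumes a hypothetical agent memory (ToyAgentMemory) that keeps a few recent turns verbatim and folds older turns into a word-limited running summary, with simple truncation standing in for a real LLM summarizer. All names and parameters here are invented for illustration. The point is only that an instruction recalled in a fresh evaluation can be lost after many production turns, with no change to the model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyAgentMemory:
    """Hypothetical memory: recent turns verbatim, older turns in a lossy summary."""

    max_turns: int = 4        # assumption: turns kept verbatim before compaction
    summary_budget: int = 8   # assumption: max words kept in the running summary
    summary: list[str] = field(default_factory=list)
    turns: list[str] = field(default_factory=list)

    def add_turn(self, text: str) -> None:
        self.turns.append(text)
        if len(self.turns) > self.max_turns:
            self._compact()

    def _compact(self) -> None:
        # Stand-in for an LLM summarizer: fold the oldest turn into the
        # summary, then keep only the most recent words within the budget.
        oldest = self.turns.pop(0)
        self.summary = (self.summary + oldest.split())[-self.summary_budget:]

    def context(self) -> str:
        # What the (unchanged) model would see on the next call.
        return " ".join(self.summary + self.turns)


def recalls(memory: ToyAgentMemory, fact: str) -> bool:
    """True if the fact is still present anywhere in the agent's context."""
    return fact in memory.context()


def fresh_vs_aged(fact: str, production_turns: int) -> tuple[bool, bool]:
    """Check recall right after setup (fresh) and after many later turns (aged)."""
    memory = ToyAgentMemory()
    memory.add_turn(f"constraint: {fact}")
    fresh = recalls(memory, fact)
    for i in range(production_turns):
        memory.add_turn(f"routine chat turn number {i}")
    aged = recalls(memory, fact)
    return fresh, aged


if __name__ == "__main__":
    print(fresh_vs_aged("never-delete-prod-db", production_turns=20))
```

In this sketch an evaluation that only checks the fresh state would pass, while the same check after 20 routine turns would fail, which mirrors the evaluation versus deployment mismatch described in the claims.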

Key quotes

AI agents can slowly become less reliable after deployment, even when the model itself does not change.

agents are often judged when they are fresh, but real agents keep changing because they summarize old chats