Super important paper from Univ of Texas.
Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-05-28
A University of Texas paper finds that deployed AI agents degrade in reliability over time not from model changes but from accumulated context drift caused by chat summarization and evolving tool state post-deployment.
Appears in
Extraction
Topics: ai-agentsagent-reliabilitydeployment-evaluation-gap
Claims
- AI agents can become less reliable after deployment even when the underlying model does not change.
- Agents are typically evaluated when fresh but continue to evolve in production through accumulated chat summarization.
- The mismatch between evaluation conditions and real deployment conditions is a primary cause of reliability degradation.
Key quotes
AI agents can slowly become less reliable after deployment, even when the model itself does not change.
agents are often judged when they are fresh, but real agents keep changing because they summarize old chats