OpenAI's new research shows a model's future failures can be estimated by replaying real past chats
Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-17
OpenAI research finds that simulating deployment by replaying real past user conversations predicts post-release model failure rates more accurately than adversarial, challenging prompts do.
Extraction
Topics: model-evaluation, deployment-simulation, failure-prediction, ai-safety
Claims
- Replaying real past user conversations can estimate a model's future failure rates before deployment.
- Deployment simulation outperforms adversarial challenging prompts at predicting which model failures will rise or fall after a new release.
- Deployment simulation is usually better than challenging prompts at estimating the magnitude of failure rate changes post-release.
Key quotes
Deployment simulation was much better than challenging prompts at predicting which model failures would rise or fall after release, and usually better at estimating [failure rates].
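
The two criteria named above, predicting whether a failure rate rises or falls and estimating how large the rate is, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `FailureEstimate` record, the function names, the category labels, and the numbers are hypothetical and do not come from the OpenAI research or the post.

```python
from dataclasses import dataclass


@dataclass
class FailureEstimate:
    """One failure category, with hypothetical rates expressed as fractions."""
    category: str
    old_rate: float       # failure rate observed for the currently deployed model
    predicted_new: float  # rate an evaluation method estimated before release
    observed_new: float   # rate actually measured after release


def sign(x: float) -> int:
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def direction_accuracy(estimates: list[FailureEstimate]) -> float:
    """Fraction of categories where the predicted change (rise or fall)
    has the same sign as the observed change."""
    if not estimates:
        raise ValueError("need at least one estimate")
    hits = sum(
        sign(e.predicted_new - e.old_rate) == sign(e.observed_new - e.old_rate)
        for e in estimates
    )
    return hits / len(estimates)


def mean_abs_error(estimates: list[FailureEstimate]) -> float:
    """Mean absolute gap between predicted and observed post-release rates."""
    if not estimates:
        raise ValueError("need at least one estimate")
    return sum(abs(e.predicted_new - e.observed_new) for e in estimates) / len(estimates)


# Made-up numbers, purely to exercise the two metrics.
sample = [
    FailureEstimate("category_a", old_rate=0.10, predicted_new=0.08, observed_new=0.07),
    FailureEstimate("category_b", old_rate=0.05, predicted_new=0.07, observed_new=0.06),
    FailureEstimate("category_c", old_rate=0.20, predicted_new=0.22, observed_new=0.18),
    FailureEstimate("category_d", old_rate=0.02, predicted_new=0.01, observed_new=0.01),
]

if __name__ == "__main__":
    print("direction accuracy:", direction_accuracy(sample))
    print("mean absolute error:", mean_abs_error(sample))
```

Scoring two evaluation methods this way (one list of estimates per method) would give a like-for-like comparison on both criteria, though the research may well use different metrics.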