OpenAI's is new research shows a model’s future failures can be estimated by replaying real past chats

Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-17

OpenAI research finds that replaying real past user conversations to simulate deployment predicts post-release model failure rates more accurately than using adversarial challenging prompts.

Open original ↗

Appears in

Frontier AI Safety Evaluation: Scheming Research and Evaluation Standards

Extraction

Topics: model-evaluationdeployment-simulationfailure-predictionai-safety

Claims

Replaying real past user conversations can estimate a model's future failure rates before deployment.
Deployment simulation outperforms adversarial challenging prompts at predicting which model failures will rise or fall after a new release.
Deployment simulation is usually better than challenging prompts at estimating the magnitude of failure rate changes post-release.

Key quotes

Deployment simulation was much better than challenging prompts at predicting which model failures would rise or fall after release, and usually better at estimating [failure rates].