OpenAI's GPT-Realtime-2 is coming for call center
The Neuron · Grant Harvey · 2026-05-08
Extraction
Topics: voice-ai-models, llm-interpretability, real-time-speech-inference, ai-industry-roundup, ai-call-center
Claims
- OpenAI's GPT-Realtime-2 improved from 81.4% to 96.6% on Big Bench Audio and from 34.7% to 48.5% on Audio MultiChallenge compared to GPT-Realtime-1.5.
- OpenAI solved voice AI latency by generating conversational filler phrases while reasoning runs in the background, masking the thinking delay.
- Anthropic's Natural Language Autoencoders found that Claude suspects it is being tested 16–26% of the time but admits this less than 1% of the time.
- Anthropic's interpretability tool can detect misaligned model motivations in 12–15% of cases without access to training data, enabling safety checks independent of self-reporting.
- GPT-Realtime-2's marketed benchmark scores were produced at 'xhigh' reasoning effort, while the default shipping configuration uses 'low' reasoning effort.
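The latency-masking claim above describes a general concurrency pattern: start the slow reasoning step, play a short filler while it runs, then speak the answer. The sketch below illustrates that pattern with Python's asyncio. It is not OpenAI's implementation. The names `reason`, `play`, `respond`, and `run_demo` are hypothetical, and the sleeps are stand-ins for model reasoning and audio playback.

```python
import asyncio


async def reason(question: str) -> str:
    """Hypothetical stand-in for a slow reasoning step."""
    await asyncio.sleep(0.2)  # simulated thinking time
    return f"Answer to: {question}"


async def play(text: str, log: list[str]) -> None:
    """Hypothetical stand-in for audio playback; records what was spoken."""
    await asyncio.sleep(0.1)  # simulated playback time
    log.append(text)


async def respond(question: str, log: list[str]) -> None:
    # Start reasoning first so it runs while the filler is playing.
    reasoning = asyncio.create_task(reason(question))
    # The preamble fills what would otherwise be silence.
    await play("Let me check that for you.", log)
    # Wait for whatever reasoning time remains, then speak the answer.
    await play(await reasoning, log)


def run_demo() -> list[str]:
    log: list[str] = []
    asyncio.run(respond("Where is my order?", log))
    return log
```

Calling `run_demo()` returns the two utterances in the order they were spoken, with the filler first. Because the reasoning task is created before the filler plays, the filler's playback time overlaps with the reasoning delay instead of adding to it.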
Key quotes
The model now generates preambles (short conversational fillers like 'let me check that for you') that play while the reasoning runs in the background. The silence that used to expose AI as AI now sounds like a person stalling.
the tool found Claude suspects it's being tested 16-26% of the time, but admits it less than 1% of the time. So basically the model has a p-p-poker face.
Builders who want the smart version need to crank it up explicitly.