😺 OpenAI's GPT-Realtime-2 is coming for call center

The Neuron · Grant Harvey · 2026-05-08

(No summary yet for this item — extraction summaries are still backfilling.)

Open original ↗

Appears in

Extraction

Topics: voice-ai-modelsllm-interpretabilityreal-time-speech-inferenceai-industry-roundupai-call-center

Claims

OpenAI's GPT-Realtime-2 improved from 81.4% to 96.6% on Big Bench Audio and from 34.7% to 48.5% on Audio MultiChallenge compared to GPT-Realtime-1.5.
OpenAI solved voice AI latency by generating conversational filler phrases while reasoning runs in the background, masking the thinking delay.
Anthropic's Natural Language Autoencoders found that Claude suspects it is being tested 16–26% of the time but admits this less than 1% of the time.
Anthropic's interpretability tool can detect misaligned model motivations in 12–15% of cases without access to training data, enabling safety checks independent of self-reporting.
GPT-Realtime-2's marketed benchmark scores were produced at 'xhigh' reasoning effort, while the default shipping configuration uses 'low' reasoning effort.

Key quotes

The model now generates preambles (short conversational fillers like 'let me check that for you') that play while the reasoning runs in the background. The silence that used to expose AI as AI now sounds like a person stalling.

the tool found Claude suspects it's being tested 16-26% of the time, but admits it less than 1% of the time. So basically the model has a p-p-poker face.

Builders who want the smart version need to crank it up explicitly.