Advancing voice intelligence with new models in the API

OpenAI Blog · 2026-05-07

Extraction

Topics: voice-ai, realtime-api, speech-to-text, live-translation, openai

Claims

OpenAI is releasing three new audio models in the Realtime API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper.
GPT-Realtime-2 uses GPT-5-class reasoning and scores 15.2% higher on Big Bench Audio than its predecessor GPT-Realtime-1.5.
GPT-Realtime-2 expands the context window from 32K to 128K tokens and adds five adjustable reasoning effort levels.
GPT-Realtime-Translate supports live speech translation across 70+ input languages into 13 output languages.
GPT-Realtime-Whisper provides low-latency streaming speech-to-text, transcribing speech while the speaker is still talking.

Key quotes

GPT‑Realtime‑2 is built for live voice interactions where the model keeps the conversation moving while it reasons through a request, calls tools, handles corrections or interruptions, and responds in a way that fits the moment.

On our hardest adversarial benchmark, this translates to a 26-point lift in call success rate after prompt optimization (95% vs. 69%).

In our evals across Hindi, Tamil, and Telugu, GPT-Realtime-Translate delivered 12.5% lower Word Error Rates than any other model we tested, along with lower fallback rates, higher task completion, and latency that sustained natural conversation.
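The quoted figures mix two kinds of comparison, which are easy to conflate. The "26-point lift" is a difference in percentage points (95% minus 69%); the same gain expressed as a relative improvement is about 37.7%. Word error rate is conventionally defined as (substitutions + deletions + insertions) divided by the number of reference words, i.e. a word-level edit distance. The sketch below is a minimal illustration of both calculations under those standard definitions. It is not OpenAI's evaluation code; the quotes do not say whether "12.5% lower" WER is a relative or absolute reduction, nor how transcripts were normalized, and the helper names here are hypothetical.

```python
def point_and_relative_change(new: float, old: float) -> tuple[float, float]:
    """Hypothetical helper: return (difference in percentage points,
    relative change as a fraction of the old value)."""
    points = new - old
    relative = points / old
    return points, relative


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein distance. Assumes whitespace tokenization
    and no text normalization (casing, punctuation), which real evals usually add."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Call success rates from the quote: 95% after prompt optimization vs. 69%.
    points, relative = point_and_relative_change(95, 69)
    print(f"{points:.0f} points, {relative:.1%} relative")  # 26 points, 37.7% relative

    # Toy transcripts: one substitution ("sat" -> "sit") and one deletion ("the").
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

The same arithmetic applies to a WER claim: a drop from 10% to 8.75% WER is 1.25 points but a 12.5% relative reduction, so the two readings of "12.5% lower" can differ substantially.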