Fluid, natural voice translation with Gemini 3.5 Live Translate
DeepMind Blog · 2026-06-09
Google DeepMind releases Gemini 3.5 Live Translate, a real-time speech-to-speech translation model that supports 70+ languages and generates translated speech continuously while preserving the speaker's intonation, pacing, and pitch. It is rolling out in Google Translate, Google Meet, and the Gemini Live API.
Extraction
Topics: speech-translation, real-time-ai, gemini, multilingual-ai
Claims
- Gemini 3.5 Live Translate supports 70+ languages and generates translated speech continuously rather than waiting for speakers to finish, reducing awkward pauses.
- The model preserves the speaker's intonation, pacing, and pitch in translated audio output.
- All audio generated by Gemini 3.5 Live Translate is watermarked with SynthID to keep AI-generated content detectable.
- Google Meet will expand speech translation from five languages to 70+ and from English-only pairs to over 2,000 language combinations.
- Grab is testing the model for real-time multilingual driver-traveler communication across more than 10 million monthly voice calls.
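As a quick sanity check on the Google Meet figure, the sketch below counts the pairs that can be formed from 70 languages. The source does not say how "language combinations" are counted, so both readings are shown as assumptions: unordered pairs (A and B counted once) and directed pairs (A to B and B to A counted separately). The function name `language_pairs` is hypothetical.

```python
from math import comb


def language_pairs(n_languages: int, directed: bool = False) -> int:
    """Count language pairs among n_languages.

    directed=False counts {A, B} once.
    directed=True counts A->B and B->A as two separate pairs.
    """
    unordered = comb(n_languages, 2)  # n * (n - 1) / 2
    return unordered * 2 if directed else unordered


print(language_pairs(70))        # 2415
print(language_pairs(70, True))  # 4830
```

With exactly 70 languages, the unordered count is 2,415, which is consistent with "over 2,000 language combinations"; since the claim is "70+", the actual number may be higher.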
Key quotes
Unlike turn by turn systems that wait for the speaker to finish speaking before responding, 3.5 Live Translate generates speech continuously, balancing the trade-off between waiting for context to improve quality and translating immediately to stay in sync with the speaker.
All audio generated by our models is watermarked with SynthID. This imperceptible watermark is woven directly into the audio output, ensuring AI-generated content remains detectable to help prevent misinformation.
Twenty years ago, translation at Google began as one of our pioneering machine learning experiments to turn the science of language into the magic of human connection.
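The first quote contrasts turn-based systems with continuous generation and describes a trade-off between waiting for context and staying in sync. The source does not say how 3.5 Live Translate schedules its output, so the sketch below is not the model's method. It uses wait-k, a simple scheduling policy from the simultaneous translation literature, purely as a stand-in to show the trade-off: the output starts after k source tokens have been read and then advances one token at a time. The function name and the one-output-token-per-source-token simplification are assumptions made for illustration.

```python
from typing import List


def wait_k_schedule(num_source_tokens: int, k: int) -> List[int]:
    """Toy wait-k schedule (illustrative only, not Gemini's method).

    For each output step i (0-based), return how many source tokens
    have been read before that output token is emitted.
    Simplifying assumption: one output token per source token.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [min(i + k, num_source_tokens) for i in range(num_source_tokens)]


# Small k: output stays close behind the speaker, with little context.
print(wait_k_schedule(6, 2))  # [2, 3, 4, 5, 6, 6]

# k equal to the utterance length: wait for the speaker to finish,
# which is the turn-based behaviour.
print(wait_k_schedule(6, 6))  # [6, 6, 6, 6, 6, 6]
```

A larger k gives the translator more context per output token at the cost of a longer lag behind the speaker; a smaller k does the reverse.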