Introducing Gemini Omni

DeepMind Blog · 2026-05-17

Google DeepMind launches Gemini Omni Flash, the first model in the Gemini Omni family. It generates and conversationally edits video from any combination of image, audio, video, and text inputs, and every output carries an imperceptible SynthID digital watermark.

Extraction

Topics: multimodal-ai, video-generation, gemini, content-provenance, generative-ai

Claims

Gemini Omni Flash accepts any combination of image, audio, video, and text as input to produce high-quality video output.
The model supports multi-turn conversational video editing where each prompt builds on prior edits while preserving character consistency and physics.
All videos generated by Omni are embedded with SynthID watermarks that can be verified through the Gemini app, Chrome, and Google Search.
Gemini Omni Flash is rolling out to all Google AI Plus, Pro, and Ultra subscribers and to YouTube Shorts users at no additional cost.
Omni includes improved physics simulation covering gravity, kinetic energy, and fluid dynamics for more realistic scene generation.

Key quotes

Omni is our new model that can create anything from any input — starting with video.

All videos created with Omni include our imperceptible SynthID digital watermark.

Every instruction builds on the last. Your characters stay consistent, the physics hold up and the scene remembers what came before.