DiffusionGemma

Simon Willison · 2026-06-10

Google releases DiffusionGemma, a 26B open-weight, Apache 2 licensed diffusion language model derived from its earlier Gemini Diffusion research. In one test against NVIDIA's NIM cloud API, it returned output at a rate of at least 500 tokens per second.

Appears in

Google DeepMind DiffusionGemma: Parallel Diffusion Architecture for 4x Faster Local Text Generation

Extraction

Topics: open-source-models, diffusion-language-models, llm-performance, google-gemma

Claims

Google has released DiffusionGemma (google/diffusiongemma-26B-A4B-it) as an open-weight model under the Apache 2 license.
The model is derived from the Gemini Diffusion research Google briefly previewed in May 2025 but did not follow up on publicly.
NVIDIA is hosting DiffusionGemma for free on their NIM cloud API.
In one test through that API, the model returned 2,409 tokens in 4.4 seconds of wall-clock time for the whole script run (about 547 tokens per second), so throughput was at least 500 tokens per second.
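
The throughput figure in the last claim is simple division, and a short sketch makes clear why it is stated as a lower bound. The helper name tokens_per_second is hypothetical, introduced here only for illustration. The two inputs are the figures quoted in the source. Because the 4.4 seconds was measured by timing the entire script run, it also covers process startup and the network round trip, so the model's actual generation rate is presumably somewhat higher than the result below.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Average throughput over a wall-clock interval (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


# Figures quoted in the source: 2,409 tokens returned in 4.4 s of wall-clock time.
rate = tokens_per_second(2409, 4.4)

# The elapsed time includes startup and network overhead, so this is a floor
# on the generation rate rather than a precise measurement of it.
print(f"{rate:.1f} tokens/second")
```

Rounding the result down to "at least 500 tokens/second", as the source does, leaves margin for the imprecision of a single timed run.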

Key quotes

That research has returned in the best possible way: as a new open weight (Apache 2 licensed) Gemma model, google/diffusiongemma-26B-A4B-it.

I used that API to generate this pelican, which took 4.4s (according to time uv run generate.py) to return 2,409 tokens - so at least 500 tokens/second.