The Information Machine

DiffusionGemma

Simon Willison · 2026-06-10

Google has released DiffusionGemma, a 26B open-weight diffusion language model under the Apache 2 license, derived from its earlier Gemini Diffusion research. Hosted on NVIDIA's NIM cloud API, it generated over 500 tokens per second in the author's test.

Extraction

Topics: open-source-models, diffusion-language-models, llm-performance, google-gemma

Claims

  • Google has released DiffusionGemma (google/diffusiongemma-26B-A4B-it) as an open-weight model under the Apache 2 license.
  • The model is derived from the Gemini Diffusion research Google briefly previewed in May 2025 but did not follow up on publicly.
  • NVIDIA is hosting DiffusionGemma for free on their NIM cloud API.
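  • The model returned 2,409 tokens in 4.4 seconds in testing, about 547 tokens per second. The 4.4 seconds is the wall-clock time of the whole generate.py run, so the model's actual generation rate is at least 500 tokens per second.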
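
The throughput figure in the last claim is simple division, sketched below. The function name tokens_per_second is hypothetical and chosen only for illustration; the two inputs are the numbers quoted in the post. Because the timing covers the entire script run (interpreter startup, network round trip, and so on), the result is best read as a lower bound on generation speed rather than a precise measurement.

```python
def tokens_per_second(tokens: int, wall_seconds: float) -> float:
    """Return a lower-bound throughput from a token count and wall-clock time.

    Wall-clock time includes overhead beyond generation itself, so the
    true generation rate can only be equal to or higher than this value.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return tokens / wall_seconds


# Figures quoted in the post: 2,409 tokens returned in 4.4 seconds.
rate = tokens_per_second(2409, 4.4)
print(f"{rate:.1f} tokens/second")
```

The result lands comfortably above the conservative "at least 500 tokens/second" stated in the post.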

Key quotes

That research has returned in the best possible way: as a new open weight (Apache 2 licensed) Gemma model, google/diffusiongemma-26B-A4B-it.
I used that API to generate this pelican, which took 4.4s (according to time uv run generate.py) to return 2,409 tokens - so at least 500 tokens/second.