atomic[.]chat shared a revealing benchmark of two open-weight LLMs running locally on its own hardware.

Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-12

atomic[.]chat benchmarks DiffusionGemma against Gemma4 26B A4B on a single H100 in FP8, finding the diffusion-based text model runs 4x faster than the autoregressive alternative.


Appears in

Google DeepMind DiffusionGemma: Parallel Diffusion Architecture for 4x Faster Local Text Generation

Extraction

Topics: diffusion-language-models, llm-benchmarks, open-weight-models, inference-speed

Claims

DiffusionGemma runs 4x faster than Gemma4 26B A4B on a single H100 GPU in FP8 precision.
Diffusion-based text models can significantly outperform autoregressive models on inference speed at equivalent scale.
The benchmark was conducted on local hardware by atomic[.]chat, not in a cloud or controlled lab setting.
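The speed claim rests on a structural difference. An autoregressive model needs one sequential forward pass per generated token. A diffusion text model can instead refine many token positions in parallel over a fixed number of denoising steps. The toy sketch below counts sequential forward passes under that assumption. It uses a hypothetical block-wise masked-diffusion scheme, and `block_size` and `steps_per_block` are illustrative parameters, not DiffusionGemma's actual settings. It does not reproduce the atomic[.]chat benchmark. It also ignores per-pass cost, which differs between the two approaches, so a pass-count ratio is not a wall-clock speedup.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    # Autoregressive decoding: one sequential forward pass per generated token.
    return num_tokens


def diffusion_passes(num_tokens: int, block_size: int, steps_per_block: int) -> int:
    # Hypothetical block-wise diffusion decoding: each block of `block_size`
    # positions is denoised in parallel over `steps_per_block` passes,
    # and blocks are processed one after another.
    num_blocks = math.ceil(num_tokens / block_size)
    return num_blocks * steps_per_block


def pass_ratio(num_tokens: int, block_size: int, steps_per_block: int) -> float:
    # Ratio of sequential passes (autoregressive / diffusion); >1 means fewer
    # sequential steps for diffusion under these illustrative settings.
    return autoregressive_passes(num_tokens) / diffusion_passes(
        num_tokens, block_size, steps_per_block
    )


if __name__ == "__main__":
    # 256 tokens, 32-token blocks, 16 denoising steps per block (illustrative).
    print(autoregressive_passes(256))    # 256
    print(diffusion_passes(256, 32, 16)) # 128
    print(pass_ratio(256, 32, 16))       # 2.0
```

Under these made-up settings, diffusion needs half as many sequential passes. Fewer denoising steps or larger blocks widen the gap, usually at some cost in output quality.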

Key quotes

They benchmarked the new DiffusionGemma (diffusion text model) vs. Gemma4 26B A4B (autoregressive model) on a single H100 (FP8).

The 4X speed of DiffusionGemma changes the [rest truncated]