atomic[.]chat shared a revealing comparison of local open-weight LLMs running on their own hardware.
Rohan Paul (@rohanpaul_ai) · Twitter · 2026-06-12
atomic[.]chat benchmarks DiffusionGemma against Gemma4 26B A4B on a single H100 in FP8, finding the diffusion-based text model runs 4x faster than the autoregressive alternative.
Extraction
Topics: diffusion-language-models, llm-benchmarks, open-weight-models, inference-speed
Claims
- DiffusionGemma runs 4x faster than Gemma4 26B A4B on a single H100 GPU in FP8 precision.
- Diffusion-based text models can significantly outperform autoregressive models on inference speed at equivalent scale.
- The benchmark was conducted on local hardware by atomic[.]chat, not in a cloud or controlled lab setting.
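For readers checking a figure like "4x faster", the arithmetic is a ratio of throughputs measured on the same hardware and precision. The sketch below is a minimal illustration, not the method atomic[.]chat used: the helper names `throughput` and `speedup` are hypothetical, and the token counts and timings are placeholder values, since the post as quoted here gives no raw numbers.

```python
def throughput(tokens_generated: int, elapsed_seconds: float) -> float:
    """Return generation throughput in tokens per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return tokens_generated / elapsed_seconds


def speedup(candidate_tps: float, baseline_tps: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return candidate_tps / baseline_tps


# Placeholder measurements (assumed for illustration, NOT from the benchmark):
# both models generate the same number of tokens on the same GPU.
baseline_tps = throughput(tokens_generated=1000, elapsed_seconds=10.0)
candidate_tps = throughput(tokens_generated=1000, elapsed_seconds=2.5)

ratio = speedup(candidate_tps, baseline_tps)
print(f"Speedup: {ratio:.1f}x")
```

A ratio like this is only meaningful when both runs share the same prompt set, output length, batch size, and precision; otherwise the comparison mixes model speed with workload differences.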
Key quotes
They benchmarked the new DiffusionGemma (diffusion text model) vs. Gemma4 26B A4B (autoregressive model) on a single H100 (FP8).
The 4X speed of DiffusionGemma changes the [rest truncated]