Synthesis history

4 versions, newest first.

Version 4 2026-05-26 19:24 UTC · 96 items

Google has officially endorsed Multi-Token Prediction for Gemma 4 in a dedicated blog post [^20690], with benchmarks showing up to 3x faster inference [^20688][^20689] — MTP has moved from vendor-documented (NVIDIA, lla…

Version 3 2026-05-25 18:30 UTC · 88 items

The MLX non-determinism critique from the previous pass has attracted a community response: a GitHub fix project (mlx-deterministic [^20419]) offering batch-invariant operations, and an explanatory article [^20420] deta…

Version 2 2026-05-25 11:13 UTC · 83 items

Two substantive new developments this pass: AMD Strix Halo enters the picture with a 2x token generation improvement, adding a third hardware contender alongside NVIDIA and Apple Silicon; and Apple escalated its commitm…

Version 1 2026-05-25 04:08 UTC · 68 items

A wave of demonstrations in May 2026 shows capable AI models running locally on consumer hardware through novel memory and inference tricks.
- A 1-trillion-parameter model (Kimi K2.5, a sparse MoE) was run on a single c…