The Information Machine

Capable AI Models Running on Consumer Hardware

Synthesis history

4 versions, newest first.

  1. Version 4 2026-05-26 19:24 UTC · 96 items

    Google has officially endorsed Multi-Token Prediction for Gemma 4 in a dedicated blog post [^20690], with benchmarks showing up to 3x faster inference [^20688][^20689]; MTP has moved from vendor-documented (NVIDIA, lla…

  2. Version 3 2026-05-25 18:30 UTC · 88 items

    The MLX non-determinism critique from the previous pass has attracted a community response: a GitHub fix project (mlx-deterministic [^20419]) offering batch-invariant operations, and an explanatory article [^20420] deta…

  3. Version 2 2026-05-25 11:13 UTC · 83 items

    Two substantive new developments this pass: AMD Strix Halo enters the picture with a 2x token generation improvement, adding a third hardware contender alongside NVIDIA and Apple Silicon; and Apple escalated its commitm…

  4. Version 1 2026-05-25 04:08 UTC · 68 items

    A wave of demonstrations in May 2026 shows capable AI models running locally on consumer hardware through novel memory and inference tricks.
    - A 1-trillion-parameter model (Kimi K2.5, a sparse MoE) was run on a single c…