More good news for local LLMs from atomic[.]chat, which runs 100% offline on your computer.

Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-05-21

Local LLM app atomic.chat demonstrates Multi-Token Prediction (MTP) more than doubling inference speed for a dense Qwen 27B model, from 51 to 117 tokens/s, and, on dual RTX 5090 GPUs, boosting a Qwen MoE 35B-A3B model from 218 to 267 tokens/s.

Appears in

Capable AI Models Running on Consumer Hardware

Extraction

Topics: local-llm, inference-optimization, multi-token-prediction, on-device-ai, qwen

Claims

Multi-Token Prediction (MTP) boosted a dense Qwen 27B model's inference speed from 51 to 117 tokens/s on atomic.chat.
A Qwen MoE 35B-A3B model improved from 218 to 267 tokens/s on dual RTX 5090 GPUs with MTP enabled.
atomic.chat runs 100% offline on consumer hardware with no cloud dependency.
In the reported runs, MTP raised local inference throughput by roughly 2.3x on the dense 27B model and roughly 1.2x on the MoE 35B-A3B model, without requiring additional hardware.
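
The speedups implied by the claims above can be checked with a few lines of arithmetic. The sketch below uses only the four throughput figures reported in the post. The helper names (speedup, ms_per_token) and the dictionary layout are illustrative assumptions, not part of atomic.chat or any Qwen tooling.

```python
# Throughput figures (tokens/s) as reported in the post: (without MTP, with MTP).
# The labels and helper names below are hypothetical, chosen for illustration only.
reported = {
    "Qwen 27B (dense)": (51, 117),
    "Qwen MoE 35B-A3B": (218, 267),
}


def speedup(baseline: float, with_mtp: float) -> float:
    """Return the throughput ratio with_mtp / baseline."""
    return with_mtp / baseline


def ms_per_token(tokens_per_s: float) -> float:
    """Convert throughput in tokens/s to average latency per token in milliseconds."""
    return 1000.0 / tokens_per_s


for name, (base, mtp) in reported.items():
    print(
        f"{name}: {speedup(base, mtp):.2f}x speedup, "
        f"{ms_per_token(base):.1f} ms -> {ms_per_token(mtp):.1f} ms per token"
    )
```

The ratios come out to about 2.29x for the dense model and about 1.22x for the MoE model, so "more than doubling" applies only to the dense 27B result. The MoE model starts from a much higher baseline and gains proportionally less.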

Key quotes

They just showed MTP (Multi-Token Prediction) pushing local Qwen models from 51 to 117 tokens/s on dense 27B.

An MoE 35B-A3B model rose from 218 to 267 tokens/s on 2x RTX 5090.