The Information Machine

Cerebras reported 981 tokens/sec on the 1T-parameter Kimi K2.6 model.

Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-05-22

Cerebras reports 981 tokens/sec on the 1-trillion-parameter Kimi K2.6 model, 6.7x faster than the next-fastest GPU cloud, with the result validated by Artificial Analysis.

Extraction

Topics: inference-speed, llm-hardware, cerebras, kimi-k2

Claims

  • Cerebras reported 981 tokens/sec throughput on the 1T-parameter Kimi K2.6 model.
  • This speed is 6.7x faster than the next-fastest GPU cloud alternative.
  • The performance claim was validated by Artificial Analysis.
  • The fundamental bottleneck in large-model inference is moving weights and activations fast enough across chips.
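To put the first two claims in concrete terms, the short sketch below works out what they imply numerically. It assumes that "6.7x faster" means a ratio of output throughput in tokens/sec, which the source does not spell out. The function names are illustrative only. The derived baseline is an inference from the two quoted figures, not a number reported by Cerebras or Artificial Analysis.

```python
# Figures quoted in the claims above.
CEREBRAS_TOKENS_PER_SEC = 981.0
SPEEDUP_VS_NEXT_GPU_CLOUD = 6.7  # assumed to be a throughput ratio


def implied_baseline(tokens_per_sec: float, speedup: float) -> float:
    """Throughput of the slower system implied by an 'N x faster' claim."""
    return tokens_per_sec / speedup


def ms_per_token(tokens_per_sec: float) -> float:
    """Average time to emit one token, in milliseconds."""
    return 1000.0 / tokens_per_sec


baseline = implied_baseline(CEREBRAS_TOKENS_PER_SEC, SPEEDUP_VS_NEXT_GPU_CLOUD)

print(f"Implied next-fastest GPU cloud: {baseline:.1f} tokens/sec")
print(f"Cerebras time per token:        {ms_per_token(CEREBRAS_TOKENS_PER_SEC):.2f} ms")
print(f"Implied baseline per token:     {ms_per_token(baseline):.2f} ms")
```

Under that assumption, the next-fastest GPU cloud would be producing roughly 146 tokens/sec, or about 7 ms per token against about 1 ms on Cerebras.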

Key quotes

Cerebras reported 981 tokens/sec on the 1T-parameter Kimi K2.6 model. 6.7× faster than the next GPU cloud, validated by Artificial Analysis.
The hard part is moving model weights and activations fast enough, because normal GPU clusters split the model across many chips