Cerebras reported 981 tokens/sec on the 1T-parameter Kimi K2.6 model.
Twitter · Rohan Paul (@rohanpaul_ai) · 2026-05-22
Cerebras achieves 981 tokens/sec on the 1-trillion-parameter Kimi K2.6 model, 6.7x faster than the next-fastest GPU cloud, with the result independently validated by Artificial Analysis.
Extraction
Topics: inference-speed, llm-hardware, cerebras, kimi-k2
Claims
- Cerebras reported 981 tokens/sec throughput on the 1T-parameter Kimi K2.6 model.
- This throughput is 6.7x that of the next-fastest GPU cloud.
- The performance claim was validated by Artificial Analysis.
- The hard part of large-model inference is moving model weights and activations fast enough, because conventional GPU clusters split the model across many chips.
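As a quick check on what the headline figures imply, the sketch below derives the throughput of the next-fastest GPU cloud and the average time per generated token for each system. It assumes the 6.7x figure is a ratio of single-stream output tokens per second; the function and constant names are illustrative, not from the post.

```python
# Derive implied figures from the two headline numbers in the post.
# Assumption: "6.7x faster" is a ratio of single-stream output tokens/sec.
CEREBRAS_TOKENS_PER_SEC = 981.0
SPEEDUP_VS_NEXT_GPU_CLOUD = 6.7


def implied_baseline_tps(fast_tps: float, speedup: float) -> float:
    """Throughput of the slower system implied by a throughput ratio."""
    return fast_tps / speedup


def ms_per_token(tps: float) -> float:
    """Average time between generated tokens, in milliseconds."""
    return 1000.0 / tps


gpu_tps = implied_baseline_tps(CEREBRAS_TOKENS_PER_SEC, SPEEDUP_VS_NEXT_GPU_CLOUD)
print(f"implied next GPU cloud: {gpu_tps:.1f} tokens/sec")
print(f"Cerebras:  {ms_per_token(CEREBRAS_TOKENS_PER_SEC):.2f} ms/token")
print(f"GPU cloud: {ms_per_token(gpu_tps):.2f} ms/token")
```

Under that reading, the next-fastest GPU cloud would sit near 146 tokens/sec, or roughly 6.8 ms per token against about 1 ms per token for Cerebras.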
Key quotes
- Cerebras reported 981 tokens/sec on the 1T-parameter Kimi K2.6 model. 6.7× faster than the next GPU cloud, validated by Artificial Analysis.
- The hard part is moving model weights and activations fast enough, because normal GPU clusters split the model across many chips
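The point about moving weights can be made concrete with a rough memory-bandwidth estimate. If single-stream decoding is bandwidth-bound, each generated token has to read every weight it uses once, so the required weight-read rate is tokens/sec times bytes per token. The active-parameter count (32B) and 8-bit weights below are hypothetical inputs chosen for illustration, not figures reported for Kimi K2.6.

```python
# Back-of-envelope: weight bytes streamed per second to sustain a given
# single-stream decode rate, assuming every token reads all weights it uses
# once (memory-bandwidth-bound decode, batch size 1).
# HYPOTHETICAL inputs: active parameters per token and bytes per parameter
# are illustrative assumptions, not figures from the post.


def required_weight_bandwidth_tb_s(tokens_per_sec: float,
                                   active_params: float,
                                   bytes_per_param: float) -> float:
    """Lower bound on weight-read bandwidth in TB/s (1 TB = 1e12 bytes).

    Ignores KV-cache reads, activations, inter-chip traffic, and any
    weight reuse across a batch of concurrent sequences.
    """
    bytes_per_token = active_params * bytes_per_param
    return tokens_per_sec * bytes_per_token / 1e12


# Hypothetical: a mixture-of-experts model activating ~32B of its 1T
# parameters per token, stored at 8 bits (1 byte) per parameter.
ACTIVE_PARAMS = 32e9
BYTES_PER_PARAM = 1.0

for tps in (981.0, 981.0 / 6.7):
    bw = required_weight_bandwidth_tb_s(tps, ACTIVE_PARAMS, BYTES_PER_PARAM)
    print(f"{tps:7.1f} tokens/sec -> at least {bw:5.1f} TB/s of weight reads")
```

Batching changes the picture, since one weight read can serve many sequences at once, so this bound describes only the single-stream case; it also leaves out KV-cache and inter-chip activation traffic, which add to the total.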