A huge 750 tokens/sec for GPT-5.6 Sol.

Twitter · Rohan Paul (@rohanpaul_ai) · 2026-06-26

OpenAI's GPT-5.6 Sol model, served on Cerebras wafer-scale chips, is announced to reach 750 tokens per second in July. That is up to 15x the throughput of the current GPT-5.5 priority-tier service.


Appears in

OpenAI GPT-5.6 Launch: Sol/Terra/Luna Tiers and White House-Controlled Rollout

Extraction

Topics: llm-inference, openai, hardware-acceleration

Claims

GPT-5.6 Sol served on Cerebras hardware will reach 750 tokens per second, with availability starting in July.
This is up to 15x the throughput of the GPT-5.5 priority tier, which guarantees that 99% of requests run at over 50 tokens per second; the 15x figure compares against that 50 tokens/sec floor (750 / 50 = 15).
The speed gain comes from Cerebras's wafer-scale chip architecture, which reduces memory and networking latency compared to conventional multi-GPU inference setups.
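The 15x figure is plain arithmetic on the two rates in the claims above, and because the GPT-5.5 number is a guaranteed floor rather than a typical speed, the ratio is an upper bound. The Python sketch below reproduces the ratio and converts both rates into wall-clock time for one response. The helper names and the 1,500-token response length are hypothetical, chosen only for illustration, and the sketch assumes a constant decode rate with no time-to-first-token.

```python
# Throughput figures as reported in the claims (tokens per second).
# The GPT-5.5 value is the priority-tier floor (99% of requests exceed it),
# so a ratio against it is an upper bound ("up to" 15x).
SOL_ON_CEREBRAS_TPS = 750.0
GPT55_PRIORITY_FLOOR_TPS = 50.0


def speedup(new_tps: float, baseline_tps: float) -> float:
    """Ratio of two decode throughputs (hypothetical helper)."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return new_tps / baseline_tps


def generation_seconds(num_tokens: int, tps: float) -> float:
    """Time to stream num_tokens at a constant rate.

    Assumption: ignores time-to-first-token and any rate variation.
    """
    if tps <= 0:
        raise ValueError("throughput must be positive")
    return num_tokens / tps


if __name__ == "__main__":
    # Hypothetical 1,500-token response, used only for illustration.
    tokens = 1500
    ratio = speedup(SOL_ON_CEREBRAS_TPS, GPT55_PRIORITY_FLOOR_TPS)
    print(f"speedup vs. priority floor: {ratio:.0f}x")  # speedup vs. priority floor: 15x
    print(f"GPT-5.5 at the floor: {generation_seconds(tokens, GPT55_PRIORITY_FLOOR_TPS):.1f} s")  # 30.0 s
    print(f"GPT-5.6 Sol on Cerebras: {generation_seconds(tokens, SOL_ON_CEREBRAS_TPS):.1f} s")  # 2.0 s
```

If a GPT-5.5 priority request actually ran faster than the floor, say at a hypothetical 75 tokens per second, `speedup(750.0, 75.0)` gives 10x, which is why the claim is phrased as "up to".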

Key quotes

750 token/sec coming to 5.6 sol in july!

Sol on Cerebras is claiming up to 15x that rate.

Cerebras, whose wafer-scale chip is designed to move model data with far less memory and networking delay than a normal multi-GPU setup.