I had to test it myself to believe this unreal inference speed.

Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-05-29

Startup Kog AI achieves 3,000 tokens per second of single-user LLM inference on 8× AMD MI300X GPUs, and 2,100 tokens/s on 8× NVIDIA H200s, by exploiting a previously unrecognized efficiency gap in GPU token generation.

Appears in

Ultra-Low Latency LLM Inference: Benchmarks and Emerging Enterprise Pricing Tier

Extraction

Topics: llm-inference, gpu-performance, inference-optimization

Claims

Kog AI achieved 3,000 tokens/s for a single user on 8× AMD MI300X GPUs.
Kog AI achieved 2,100 tokens/s on 8× NVIDIA H200 GPUs.
The performance gain exploits a hidden efficiency gap in how GPUs generate tokens.
These speeds are achievable on standard datacenter GPU hardware without specialized modifications.
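
To put the claimed throughput in per-token terms, the figures above can be converted to average time per token. This is only a back-of-the-envelope sketch: the helper name `ms_per_token` is made up for illustration, and it assumes the reported tokens/s are sustained averages, which the source does not specify.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Average time per generated token, in milliseconds."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1000.0 / tokens_per_second


# Throughput figures as stated in the claims above.
reported = {"8x AMD MI300X": 3000.0, "8x NVIDIA H200": 2100.0}

for system, tps in reported.items():
    print(f"{system}: {tps:.0f} tokens/s = {ms_per_token(tps):.3f} ms per token")

# Relative throughput of the two reported configurations.
speed_ratio = reported["8x AMD MI300X"] / reported["8x NVIDIA H200"]
print(f"MI300X / H200 throughput ratio: {speed_ratio:.2f}x")
```

Under that assumption, 3,000 tokens/s works out to roughly a third of a millisecond per token, and the MI300X figure is about 1.4 times the H200 figure.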

Key quotes

I had to test it myself to believe this unreal inference speed.

3,000 tokens/s for 1 user on standard datacenter GPUs. They leveraged a hidden efficiency gap in how GPUs generate tokens.