I had to test it myself to believe this unreal inference speed.
Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-05-29
Startup Kog AI reaches 3,000 tokens per second of single-user LLM inference on 8× AMD MI300X GPUs, and 2,100 tokens/s on 8× NVIDIA H200s, by exploiting a hidden efficiency gap in how GPUs generate tokens.
Extraction
Topics: llm-inference, gpu-performance, inference-optimization
Claims
- Kog AI achieved 3,000 tokens/s for a single user on 8× AMD MI300X GPUs.
- Kog AI achieved 2,100 tokens/s on 8× NVIDIA H200 GPUs.
- The performance gain exploits a hidden efficiency gap in how GPUs generate tokens.
- These speeds were reached on standard datacenter GPUs, not specialized inference hardware.
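
To put the throughput claims above in per-token terms, here is a minimal sketch that converts tokens/s into average time per token and into the time to stream a response. The 1,000-token response length is an illustrative assumption, not a figure from the source, and the arithmetic assumes steady-state generation (it ignores prompt processing and time to first token).

```python
# Throughput figures as stated in the claims (single user).
REPORTED_TOKENS_PER_SECOND = {
    "8x AMD MI300X": 3000.0,
    "8x NVIDIA H200": 2100.0,
}


def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average time per generated token, in milliseconds."""
    return 1000.0 / tokens_per_second


def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a constant decode rate, in seconds."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 1000  # hypothetical response length, for illustration only
    for hardware, tps in REPORTED_TOKENS_PER_SECOND.items():
        latency = per_token_latency_ms(tps)
        total = generation_time_s(response_tokens, tps)
        print(f"{hardware}: {latency:.3f} ms/token, "
              f"{total:.2f} s per {response_tokens} tokens")
```

At the stated rates this works out to roughly a third of a millisecond per token on the MI300X setup and just under half a millisecond on the H200 setup.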
Key quotes
I had to test it myself to believe this unreal inference speed.
3,000 tokens/s for 1 user on standard datacenter GPUs. They leveraged a hidden efficiency gap in how GPUs generate tokens.