The Information Machine

I had to test it myself to believe this unreal inference speed.

Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-05-29

Startup Kog AI reports 3,000 tokens per second for single-user LLM inference on 8× AMD MI300X GPUs, and 2,100 tokens/s on 8× NVIDIA H200s, which it attributes to exploiting a previously unrecognized efficiency gap in how GPUs generate tokens.

Extraction

Topics: llm-inference, gpu-performance, inference-optimization

Claims

  • Kog AI achieved 3,000 tokens/s for a single user on 8× AMD MI300X GPUs.
  • Kog AI achieved 2,100 tokens/s on 8× NVIDIA H200 GPUs.
  • The performance gain exploits a hidden efficiency gap in how GPUs generate tokens.
  • These speeds are achievable on standard datacenter GPU hardware without specialized modifications.
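
To put the throughput claims above in per-token terms, the figures can be converted to time per token with simple arithmetic. This is only an illustrative sketch: the two throughput numbers come from the claims, while the helper name `ms_per_token` and the dictionary layout are assumptions made for the example, and nothing here reflects how Kog AI measured or achieved the numbers.

```python
# Convert the reported single-user throughput figures into time per token.
# The throughput values are taken from the claims above; everything else
# (names, structure) is hypothetical and for illustration only.

def ms_per_token(tokens_per_second: float) -> float:
    """Return the average time per generated token, in milliseconds."""
    return 1000.0 / tokens_per_second

reported = {
    "8x AMD MI300X": 3000.0,   # tokens/s, as claimed
    "8x NVIDIA H200": 2100.0,  # tokens/s, as claimed
}

latency_ms = {name: ms_per_token(tps) for name, tps in reported.items()}
speed_ratio = reported["8x AMD MI300X"] / reported["8x NVIDIA H200"]

for name, ms in latency_ms.items():
    print(f"{name}: {ms:.3f} ms per token")
print(f"MI300X / H200 throughput ratio: {speed_ratio:.2f}")
```

At the claimed rates, this works out to roughly a third of a millisecond per token on the MI300X setup and a little under half a millisecond on the H200 setup.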

Key quotes

I had to test it myself to believe this unreal inference speed.
3,000 tokens/s for 1 user on standard datacenter GPUs. They leveraged a hidden efficiency gap in how GPUs generate tokens.