Synthesis history

4 versions, newest first.

Version 4 2026-06-09 18:22 UTC · 37 items

SemiAnalysis's longitudinal DeepSeek V4 inference benchmarks via InferenceX [^27364] are the most substantive addition: AMD ROCm's 100× improvement by Day 26 and NVIDIA TensorRT-LLM's launch failures add a time-series b…

Version 3 2026-06-05 08:07 UTC · 32 items

Three substantive additions this pass. Moreh's 21K aggregate tokens/s benchmark on AMD MI300X [^25133] introduces a per-request vs. aggregate throughput distinction that sharpens a core tension. Qualcomm's CEO projectin…

Version 2 2026-06-01 18:31 UTC · 25 items

Kog AI's primary technical blog post [^23206] surfaced in the feed, confirming that the monokernel approach has published documentation, though no specific claims were extractable from it in this pass. Community discussion s…

Version 1 2026-05-31 18:12 UTC · 17 items

Kog AI, an inference startup, has claimed approximately 3,000 tokens/s on 8× AMD MI300X GPUs and 2,100 tokens/s on 8× NVIDIA H200 with a 2B-parameter model, roughly 20–30× faster than the ~100 tokens/s typical…