Today’s edition of my newsletter just went out.

Rohan Paul Twitter · Rohan Paul (@rohanpaul_ai) · 2026-05-29

Rohan Paul's daily AI newsletter reports that Anthropic released Claude Opus 4.8, with a 1M-token context window and parallel agentic 'dynamic workflows', on the same day it closed a funding round at a $965B post-money valuation. It also covers KogAI's claimed 10-30x inference speedup from a monokernel GPU architecture, which reaches 3,000 tokens/s on a 2B model.


Appears in

Claude Opus 4.8: Candid Model Launch with Mid-Conversation System Messages

Extraction

Topics: claude-opus-4-8, anthropic, llm-inference-optimization, agentic-ai, ai-valuation

Claims

Anthropic released Claude Opus 4.8 with a 1M-token context window, up to 128K output tokens, and a new 'dynamic workflows' feature that parallelizes large engineering tasks across tens to hundreds of subagents.
Claude Opus 4.8 scored 74.6% on agentic terminal coding benchmarks, up from 66.1% for Opus 4.7.
Anthropic closed a $65B funding round at a $965B post-money valuation on the same day as the model release.
KogAI achieved 3,000 tokens/s per user on 8× AMD MI300X GPUs with a 2B model in FP16 without speculative decoding, claiming a 10-30x improvement over typical inference stacks.
KogAI's speed gains derive from a 'monokernel' design running the entire decode pass as a single persistent GPU-resident program, and from eliminating grid synchronization overhead that consumed approximately 35% of token generation time.
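
The figures in the claims above imply a few derived numbers that are easy to check by hand. The short sketch below only restates the reported values and does the arithmetic; the variable names are illustrative, and the implied baseline range assumes the 10-30x claim is measured against the same 3,000 tokens/s figure.

```python
# Figures as reported in the claims above (not independently verified).
round_size_b = 65.0    # funding round, in $B
post_money_b = 965.0   # post-money valuation, in $B

# Pre-money valuation is post-money minus the new capital.
pre_money_b = post_money_b - round_size_b
# Share of the company held by the new money after the round.
new_money_share = round_size_b / post_money_b

# Benchmark scores in percent: Opus 4.7 vs. Opus 4.8.
opus_47, opus_48 = 66.1, 74.6
gain_points = opus_48 - opus_47
gain_relative = gain_points / opus_47

# Per-user decode speed and the per-token latency it implies.
tokens_per_s = 3000.0
ms_per_token = 1000.0 / tokens_per_s

# Assumption: if 3,000 tokens/s is the 10-30x result, the baseline
# it is compared against would sit in this range.
baseline_low = tokens_per_s / 30.0
baseline_high = tokens_per_s / 10.0

print(f"pre-money valuation: ${pre_money_b:.0f}B")
print(f"new-money share: {new_money_share:.1%}")
print(f"benchmark gain: {gain_points:.1f} points ({gain_relative:.1%} relative)")
print(f"latency: {ms_per_token:.2f} ms per token")
print(f"implied baseline: {baseline_low:.0f}-{baseline_high:.0f} tokens/s")
```

So the round implies a $900B pre-money valuation, with the new capital making up roughly 6.7% of the company, and the reported decode speed works out to about a third of a millisecond per token.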

Key quotes

Fast Mode now runs at 2.5x the speed and costs 3x less.

dynamic workflows in Claude Code will break a massive engineering task into many smaller jobs, run them through tens to hundreds of parallel subagents, and check the results before handing anything back.
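
The split, fan-out, and check pattern described in this quote can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Claude Code's implementation or API: `split_task`, `run_subagent`, `check_result`, and `dynamic_workflow` are hypothetical names, and the "subagent" here is a stand-in function rather than a model call.

```python
from concurrent.futures import ThreadPoolExecutor


def split_task(task: str, n_jobs: int) -> list[str]:
    """Hypothetical planner: break one large task into n smaller jobs."""
    return [f"{task} [part {i + 1}/{n_jobs}]" for i in range(n_jobs)]


def run_subagent(job: str) -> dict:
    """Stand-in for a subagent; a real one would call a model and tools."""
    return {"job": job, "output": job.upper(), "ok": True}


def check_result(result: dict) -> bool:
    """Stand-in verifier: accept results flagged ok with non-empty output."""
    return result["ok"] and bool(result["output"])


def dynamic_workflow(task: str, n_jobs: int = 8, max_workers: int = 4) -> list[dict]:
    """Split the task, run the jobs in parallel, and verify before returning."""
    jobs = split_task(task, n_jobs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input jobs.
        results = list(pool.map(run_subagent, jobs))
    failed = [r["job"] for r in results if not check_result(r)]
    if failed:
        # Nothing is handed back unless every job passes the check.
        raise RuntimeError(f"{len(failed)} job(s) failed verification: {failed}")
    return results


results = dynamic_workflow("migrate test suite", n_jobs=5)
print(len(results), "jobs verified")
```

The design choice worth noting is that verification happens after the parallel phase and before anything is returned, which matches the "check the results before handing anything back" wording in the quote.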

their own measurements say grid sync was eating around 35% of token-generation time; instead of making every compute unit wait at a broad barrier, each unit waits only for the exact data it needs.
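
The difference between a broad barrier and waiting only for the data a unit needs can be shown with a toy timing model. This is a hedged sketch, not KogAI's kernel: the function name, the finish times, and the dependency lists below are all made up for illustration, and the model ignores everything about real GPU hardware except who waits for whom.

```python
def stage2_start_times(stage1_finish, deps, global_barrier):
    """Return the time at which each unit may begin stage 2.

    stage1_finish[i]: time at which unit i finishes stage 1
    deps[i]: indices of the stage-1 outputs that unit i reads in stage 2
    global_barrier: if True, every unit waits for the slowest unit
    """
    if global_barrier:
        release = max(stage1_finish)
        return [release for _ in stage1_finish]
    # Fine-grained: each unit waits only for the outputs it actually reads.
    return [max(stage1_finish[j] for j in deps[i]) for i in range(len(deps))]


# Made-up numbers: unit 3 is a straggler in stage 1.
stage1_finish = [1.0, 1.2, 0.9, 3.0]
# Made-up dependencies: each unit reads its own output plus one other.
deps = [[0, 1], [1, 2], [2, 0], [3, 0]]

barrier_starts = stage2_start_times(stage1_finish, deps, global_barrier=True)
fine_starts = stage2_start_times(stage1_finish, deps, global_barrier=False)
idle_saved = [b - f for b, f in zip(barrier_starts, fine_starts)]

print("barrier:     ", barrier_starts)
print("fine-grained:", fine_starts)
print("idle saved:  ", [round(x, 2) for x in idle_saved])
```

In this toy case the barrier holds every unit until the straggler finishes, while the fine-grained version lets the three units that do not read the straggler's output start much earlier. Only the straggler itself gains nothing.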