Chamath on the all-important “prefill” and “decode” phases of AI compute.
Twitter · Rohan Paul (@rohanpaul_ai) · 2026-05-24
Chamath explains the two phases of AI inference compute — prefill (compute-bound, favoring Nvidia's parallel GPUs) and decode (memory-bandwidth-bound) — and their implications for AI hardware dominance.
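A rough roofline-style estimate makes the compute-bound vs memory-bandwidth-bound split concrete. This is a simplified sketch that assumes only the dense weight projections matter. It ignores attention over the KV cache, batching, and kernel overheads. The accelerator figures (1 PFLOP/s, 3 TB/s) and the `d_model` of 8192 are hypothetical round numbers, not specs of any real Nvidia part, and the helper names `Accelerator`, `matmul_intensity`, and `classify` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    """Hypothetical accelerator; the numbers used below are illustrative, not vendor specs."""
    peak_flops: float     # peak arithmetic throughput, FLOP/s
    mem_bandwidth: float  # memory bandwidth, bytes/s

    @property
    def ridge_point(self) -> float:
        # Arithmetic intensity (FLOP/byte) above which a kernel becomes compute-bound.
        return self.peak_flops / self.mem_bandwidth


def matmul_intensity(tokens: int, d_model: int, bytes_per_elem: int = 2) -> float:
    """Arithmetic intensity of multiplying a (tokens x d_model) activation matrix
    by a (d_model x d_model) weight matrix, assuming 2-byte (fp16/bf16) elements."""
    flops = 2 * tokens * d_model * d_model  # one multiply + one add per multiply-accumulate
    bytes_moved = bytes_per_elem * (
        d_model * d_model          # read the weight matrix once
        + 2 * tokens * d_model     # read the input activations, write the output
    )
    return flops / bytes_moved


def classify(tokens: int, d_model: int, hw: Accelerator) -> str:
    """Label the matmul by comparing its intensity with the hardware ridge point."""
    if matmul_intensity(tokens, d_model) > hw.ridge_point:
        return "compute-bound"
    return "memory-bandwidth-bound"


if __name__ == "__main__":
    gpu = Accelerator(peak_flops=1e15, mem_bandwidth=3e12)  # hypothetical: 1 PFLOP/s, 3 TB/s
    d = 8192
    for label, n in [("prefill, 4096-token prompt", 4096), ("decode, 1 token", 1)]:
        ai = matmul_intensity(n, d)
        print(f"{label}: {ai:.1f} FLOP/byte -> {classify(n, d, gpu)}")
```

Under these assumptions the 4096-token prefill reaches about 2048 FLOP/byte, well above the hypothetical ridge point of roughly 333, while a single decode token sits near 1 FLOP/byte. Batching many decode requests together raises `tokens` in this model, which is one reason serving systems batch decode steps.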
Extraction
Topics: ai-inference, gpu-architecture, ai-hardware, nvidia
Claims
- Prefill in AI inference is compute-bound because the whole prompt is processed in parallel as large matrix multiplications, giving massively parallel GPU architectures like Nvidia's a dominant advantage as context windows grow.
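- Decode is memory-bandwidth-bound because generating each new token requires reading back the cached state (the KV cache) of every preceding token, along with the model weights, while doing comparatively little arithmetic per byte loaded.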
- The prefill/decode distinction has structural implications for which hardware vendors dominate different parts of the AI compute stack.
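The decode claim can be illustrated with a toy KV-cache loop. This is a hypothetical sketch: random vectors stand in for a real model's projected queries, keys, and values, there is a single attention head, and the model weights (which a real decode step must also stream from memory) are left out. The function names `attend` and `toy_generate` are illustrative only.

```python
import math
import random


def attend(query: list[float], keys: list[list[float]], values: list[list[float]]) -> list[float]:
    """Single-head scaled dot-product attention over every cached position."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(d)]


def toy_generate(prompt_len: int, new_tokens: int, d: int = 8, seed: int = 0) -> list[int]:
    """Hypothetical toy decoder: random vectors replace real projections.
    Returns how many cached positions each decode step had to read."""
    rng = random.Random(seed)

    def vec() -> list[float]:
        return [rng.gauss(0.0, 1.0) for _ in range(d)]

    # Prefill: keys/values for every prompt position are produced up front
    # (in a real model, in one parallel pass) and stored in the cache.
    k_cache = [vec() for _ in range(prompt_len)]
    v_cache = [vec() for _ in range(prompt_len)]

    reads_per_step = []
    for _ in range(new_tokens):
        # Decode: one token per step; its own key/value is appended first,
        # then its query attends over the entire cache.
        query = vec()
        k_cache.append(vec())
        v_cache.append(vec())
        attend(query, k_cache, v_cache)
        reads_per_step.append(len(k_cache))
    return reads_per_step
```

For example, `toy_generate(4, 3)` returns `[5, 6, 7]`: each new token appends one cache entry and then attends over every cached position, so the memory read per step grows with the sequence length while the arithmetic per byte stays small.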
Key quotes
Prefill is compute-bound; massive parallel GPUs win, so Nvidia dominates as context grows.
Decode is memory-bandwidth bound as each next token depends on scanning what's already generated.