Prefill and decode fight for the same GPU. Prefill is compute-bound; decode is memory-bandwidth-bound. Run them in the s...
reactive:inference-cost-optimization · Prajjwal · building nanoserve (@pdurdenj) · 2026-06-29
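A rough way to see why the two phases stress different resources is a roofline-style estimate for a single dense layer. The sketch below is illustrative and not taken from the post: the `Gpu` numbers (300 TFLOP/s, 2 TB/s), the 4096x4096 layer, and the 2048-token prompt are all assumed values, and the model ignores attention, the KV cache, and batching across requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    """Hypothetical accelerator; both figures are assumed, not measured."""
    peak_flops: float     # FLOP/s
    mem_bandwidth: float  # bytes/s

    @property
    def ridge(self) -> float:
        # FLOPs per byte needed before compute, not memory, is the limit.
        return self.peak_flops / self.mem_bandwidth


def arithmetic_intensity(n_tokens: int, d_in: int, d_out: int,
                         bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for one (n_tokens x d_in) @ (d_in x d_out) matmul."""
    flops = 2 * n_tokens * d_in * d_out            # multiply + add per weight
    weight_bytes = d_in * d_out * bytes_per_elem   # weights are read once
    act_bytes = n_tokens * (d_in + d_out) * bytes_per_elem  # inputs + outputs
    return flops / (weight_bytes + act_bytes)


def classify(n_tokens: int, d_in: int, d_out: int, gpu: Gpu) -> str:
    ai = arithmetic_intensity(n_tokens, d_in, d_out)
    return "compute-bound" if ai >= gpu.ridge else "memory-bandwidth-bound"


gpu = Gpu(peak_flops=300e12, mem_bandwidth=2e12)  # assumed figures
prefill = classify(2048, 4096, 4096, gpu)  # whole prompt in one pass
decode = classify(1, 4096, 4096, gpu)      # one new token per step
print(f"ridge point: {gpu.ridge:.0f} FLOP/byte")
print(f"prefill: {prefill}, decode: {decode}")
```

Under these assumptions the prefill matmul does roughly a thousand FLOPs per byte moved, while a single-token decode step does about one, because the full weight matrix is reloaded to produce each token.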