Disaggregation goes further: prefill on one GPU pool, decode on another, ship the KV cache between them. You pay a trans...
reactive:inference-cost-optimization · Prajjwal · building nanoserve (@pdurdenj) · 2026-06-29
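To get a rough feel for what shipping the KV cache between the prefill and decode pools involves, here is a minimal back-of-the-envelope sketch. It is not taken from the post: the model shape (32 layers, 8 KV heads, head dimension 128, fp16), the 4096-token prompt, and the 50 GB/s link are all hypothetical values chosen for illustration, and the transfer time ignores latency and protocol overhead.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_elem=2):
    """Size of the KV cache for one sequence: one K and one V tensor per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem


def transfer_seconds(num_bytes, link_gigabytes_per_s):
    """Idealized transfer time over a link (no latency, no protocol overhead)."""
    return num_bytes / (link_gigabytes_per_s * 1e9)


# Hypothetical model shape and link speed, for illustration only.
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, num_tokens=4096)
print(f"KV cache: {size / 2**20:.0f} MiB")
print(f"Transfer at 50 GB/s: {transfer_seconds(size, 50) * 1e3:.2f} ms")
```

Under these assumed numbers the cache for a single 4096-token prompt is 512 MiB, so the per-request transfer is on the order of ten milliseconds; real figures depend on the model, precision, and interconnect.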