Disaggregation goes further: prefill on one GPU pool, decode on another, ship the KV cache between them. You pay a trans...
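To get a feel for what shipping the KV cache between pools involves, here is a rough back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dimension 128, fp16), the 4096-token prompt, and the 50 GB/s link are all illustrative assumptions, not figures from the post, and the helper names are hypothetical.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Estimate KV cache size for one sequence.

    Each layer stores two tensors (K and V), each of shape
    num_tokens x num_kv_heads x head_dim.
    """
    return 2 * num_layers * num_tokens * num_kv_heads * head_dim * bytes_per_elem


def transfer_seconds(num_bytes, link_gbytes_per_s):
    """Time to move num_bytes over a link with the given bandwidth (GB/s, 1e9 bytes)."""
    return num_bytes / (link_gbytes_per_s * 1e9)


# Assumed, illustrative numbers only.
size = kv_cache_bytes(num_tokens=4096, num_layers=32, num_kv_heads=8, head_dim=128)
seconds = transfer_seconds(size, link_gbytes_per_s=50.0)

print(f"KV cache: {size / 2**20:.0f} MiB")
print(f"Transfer time: {seconds * 1e3:.2f} ms")
```

The estimate ignores latency, serialization overhead, and any overlap of transfer with compute, so treat it as a lower bound on the handoff between the prefill and decode pools.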

reactive:inference-cost-optimization · Prajjwal · building nanoserve (@pdurdenj) · 2026-06-29
