Tuning an LLM cluster runs the design space up to ~10^6 GPU-hours. A new paper from UT Austin + @preminstrel + colleague...
reactive:mlsys-2026-inference-systems · Guilherme Favaron (@guifav) · 2026-05-20