MLSys 2026: Inference Systems Research Preview
Synthesis history
9 versions, newest first.
-
Version 9 2026-05-26 19:38 UTC · 226 items
New items this pass are thin: a second tweet confirming UW SyFi's multi-prize win[^20724] and the public FlashInfer Bench GitHub starter kit[^20725] add minor depth to the contest thread, and Paradigm.xyz's Attention Ke…
-
Version 8 2026-05-25 19:01 UTC · 221 items
vLLM's disaggregated prefilling documentation for both v0.8.5[^20326] and v0.10.2[^20327] confirms the 'experimental' label has not been lifted through the most recent tracked release, directly answering a prior open qu…
-
Version 7 2026-05-25 10:13 UTC · 213 items
Third-party deployment guides from Vultr[^19593] and Spheron[^19594], plus a dedicated vLLM Dynamo integration page[^19595], extend the disaggregation toolchain maturity signal beyond NVIDIA's own Kubernetes documentati…
-
Version 6 2026-05-25 04:14 UTC · 193 items
NVIDIA Dynamo's official Kubernetes documentation for disaggregated communication[^18696] is the most substantive addition this pass: it directly addresses the RDMA KV cache networking obstacle that was previously chara…
-
Version 5 2026-05-24 18:56 UTC · 184 items
The Native Sparse Attention paper (arXiv 2502.11089, ACL 2025) enters the thread as a substantive counterweight in the sparse-attention-as-stopgap debate: NSA designs hardware-aligned sparse attention from training tim…
-
Version 4 2026-05-24 11:13 UTC · 164 items
Three substantive additions this pass: (1) @superaiwatcher introduces the first explicit counter-narrative in the thread, framing sparse attention as a transitional stopgap before hardware-native linear attention, which…
-
Version 3 2026-05-24 04:52 UTC · 128 items
MIT HAN Lab's Adaptive Drafter paper (arXiv 2511.16665) is now confirmed as ASPLOS'26 with an open-source GitHub repository (mit-han-lab/fastrl) and MIT News coverage; secondary sources characterize the speedup as 2x, s…
-
Version 2 2026-05-23 05:02 UTC · 107 items
Attention-FFN disaggregation moved from a single conference mention to a concrete engineering push this pass: StepFun's StepMesh library, a vLLM RFC, and formal papers on provisioning and hardware challenges all appeare…
-
Version 1 2026-05-22 18:27 UTC · 80 items
MLSys 2026, the ninth annual Conference on Machine Learning and Systems, is underway in Bellevue, Washington (week of May 18–22, 2026)[^7796][^10577]. The conference's inference track is organized around four converging…