The Information Machine

MLSys 2026: Inference Systems Research Preview

Synthesis history

9 versions, newest first.

  1. Version 9 2026-05-26 19:38 UTC · 226 items

    New items this pass are thin: a second tweet confirming UW SyFi's multi-prize win[^20724] and the public FlashInfer Bench GitHub starter kit[^20725] add minor depth to the contest thread, and Paradigm.xyz's Attention Ke…

  2. Version 8 2026-05-25 19:01 UTC · 221 items

    vLLM's disaggregated prefilling documentation for both v0.8.5[^20326] and v0.10.2[^20327] confirms that the 'experimental' label has not been lifted through the most recent tracked release, directly answering a prior open qu…

  3. Version 7 2026-05-25 10:13 UTC · 213 items

    Third-party deployment guides from Vultr[^19593] and Spheron[^19594], plus a dedicated vLLM Dynamo integration page[^19595], extend the disaggregation toolchain maturity signal beyond NVIDIA's own Kubernetes documentati…

  4. Version 6 2026-05-25 04:14 UTC · 193 items

    NVIDIA Dynamo's official Kubernetes documentation for disaggregated communication[^18696] is the most substantive addition this pass: it directly addresses the RDMA KV cache networking obstacle that was previously chara…

  5. Version 5 2026-05-24 18:56 UTC · 184 items

    The Native Sparse Attention paper (arXiv 2502.11089, ACL 2025) enters the thread as a substantive counterweight in the sparse-attention-as-stopgap debate: NSA designs hardware-aligned sparse attention from training tim…

  6. Version 4 2026-05-24 11:13 UTC · 164 items

    Three substantive additions this pass: (1) @superaiwatcher introduces the first explicit counter-narrative in the thread, framing sparse attention as a transitional stopgap before hardware-native linear attention, which…

  7. Version 3 2026-05-24 04:52 UTC · 128 items

    MIT HAN Lab's Adaptive Drafter paper (arXiv 2511.16665) is now confirmed as ASPLOS'26 with an open-source GitHub repository (mit-han-lab/fastrl) and MIT News coverage; secondary sources characterize the speedup as 2x, s…

  8. Version 2 2026-05-23 05:02 UTC · 107 items

    Attention-FFN disaggregation moved from a single conference mention to a concrete engineering push this pass: StepFun's StepMesh library, a vLLM RFC, and formal papers on provisioning and hardware challenges all appeare…

  9. Version 1 2026-05-22 18:27 UTC · 80 items

    MLSys 2026, the ninth annual Conference on Machine Learning and Systems, is underway in Bellevue, Washington (week of May 18–22, 2026)[^7796][^10577]. The conference's inference track is organized around four converging…