The Information Machine

The long-context demands of agentic AI accelerated attention research aimed at overcoming the context wall. Over the pas…

SemiAnalysis Twitter · SemiAnalysis (@SemiAnalysis_) · 2026-06-29

SemiAnalysis surveys the past year's attention research wave driven by agentic AI's long-context demands, crediting Gated Delta Networks for linear attention and DeepSeek's Native Sparse Attention for sparse attention, with adoption across Qwen, Kimi, MiniMax, ZhipuAI, Cohere, and Xiaomi.

Extraction

Topics: linear-attention, sparse-attention, long-context, agentic-ai, attention-mechanisms

Claims

  • Long-context demands of agentic AI directly accelerated a wave of attention research aimed at overcoming context-window limits.
  • Gated Delta Networks (GDNs) became the dominant linear attention approach and were adopted by Qwen 3.5, with Kimi building further improvements on top.
  • DeepSeek led open sparse attention research with Native Sparse Attention and DeepSeek Sparse Attention (DSA), inspiring MiniMax and ZhipuAI variants.
  • SWA-GQA hybrid attention was popularized by Cohere and later refined by Xiaomi, which published detailed ablation studies.

Key quotes

The long-context demands of agentic AI accelerated attention research aimed at overcoming the context wall.
linear attention has become mainstream, most notably with Gated Delta Networks (GDNs) by @songlinyang4 gaining strong traction among open-weight models.