The long-context demands of agentic AI accelerated attention research aimed at overcoming the context wall. Over the pas…

SemiAnalysis Twitter · SemiAnalysis (@SemiAnalysis_) · 2026-06-29

SemiAnalysis surveys the past year's wave of attention research driven by agentic AI's long-context demands. It credits Gated Delta Networks as the leading linear attention approach and DeepSeek's Native Sparse Attention as the leading open sparse attention work. It also traces how these designs, along with SWA-GQA hybrid attention, were adopted across Qwen, Kimi, MiniMax, ZhipuAI, Cohere, and Xiaomi.


Appears in: Transformer Attention: A Decade of Innovation Recognized by SemiAnalysis

Extraction

Topics: linear-attention, sparse-attention, long-context, agentic-ai, attention-mechanisms

Claims

Long-context demands of agentic AI directly accelerated a wave of attention research aimed at overcoming context-window limits.
Gated Delta Networks (GDNs) became the dominant linear attention approach and were adopted by Qwen 3.5, with Kimi building further improvements on top.
DeepSeek led open sparse attention research with Native Sparse Attention (NSA) and DeepSeek Sparse Attention (DSA), which inspired variants from MiniMax and ZhipuAI.
SWA-GQA hybrid attention (sliding-window attention layers mixed with global grouped-query attention layers) was popularized by Cohere and later refined by Xiaomi, which published detailed ablation studies.
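As a rough illustration of what an SWA-GQA hybrid means structurally, here is a minimal sketch. Most layers use a causal sliding-window mask, a periodic layer attends to the full causal prefix, and query heads share KV heads in groups (GQA). The function names, the 3:1 SWA-to-global ratio, the window of 128, and the head counts are hypothetical illustration values, not Cohere's or Xiaomi's published configurations.

```python
from typing import List, Optional


def layer_types(num_layers: int, swa_per_global: int) -> List[str]:
    """Repeat a pattern of `swa_per_global` sliding-window layers followed by one global layer."""
    period = swa_per_global + 1
    return ["global" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def attention_mask(seq_len: int, window: Optional[int]) -> List[List[bool]]:
    """mask[t][s] is True when query t may attend to key s.

    window=None gives a full causal mask (global layer); otherwise query t
    only sees the most recent `window` keys, itself included (SWA layer).
    """
    return [
        [s <= t and (window is None or t - s < window) for s in range(seq_len)]
        for t in range(seq_len)
    ]


def kv_head_for(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """GQA: consecutive groups of n_q_heads // n_kv_heads query heads share one KV head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    return q_head // (n_q_heads // n_kv_heads)


def kv_cache_entries(seq_len: int, types: List[str], window: int, n_kv_heads: int) -> int:
    """Count cached (token, KV head) pairs: SWA layers keep at most `window` tokens."""
    total = 0
    for kind in types:
        kept = seq_len if kind == "global" else min(seq_len, window)
        total += kept * n_kv_heads
    return total


# Hypothetical configuration: 8 layers, 3 SWA layers per global layer, window 128, 2 KV heads.
types = layer_types(8, 3)
print(types)                      # ['swa', 'swa', 'swa', 'global', 'swa', 'swa', 'swa', 'global']
print(attention_mask(4, 2)[3])    # [False, False, True, True]: query 3 sees only keys 2 and 3
print(kv_head_for(5, n_q_heads=8, n_kv_heads=2))                  # 1
print(kv_cache_entries(1000, types, window=128, n_kv_heads=2))    # 5536
```

For 1,000 tokens under these made-up settings, the hybrid caches 5,536 (token, KV head) entries, compared with 16,000 if all 8 layers were global. That KV-cache saving is the main reason to trade most global layers for windowed ones.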
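To make "sparse attention" concrete, here is a minimal sketch of top-k token selection. It loosely follows the indexer-then-select idea associated with DSA: a cheap scorer ranks past tokens, and ordinary softmax attention then runs only over the top_k winners. The ReLU dot-product scorer, the function names, and the toy shapes are simplifying assumptions, not DeepSeek's implementation. NSA, in contrast, combines compressed, selected, and sliding-window branches, and this sketch omits that structure.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def index_scores(q_idx: Vector, k_idx: List[Vector]) -> List[float]:
    """Hypothetical lightweight scorer: ReLU of a small-dimensional dot product per past token."""
    return [max(0.0, dot(q_idx, k)) for k in k_idx]


def topk_sparse_attention(
    q: Vector,
    keys: List[Vector],
    values: List[Vector],
    q_idx: Vector,
    k_idx: List[Vector],
    top_k: int,
) -> Tuple[Vector, List[int]]:
    """Attend from one query to only the top_k past tokens chosen by the indexer.

    keys/values/k_idx hold the causal prefix (tokens 0..t) for the current query.
    """
    scores = index_scores(q_idx, k_idx)
    # Rank tokens by indexer score (the sort is stable, so earlier tokens win ties).
    selected = sorted(range(len(keys)), key=lambda s: scores[s], reverse=True)[:top_k]
    selected.sort()
    # Standard scaled dot-product softmax attention, restricted to the selected tokens.
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, keys[s]) * scale for s in selected]
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    z = sum(weights)
    out = [
        sum(w / z * values[s][d] for w, s in zip(weights, selected))
        for d in range(len(values[0]))
    ]
    return out, selected


# Toy usage: 4 past tokens; keep the 2 that the indexer ranks highest.
out, chosen = topk_sparse_attention(
    q=[0.0], keys=[[1.0]] * 4, values=[[0.0], [10.0], [20.0], [30.0]],
    q_idx=[1.0], k_idx=[[0.1], [0.9], [0.5], [0.8]], top_k=2,
)
print(chosen, out)  # [1, 3] [20.0]
```

The per-query cost of the main attention step now scales with top_k rather than with the full context length. The indexer still touches every past token, but it is designed to be far cheaper per token.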

Key quotes

The long-context demands of agentic AI accelerated attention research aimed at overcoming the context wall.

linear attention has become mainstream, most notably with Gated Delta Networks (GDNs) by @songlinyang4 gaining strong traction among open-weight models.
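To show what the gated delta rule behind GDNs does, here is a minimal single-head sketch of the recurrence S_t = S_{t-1}(alpha_t (I - beta_t k_t k_t^T)) + beta_t v_t k_t^T, with output o_t = S_t q_t. Here alpha_t is a decay gate and beta_t a write strength. The sketch assumes L2-normalized keys and takes alpha_t and beta_t directly as inputs. Real models compute these gates from the input and use chunked parallel kernels instead of a Python loop. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # memory state S with shape (d_v, d_k)


def l2_normalize(x: Vector) -> Vector:
    norm = math.sqrt(sum(v * v for v in x)) or 1.0  # guard against a zero vector
    return [v / norm for v in x]


def gated_delta_step(S: Matrix, q: Vector, k: Vector, v: Vector,
                     alpha: float, beta: float) -> Tuple[Matrix, Vector]:
    """S_t = alpha * S_{t-1} (I - beta k k^T) + beta v k^T, then o_t = S_t q."""
    k = l2_normalize(k)
    d_v, d_k = len(S), len(k)
    # What the memory currently returns for key k (length d_v).
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # alpha decays the whole state; the delta term swaps the old value at k for v.
    new_S = [
        [alpha * S[i][j] - alpha * beta * Sk[i] * k[j] + beta * v[i] * k[j]
         for j in range(d_k)]
        for i in range(d_v)
    ]
    out = [sum(new_S[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return new_S, out


def gated_delta_scan(qs, ks, vs, alphas, betas) -> Tuple[List[Vector], Matrix]:
    """Run the recurrence token by token from a zero state (a naive sequential reference)."""
    d_k, d_v = len(ks[0]), len(vs[0])
    S: Matrix = [[0.0] * d_k for _ in range(d_v)]
    outs = []
    for q, k, v, a, b in zip(qs, ks, vs, alphas, betas):
        S, o = gated_delta_step(S, q, k, v, a, b)
        outs.append(o)
    return outs, S


# Write value [2, 3] under key e1, then overwrite the same key with [5, 7].
outs, _ = gated_delta_scan(
    qs=[[1.0, 0.0]] * 2, ks=[[1.0, 0.0]] * 2,
    vs=[[2.0, 3.0], [5.0, 7.0]], alphas=[1.0, 1.0], betas=[1.0, 1.0],
)
print(outs)  # [[2.0, 3.0], [5.0, 7.0]]
```

With alpha = beta = 1, the second write replaces the stored value instead of adding to it. Plain linear attention would return [7.0, 10.0] for the second query. A gate below 1 shrinks the whole state, letting the model forget stale context.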