Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)

Representative citing paper:
- Why Attend to Everything? Focus is the Key
Focus learns a few centroids to gate long-range token attention, producing sparse attention that matches or beats full attention quality with up to 8.6x speedup at million-token lengths.
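
The TL;DR names the mechanism only at a high level. Below is a minimal sketch of one plausible reading, assuming "centroids gate long-range attention" means clustering keys around a few learned centroids and letting each query attend only to tokens in its top-scoring clusters; the function name, shapes, and `top_c` parameter are hypothetical illustrations, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def centroid_gated_attention(q, k, v, centroids, top_c=2):
    """Hypothetical sketch of centroid-gated sparse attention (single head).

    q, k, v: (T, d) query/key/value tensors; centroids: (C, d) learned vectors.
    """
    T = q.shape[0]
    # Assign each key to its nearest centroid (hard clustering).
    assign = (k @ centroids.T).argmax(dim=-1)             # (T,)
    # Score each query against the centroids; keep the top_c clusters per query.
    keep = (q @ centroids.T).topk(top_c, dim=-1).indices  # (T, top_c)
    # allowed[i, j] is True iff key j's cluster is among query i's kept clusters.
    allowed = (assign.view(1, T, 1) == keep.view(T, 1, top_c)).any(dim=-1)
    # Keep the diagonal so no query row is fully masked out.
    allowed = allowed | torch.eye(T, dtype=torch.bool, device=q.device)
    scores = (q @ k.T) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~allowed, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Example: each query attends to roughly top_c/C of the sequence.
T, d, C = 1024, 64, 8
q, k, v = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)
centroids = torch.randn(C, d)  # in practice learned jointly with the model
out = centroid_gated_attention(q, k, v, centroids, top_c=2)  # (T, d)
```

This dense-mask version only illustrates the gating logic; the reported 8.6x speedup at million-token lengths would require actually skipping the masked regions, e.g. with a block-sparse attention kernel.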