Efficient content-based sparse attention with routing transformers

Roy, A · 2020 · DOI 10.1162/tacl_a_00353

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS

cs.LG · 2026-04-04 · conditional · novelty 7.0

k-MIP attention enables linear-complexity graph transformers that approximate full attention arbitrarily closely and bounds GraphGPS expressivity via S-SEG-WL.

Gated Linear Attention Transformers with Hardware-Efficient Training

cs.LG · 2023-12-11 · unverdicted · novelty 6.0

Gated linear attention Transformers achieve competitive language modeling results with linear-time inference, superior length generalization, and higher training throughput than Mamba.

citing papers explorer

Showing 2 of 2 citing papers.

k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS

cs.LG · 2026-04-04 · conditional · none · ref 2

k-MIP attention enables linear-complexity graph transformers that approximate full attention arbitrarily closely and bounds GraphGPS expressivity via S-SEG-WL.

Gated Linear Attention Transformers with Hardware-Efficient Training

cs.LG · 2023-12-11 · unverdicted · none · ref 81

Gated linear attention Transformers achieve competitive language modeling results with linear-time inference, superior length generalization, and higher training throughput than Mamba.
