Efficient content-based sparse attention with routing transformers
2 Pith papers cite this work.
fields
cs.LG 2
representative citing papers
Gated linear attention Transformers achieve competitive language modeling results with linear-time inference, superior length generalization, and higher training throughput than Mamba.
citing papers explorer
- k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS
k-MIP attention enables linear-complexity graph transformers that approximate full attention arbitrarily closely and bounds GraphGPS expressivity via S-SEG-WL.
- Gated Linear Attention Transformers with Hardware-Efficient Training
Gated linear attention Transformers achieve competitive language modeling results with linear-time inference, superior length generalization, and higher training throughput than Mamba.