RelFlexformers enable flexible integrable 3D RPE in attention via NU-FFT, generalizing prior methods to heterogeneous token positions with O(L log L) complexity.
Reformer: The efficient transformer
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
method 1polarities
background 1representative citing papers
DashAttention introduces differentiable adaptive sparse hierarchical attention via α-entmax block selection, achieving full-attention accuracy at 75% sparsity with improved Pareto performance over NSA and InfLLMv2.
BSA-TNP is a new neural process model with KRBlocks and biased scan attention that claims to match top accuracy while scaling inference to over 1M points in under a minute on a single GPU and supporting translation invariance.
S5 uses a single MIMO state space model with S4-derived initialization to match S4 efficiency and reach 87.4% average accuracy on the Long Range Arena benchmark.
citing papers explorer
-
RelFlexformer: Efficient Attention 3D-Transformers for Integrable Relative Positional Encodings
RelFlexformers enable flexible integrable 3D RPE in attention via NU-FFT, generalizing prior methods to heterogeneous token positions with O(L log L) complexity.
-
DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
DashAttention introduces differentiable adaptive sparse hierarchical attention via α-entmax block selection, achieving full-attention accuracy at 75% sparsity with improved Pareto performance over NSA and InfLLMv2.
-
Scalable Spatiotemporal Inference with Biased Scan Attention Transformer Neural Processes
BSA-TNP is a new neural process model with KRBlocks and biased scan attention that claims to match top accuracy while scaling inference to over 1M points in under a minute on a single GPU and supporting translation invariance.
-
Simplified State Space Layers for Sequence Modeling
S5 uses a single MIMO state space model with S4-derived initialization to match S4 efficiency and reach 87.4% average accuracy on the Long Range Arena benchmark.