Robust Filter Attention models self-attention as consistency-based state estimation under a linear SDE for token trajectories, matching standard attention complexity while showing lower perplexity and better zero-shot length extrapolation than RoPE on language benchmarks.
Adaptive filter attention
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Robust Filter Attention: Self-Attention as Precision-Weighted State Estimation
Robust Filter Attention models self-attention as consistency-based state estimation under a linear SDE for token trajectories, matching standard attention complexity while showing lower perplexity and better zero-shot length extrapolation than RoPE on language benchmarks.