Robust Filter Attention models self-attention as consistency-based state estimation under a linear SDE for token trajectories, matching standard attention complexity while showing lower perplexity and better zero-shot length extrapolation than RoPE on language benchmarks.
2505.10222
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Transformer components arise as the natural solution to precision-weighted directional state estimation on the hypersphere.
citing papers explorer
- Robust Filter Attention: Self-Attention as Precision-Weighted State Estimation
  Robust Filter Attention models self-attention as consistency-based state estimation under a linear SDE for token trajectories, matching standard attention complexity while showing lower perplexity and better zero-shot length extrapolation than RoPE on language benchmarks.
- RT-Transformer: The Transformer Block as a Spherical State Estimator
  Transformer components arise as the natural solution to precision-weighted directional state estimation on the hypersphere.