Theoretical limitations of self-attention in neural sequence models
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: unverdicted (2)
Citing papers
- Expressivity of Transformers: A Tropical Geometry Perspective
Self-attention in transformers corresponds exactly to Power Voronoi diagrams under tropical geometry, yielding tight bounds of Theta(N to the power of d_model times L) linear regions.
- Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models
Large language models display three universal scale-dependent regimes of behavior (stable, chaotic, and signal-dominated), driven by floating-point rounding errors that produce an avalanche effect in early layers.