For uniform keys on the d-dimensional sphere, softmax attention becomes selective at inverse temperature scaling β_n* ≍ n^{2/(d-1)}, with explicit limiting laws for attention weights and outputs in each regime.
High-Dimensional Probability: An Introduction with Applications in Data Science , year =
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
SMART transfers knowledge in multi-task linear regression via spectral subspace similarity assumptions, achieving near-minimax Frobenius error rates while requiring only a fitted source model.
SMILE aligns latent embeddings across multi-source EHR feature spaces via spherical mixtures of von Mises-Fisher distributions, provides non-asymptotic error bounds, and enables consistent synonym cluster recovery.
citing papers explorer
-
Scaling Limits of Long-Context Transformers
For uniform keys on the d-dimensional sphere, softmax attention becomes selective at inverse temperature scaling β_n* ≍ n^{2/(d-1)}, with explicit limiting laws for attention weights and outputs in each regime.
-
SMART: A Spectral Transfer Approach to Multi-Task Learning
SMART transfers knowledge in multi-task linear regression via spectral subspace similarity assumptions, achieving near-minimax Frobenius error rates while requiring only a fitted source model.
-
Spherical Mixture Integration for Latent Embedding Alignment across Multi-Source Feature Spaces
SMILE aligns latent embeddings across multi-source EHR feature spaces via spherical mixtures of von Mises-Fisher distributions, provides non-asymptotic error bounds, and enables consistent synonym cluster recovery.