The variational formulation of the
5 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (5)
verdicts: UNVERDICTED (5)
roles: method (1)
polarities: use method (1)
citing papers explorer
- Scaling Limits of Long-Context Transformers
  For uniform keys on the d-dimensional sphere, softmax attention becomes selective at inverse temperature scaling β_n* ≍ n^{2/(d-1)}, with explicit limiting laws for attention weights and outputs in each regime.
- Sobolev Regularized MMD Gradient Flow
  Sobolev regularization on the witness function enables global convergence of MMD gradient flows for both sampling and generative modeling without isoperimetric assumptions.
- Discrete Optimal Transport: Rapid Convergence of Simulated Annealing Algorithms
  Using a new discrete Wasserstein distance and action functional, the paper proves polynomial convergence rates for annealed Glauber dynamics in mean-field Ising and Potts models.
- Learning Normalized Energy Models for Linear Inverse Problems
  Energy-based model with covariance regularization computes normalized posteriors for linear inverse problems without retraining, enabling adaptive sampling and blind estimation on image datasets.
- Properties and limitations of geometric tempering for gradient flow dynamics
  Geometric tempering yields exponential convergence bounds for both Wasserstein and Fisher-Rao flows but produces no speedup in the Fisher-Rao metric, with new adaptive schedules derived from the tempered dynamics.