For uniform keys on the d-dimensional sphere, softmax attention becomes selective at inverse temperature scaling β_n* ≍ n^{2/(d-1)}, with explicit limiting laws for attention weights and outputs in each regime.
5 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (5)
verdicts: UNVERDICTED (5)
roles: background (2)
polarities: background (2)
citing papers explorer
- Scaling Limits of Long-Context Transformers
  For uniform keys on the d-dimensional sphere, softmax attention becomes selective at inverse temperature scaling β_n* ≍ n^{2/(d-1)}, with explicit limiting laws for attention weights and outputs in each regime.
- Discrete Optimal Transport: Rapid Convergence of Simulated Annealing Algorithms
  Using a new discrete Wasserstein distance and action functional, the paper proves polynomial convergence rates for annealed Glauber dynamics in mean-field Ising and Potts models.
- Principal-agent problems with adverse selection: A stochastic target problem formulation
  Agent's optimization in unique-contract principal-agent problem with adverse selection is recast as stochastic target problem, enabling principal's objective as stochastic optimal control with partial information and state constraints.
- Learning Normalized Energy Models for Linear Inverse Problems
  Energy-based model with covariance regularization computes normalized posteriors for linear inverse problems without retraining, enabling adaptive sampling and blind estimation on image datasets.
- Debiased Counterfactual Generation via Flow Matching from Observations
  Observational and counterfactual distributions are linked by identical support and invariant features, enabling a flow-matching estimator with semiparametric efficiency correction to generate debiased counterfactuals from observations.