On the structure of stationary solutions to McKean-Vlasov equations with applications to noisy transformers
5 Pith papers cite this work. Polarity classification is still indexing.
Citation summary: years: 2026 (5); verdicts: UNVERDICTED (5); roles: background (1); polarities: background (1).
citing papers explorer
- $L^2$ over Wasserstein: Statistical Analysis for Optimal Transport
  Defines the L² over Wasserstein space to equip random probability measures with inherited Riemannian geometry, enabling statistical convergence results and Bayesian posterior consistency in the Wasserstein topology.
- Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models
  Transformers converge pathwise to a stochastic particle system and SPDE in the scaling limit, exhibiting synchronization by noise and exponential energy dissipation when common noise is coercive relative to self-attention drift.
- Phase transitions in Doi-Onsager, Noisy Transformer, and other multimodal models
  For 1/(n+1)-periodic interactions with Fourier decay, the phase transition in mean-field free energies on the circle is continuous at the linear stability threshold of the uniform state.
- Multi-Headed Transformer Architectures as Time-dependent Wasserstein Gradient Flows
  Models multi-head transformer data flow as time-dependent Wasserstein gradient flows of an attention-capturing interaction energy, with proofs on omega-limit stationary points and stability under weight and input perturbations.
- Quantifying Concentration Phenomena of Mean-Field Transformers in the Low-Temperature Regime
  In the low-temperature regime, the token distribution in mean-field transformers concentrates onto the push-forward under a key-query-value projection with Wasserstein distance scaling as √(log(β+1)/β) exp(Ct) + exp(-ct).