3 Pith papers cite this work.
citing papers explorer
- Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models
  Transformers converge pathwise to a stochastic particle system and SPDE in the scaling limit, exhibiting synchronization by noise and exponential energy dissipation when common noise is coercive relative to self-attention drift.
- Posterior Bayesian Neural Networks with Dependent Weights
  In the wide-width limit under Gaussian likelihood, the posterior of the network output is identified when the random covariance matrix is positive definite, with mild conditions ensuring invertibility and order-independent sequential limits.
- Universality in Deep Neural Networks: An approach via the Lindeberg exchange principle
  Quantitative 2-Wasserstein bounds are established between finite-width deep neural networks and their infinite-width Gaussian limits using a Lindeberg principle for successive Gaussian replacement of weights.