Transformers converge pathwise to a stochastic particle system and SPDE in the scaling limit, exhibiting synchronization by noise and exponential energy dissipation when common noise is coercive relative to self-attention drift.
5 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (5)
verdicts: UNVERDICTED (5)
citing papers explorer
- Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models
  Transformers converge pathwise to a stochastic particle system and SPDE in the scaling limit, exhibiting synchronization by noise and exponential energy dissipation when common noise is coercive relative to self-attention drift.
- Perceptrons and localization of attention's mean-field landscape
  In the mean-field limit of attention with perceptron blocks, critical points of the energy landscape are generically atomic and localized on subsets of the unit sphere.
- Multi-Headed Transformer Architectures as Time-dependent Wasserstein Gradient Flows
  Models multi-head transformer data flow as time-dependent Wasserstein gradient flows of an attention-capturing interaction energy, with proofs on omega-limit stationary points and stability under weight and input perturbations.
- Clustering in co-evolving opinion dynamics: reduced SPDE models
  Reduced SPDE models for co-evolving opinion dynamics capture clustering behavior efficiently with lower cost than full-state models.
- Data-driven Reduction of Transfer Operators for Particle Clustering Dynamics
  A data-driven framework reduces particle-based transfer operators via concentration projection, geometric manifold, and finite-state discretization to reproduce clustering transitions and metastable states from simulation data.