Transport equation and Cauchy problem for non-smooth vector fields
1 Pith paper cites this work. Polarity classification is still indexing.
fields: math.OC (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
Representative citing papers
- Training Infinitely Deep and Wide Transformers
Develops a mean-field neural PDE model for transformer training, proves forward-pass well-posedness via function-space ODEs, derives conditional Wasserstein gradients, and shows global convergence of gradient flow under an NTK injectivity condition equivalent to linear independence of log-sum-exp fu