Uniform-in-Time Weak Propagation-of-Chaos in Shallow Neural Networks

2026 · stat.ML · arXiv 2605.22010

abstract

We consider one-hidden-layer neural networks trained in the feature-learning regime using gradient descent, and relate the output of the finite-width network $f_{\hat{\rho}_t^m}$ to its infinite-width counterpart $f_{\rho_t^{MF}}$, which evolves under the mean-field dynamics. While bounds on $\|f_{\rho_t^{MF}} - f_{\hat{\rho}_t^m}\|$ over a constant time horizon may be obtained via standard Grönwall estimates, the long-time behavior of the fluctuation is a more delicate matter. Uniform-in-time bounds often rely on (local) strong convexity of the landscape or on the logarithmic Sobolev inequalities available in noisy gradient dynamics. In this work, we establish a non-asymptotic weak propagation-of-chaos result that holds uniformly in time, obtained by exploiting instead the convergence rate of the deterministic mean-field Wasserstein gradient flow. Specifically, denoting by $L_t$ the mean-field excess MSE loss at time $t$ and by $m$ the number of neurons, under standard regularity assumptions and the condition $\int_0^\infty L_t^{1/2} dt = O(\log d)$, we obtain the uniform-in-time bound $\|f_{\rho_t^{MF}} - f_{\hat{\rho}_t^m}\|^2 \lesssim \text{poly}(d)\, m^{-\min(1,c/6)}$ whenever $L_t \lesssim t^{-c}$. Our result holds in a noiseless setting, makes no assumptions on the geometry of the landscape near the optimum, and extends seamlessly to other forms of discretization, including a finite number of samples and time discretization. A key takeaway of our result is that whenever the convergence rate of the mean-field, population-loss dynamics is faster than $t^{-2}$, we can attain a loss of $\epsilon$ with only $\text{poly}(d/\epsilon)$ neurons, training samples, and GD steps.
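
The arithmetic behind the $t^{-2}$ threshold and the neuron count can be sketched as follows. This is a rough reading of the abstract rather than the paper's own derivation: hidden constants are set to 1, the integral is started at $t = 1$ to avoid the singularity of $t^{-c/2}$ at the origin, and $P(d)$ is a placeholder name (not from the source) for the unspecified $\text{poly}(d)$ factor.

```latex
% Assumptions: hidden constants set to 1; P(d) stands for the unspecified poly(d) factor.
\begin{align*}
L_t \lesssim t^{-c}
  \;&\Longrightarrow\;
  \int_1^\infty L_t^{1/2}\,dt \;\lesssim\; \int_1^\infty t^{-c/2}\,dt
  \;=\; \frac{2}{c-2} \quad (c > 2;\ \text{the integral diverges for } c \le 2), \\[4pt]
P(d)\, m^{-\min(1,\,c/6)} \le \epsilon
  \;&\Longleftrightarrow\;
  m \;\ge\; \left(\frac{P(d)}{\epsilon}\right)^{\max(1,\,6/c)} .
\end{align*}
```

Under these assumptions, a rate faster than $t^{-2}$ is what makes $\int L_t^{1/2}\,dt$ finite in the first place, and for $c > 2$ the exponent $\max(1, 6/c)$ is below 3, so the required width is polynomial in $d/\epsilon$. The abstract's condition is the stronger $O(\log d)$ bound on the integral, and the sample and step counts are not recoverable from the abstract alone.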

representative citing papers

Uniform-in-time Propagation-of-Chaos for Stein Variational Gradient Descent

math.PR · 2026-06-30 · unverdicted · novelty 7.0

Uniform-in-time propagation-of-chaos bounds for SVGD are obtained via cutoff for distributional metrics (logarithmic rates) and via finite-dimensional closure plus conjugacy for Gaussian targets (parametric N^{-1/2} rates).
