arXiv preprint arXiv:1805.00915
7 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 7
roles: background 1
polarities: background 1
citing papers explorer
- Uniform-in-Time Weak Propagation-of-Chaos in Shallow Neural Networks
  Finite-width shallow networks remain within poly(d) m^{-min(1,c/6)} of their mean-field limit uniformly in time when mean-field excess loss decays as t^{-c} under standard regularity and an integral condition on the loss.
- Mirror Descent-Ascent for mean-field min-max problems
  Establishes O(N^{-1/2}) convergence for simultaneous MDA and O(N^{-2/3}) for alternating MDA to mixed Nash equilibria in mean-field convex-concave min-max problems via dual-space Bregman analysis.
- A unified perspective on fine-tuning and sampling with diffusion and flow models
  A unified framework for exponential tilting in diffusion and flow models that includes bias-variance decompositions showing finite gradient variance for some methods, norm bounds on adjoint ODEs, and adapted losses with new Crooks and Jarzynski identities.
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
  SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on benchmarks.
- Mean-field limit of particle systems with absorption
  Proves mean-field limit and propagation of chaos for 1D particle systems with singular absorption interactions using Girsanov transforms, tightness, and analysis of the nonlinear Fokker-Planck equation.
- Robust and Resource Efficient Identification of Two Hidden Layer Neural Networks
  Presents an active-sampling method that approximates the weight subspace from Hessian finite differences, recovers the rank-1 tensors by robust nonlinear programming, and attributes layers with gradient descent, yielding stable recovery under a-posteriori verifiable conditions.
- Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks
  At the critical step-size scaling for SGD in high-dimensional single-layer networks, effective dynamics gain a diffusive correction term that changes the phase diagram and reduces to an Ornstein-Uhlenbeck process near fixed points, with the information exponent governing sample complexity.