Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit

Andrea Montanari; Song Mei; Theodor Misiakiewicz

arxiv: 1902.06015 · v1 · pith:IN44AKBHnew · submitted 2019-02-16 · 📊 stat.ML · cond-mat.stat-mech · cs.LG · math.ST · stat.TH

classification 📊 stat.ML · cond-mat.stat-mech · cs.LG · math.ST · stat.TH

keywords evolution · gradient · number · analysis · bounds · descent · description · distributions

We consider learning two-layer neural networks using stochastic gradient descent. The mean-field description of these learning dynamics approximates the evolution of the network weights by an evolution in the space of probability distributions in $\mathbb{R}^D$ (where $D$ is the number of parameters associated with each neuron). This evolution can be defined through a partial differential equation or, equivalently, as the gradient flow in the Wasserstein space of probability distributions. Earlier work shows that, under some regularity assumptions, the mean-field description is accurate as soon as the number of hidden units is much larger than the dimension $D$. In this paper we establish stronger and more general approximation guarantees. First, we show that the number of hidden units only needs to be larger than a quantity that depends on the regularity properties of the data and is independent of the dimensions. Next, we generalize this analysis to the case of unbounded activation functions, which was not covered by earlier bounds. We also extend our results to noisy stochastic gradient descent. Finally, we show that kernel ridge regression can be recovered as a special limit of the mean-field analysis.
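
For readers who want the objects behind the abstract in symbols, the following is a minimal sketch of the standard mean-field setup for the squared loss. The notation ($N$, $\theta_i$, $\sigma_*$, $\rho_t$, $V$, $U$, $\Psi$) is assumed here for illustration and does not appear in the abstract; it follows the usual conventions of the mean-field literature, and the time scale of the evolution is left unspecified because it depends on the step-size schedule.

$$
\begin{aligned}
\hat f_N(x;\theta_1,\dots,\theta_N) &= \frac{1}{N}\sum_{i=1}^{N} \sigma_*(x;\theta_i), \qquad \theta_i \in \mathbb{R}^D,\\
R(\rho) &= \mathbb{E}\Big[\Big(y - \int \sigma_*(x;\theta)\,\rho(\mathrm{d}\theta)\Big)^{2}\Big],\\
V(\theta) &= -\,\mathbb{E}\big[y\,\sigma_*(x;\theta)\big], \qquad U(\theta,\theta') = \mathbb{E}\big[\sigma_*(x;\theta)\,\sigma_*(x;\theta')\big],\\
\Psi(\theta;\rho) &= V(\theta) + \int U(\theta,\theta')\,\rho(\mathrm{d}\theta'),\\
\partial_t \rho_t &\propto \nabla_\theta \cdot \big(\rho_t\,\nabla_\theta \Psi(\theta;\rho_t)\big).
\end{aligned}
$$

In this sketch, $\rho_t$ plays the role of the limiting distribution of the $N$ per-neuron parameter vectors, and the last line is the partial differential equation whose solution is the Wasserstein gradient flow of $R(\rho)$. Expanding the square gives $R(\rho) = \mathbb{E}[y^2] + 2\int V\,\mathrm{d}\rho + \iint U\,\mathrm{d}\rho\,\mathrm{d}\rho$, so $\Psi$ is half the first variation of $R$ with respect to $\rho$.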

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Thermodynamic Irreversibility of Training Algorithms
cond-mat.stat-mech 2026-05 unverdicted novelty 6.0

Four characterizations of irreversibility in training algorithms are equivalent to leading order in step size and produce an emergent force that breaks reparametrization symmetries while favoring minimum entropy produ...

An overview of condensation phenomenon in deep learning
cs.LG 2025-04 unverdicted novelty 2.0

Neural networks exhibit condensation of neurons into clusters with similar outputs whose number increases monotonically during training, facilitated by small initializations or dropout, providing insights into general...