Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit
read the original abstract
We consider learning two layer neural networks using stochastic gradient descent. The mean-field description of this learning dynamics approximates the evolution of the network weights by an evolution in the space of probability distributions in $R^D$ (where $D$ is the number of parameters associated to each neuron). This evolution can be defined through a partial differential equation or, equivalently, as the gradient flow in the Wasserstein space of probability distributions. Earlier work shows that (under some regularity assumptions), the mean field description is accurate as soon as the number of hidden units is much larger than the dimension $D$. In this paper we establish stronger and more general approximation guarantees. First of all, we show that the number of hidden units only needs to be larger than a quantity dependent on the regularity properties of the data, and independent of the dimensions. Next, we generalize this analysis to the case of unbounded activation functions, which was not covered by earlier bounds. We extend our results to noisy stochastic gradient descent. Finally, we show that kernel ridge regression can be recovered as a special limit of the mean field analysis.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
Thermodynamic Irreversibility of Training Algorithms
Four characterizations of irreversibility in training algorithms are equivalent to leading order in step size and produce an emergent force that breaks reparametrization symmetries while favoring minimum entropy produ...
-
An overview of condensation phenomenon in deep learning
Neural networks exhibit condensation of neurons into clusters with similar outputs whose number increases monotonically during training, facilitated by small initializations or dropout, providing insights into general...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.