
arxiv: 1906.00193 · v1 · pith:VXUG2K3Onew · submitted 2019-06-01 · 🧮 math.ST · cond-mat.dis-nn · math.PR · stat.TH

A mean-field limit for certain deep neural networks

classification 🧮 math.ST · cond-mat.dis-nn · math.PR · stat.TH
keywords limit · networks · certain · deep · dnns · fixed · layers · mckean-vlasov

Understanding deep neural networks (DNNs) is a key challenge in the theory of machine learning, with potential applications to the many fields where DNNs have been successfully used. This article presents a scaling limit for a DNN being trained by stochastic gradient descent. Our networks have a fixed (but arbitrary) number $L\geq 2$ of inner layers; $N\gg 1$ neurons per layer; full connections between layers; and fixed weights (or "random features" that are not trained) near the input and output. Our results describe the evolution of the DNN during training in the limit when $N\to +\infty$, which we relate to a mean field model of McKean-Vlasov type. Specifically, we show that network weights are approximated by certain "ideal particles" whose distribution and dependencies are described by the mean-field model. A key part of the proof is to show existence and uniqueness for our McKean-Vlasov problem, which does not seem to be amenable to existing theory. Our paper extends previous work on the $L=1$ case by Mei, Montanari and Nguyen; Rotskoff and Vanden-Eijnden; and Sirignano and Spiliopoulos. We also complement recent independent work on $L>1$ by Sirignano and Spiliopoulos (who consider a less natural scaling limit) and Nguyen (who nonrigorously derives similar results).



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks

    stat.ML 2025-11 unverdicted novelty 5.0

    At the critical step-size scaling for SGD in high-dimensional single-layer networks, effective dynamics gain a diffusive correction term that changes the phase diagram and reduces to an Ornstein-Uhlenbeck process near...