The loss landscape of overparameterized neural networks

Y Cooper

arxiv: 1804.10200 · v1 · pith:JRRGTZ46new · submitted 2018-04-26 · cs.LG · cs.AI · cs.NE · stat.ML

classification: cs.LG · cs.AI · cs.NE · stat.ML

keywords: neural · function · loss · mathbb · overparameterized · data · discrete · global

We explore some mathematical features of the loss landscape of overparameterized neural networks. A priori, one might imagine that the loss function looks like a typical function from $\mathbb{R}^n$ to $\mathbb{R}$: in particular, nonconvex, with discrete global minima. In this paper, we prove that in at least one important way, the loss function $L$ of an overparameterized neural network does not look like a typical function. If a neural net has $n$ parameters and is trained on $d$ data points, with $n>d$, we show that the locus $M$ of global minima of $L$ is usually not discrete, but rather an $(n-d)$-dimensional submanifold of $\mathbb{R}^n$. In practice, neural nets commonly have orders of magnitude more parameters than data points, so this observation implies that $M$ is typically a very high-dimensional subset of $\mathbb{R}^n$.
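The dimension count $n-d$ can be checked numerically on a toy case. The sketch below is an illustration under stated assumptions, not the paper's construction: the one-hidden-layer tanh network, the sizes `h = 4` and `d = 5`, the random seed, and the helper names `predictions` and `jacobian` are all hypothetical choices made here. It picks a parameter vector `theta_star`, defines the labels as the network's own outputs there (so `theta_star` is a global minimum of the squared loss with zero loss), and computes the rank of the Jacobian of the map from parameters to the $d$ predictions. When that rank equals $d$, the implicit function theorem says the set of parameters fitting the data exactly is, near `theta_star`, a smooth manifold of dimension $n-d$.

```python
import numpy as np

def predictions(theta, x, h):
    """Outputs of a one-hidden-layer tanh network on scalar inputs x.

    theta packs output weights a, input weights w and biases b, each of
    length h (hypothetical layout chosen for this sketch).
    """
    a, w, b = theta[:h], theta[h:2 * h], theta[2 * h:]
    return np.tanh(np.outer(x, w) + b) @ a

def jacobian(theta, x, h):
    """Analytic Jacobian of predictions w.r.t. theta, shape (d, 3h)."""
    a, w, b = theta[:h], theta[h:2 * h], theta[2 * h:]
    t = np.tanh(np.outer(x, w) + b)   # hidden activations, shape (d, h)
    s = (1.0 - t ** 2) * a            # a_j * tanh'(w_j x_i + b_j)
    return np.hstack([t, s * x[:, None], s])

rng = np.random.default_rng(0)        # assumed seed, for reproducibility
h, d = 4, 5                           # assumed toy sizes
n = 3 * h                             # number of parameters, n > d
x = rng.normal(size=d)                # d scalar training inputs
theta_star = rng.normal(size=n)

# Labels are defined so that theta_star fits them exactly (zero loss).
y = predictions(theta_star, x, h)

J = jacobian(theta_star, x, h)
rank = np.linalg.matrix_rank(J)
flat_directions = n - rank            # local dimension of the zero-loss set

print(f"n = {n}, d = {d}, rank(J) = {rank}, flat directions = {flat_directions}")
```

At a zero-loss point the Hessian of the squared loss equals $2J^\top J$, so the same rank computation also counts its zero eigenvalues: `flat_directions` of them, one for each direction tangent to the set of global minima.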

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Flat Channels to Infinity in Neural Loss Landscapes
cs.LG · 2025-06 · unverdicted · novelty 7.0

Neural loss landscapes contain flat channels to infinity along which gradient flow leads pairs of neurons to implement gated linear units.