
arxiv: 1702.08580 · v2 · pith:3TDI2XNFnew · submitted 2017-02-27 · 💻 cs.LG · cs.NE · math.OC · stat.ML

Depth Creates No Bad Local Minima

classification 💻 cs.LG · cs.NE · math.OC · stat.ML
keywords minima, depth, local, create, deep, loss, results, alone

In deep learning, depth, as well as nonlinearity, creates non-convex loss surfaces. Does depth alone, then, create bad local minima? In this paper, we prove that without nonlinearity, depth alone does not create bad local minima, although it induces a non-convex loss surface. Using this insight, we greatly simplify a recently proposed proof to show that all of the local minima of feedforward deep linear neural networks are global minima. Our theoretical results generalize previous results with fewer assumptions, and this analysis provides a method to show similar results beyond square loss in deep linear models.
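
The abstract's central claim can be illustrated on a toy case. The sketch below is not taken from the paper: it assumes a hypothetical two-layer scalar linear "network" with weights `w1` and `w2`, a single training pair x = 1, y = 1, and square loss 0.5 * (w1 * w2 - 1)^2. It checks that the loss is non-convex (the midpoint of two global minima has higher loss than either) and that the extra critical point at the origin is a saddle rather than a bad local minimum.

```python
import math

# Toy setting (an assumption for illustration, not the paper's construction):
# a depth-2 scalar linear model f(x) = w2 * w1 * x, trained on x = 1, y = 1.

def loss(w1, w2):
    """Square loss 0.5 * (w2 * w1 * x - y)^2 with x = y = 1."""
    return 0.5 * (w1 * w2 - 1.0) ** 2

def gradient(w1, w2):
    """Partial derivatives of the loss with respect to w1 and w2."""
    residual = w1 * w2 - 1.0
    return (residual * w2, residual * w1)

def hessian(w1, w2):
    """2x2 symmetric Hessian of the loss."""
    cross = 2.0 * w1 * w2 - 1.0
    return ((w2 * w2, cross), (cross, w1 * w1))

def symmetric_eigenvalues(h):
    """Closed-form eigenvalues (smallest first) of a 2x2 symmetric matrix."""
    a, b, d = h[0][0], h[0][1], h[1][1]
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return (mean - radius, mean + radius)

# Two global minima with zero loss.
loss_a = loss(1.0, 1.0)
loss_b = loss(-1.0, -1.0)

# Their midpoint is the origin; a convex function could not exceed the
# average of the endpoint losses there, so a larger value shows non-convexity.
loss_mid = loss(0.0, 0.0)
is_non_convex = loss_mid > 0.5 * (loss_a + loss_b)

# The origin is a critical point, and its Hessian has one negative and one
# positive eigenvalue, so it is a saddle, not a local minimum.
grad_origin = gradient(0.0, 0.0)
eig_low, eig_high = symmetric_eigenvalues(hessian(0.0, 0.0))
origin_is_saddle = eig_low < 0.0 < eig_high

print("non-convex:", is_non_convex)
print("origin is a saddle:", origin_is_saddle)
```

In this toy case the only critical point that is not a global minimum is a saddle, which matches the flavor of the stated result; the general matrix case is what the paper itself addresses.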



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Weight-space symmetry in deep networks gives rise to permutation saddles, connected by equal-loss valleys across the loss landscape

    cs.LG 2019-07 conditional novelty 7.0

    Permutation symmetries generate permutation saddles and equal-loss valleys linking equivalent global minima, yielding a lower bound on symmetry-induced critical points.

  2. Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

    cs.LG 2024-01 unverdicted novelty 6.0

    SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on be...