Gradient Descent Converges to Minimizers

· 2016 · stat.ML · arXiv 1602.04915

3 Pith papers cite this work.

abstract

We show that gradient descent converges to a local minimizer, almost surely with random initialization. This is proved by applying the Stable Manifold Theorem from dynamical systems theory.
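The claim can be seen on a small example. This is a minimal sketch that assumes the toy function f(x, y) = x**2/2 + y**4/4 - y**2/2, chosen here for illustration. Its Hessian at the origin is diag(1, -1), so the origin is a strict saddle, and its minimizers are (0, 1) and (0, -1). Gradient descent reaches the saddle only from starts on the line y = 0, a measure-zero set, so uniformly random starts should all end at a minimizer. The step size 0.1, the iteration count, and the box [-2, 2]^2 are arbitrary assumptions. The gradient is not globally Lipschitz, so the fixed step is only safe on this bounded box.

```python
import numpy as np

def grad(P):
    """Gradient of the assumed toy f(x, y) = x**2/2 + y**4/4 - y**2/2, row-wise over P."""
    x, y = P[..., 0], P[..., 1]
    return np.stack([x, y**3 - y], axis=-1)

def gradient_descent(P0, step=0.1, iters=1000):
    """Plain fixed-step gradient descent, run on one point or a batch of points."""
    P = np.array(P0, dtype=float)
    for _ in range(iters):
        P = P - step * grad(P)
    return P

# Random initialization in [-2, 2]^2 (an illustrative choice of box).
rng = np.random.default_rng(0)
starts = rng.uniform(-2.0, 2.0, size=(1000, 2))
ends = gradient_descent(starts)  # every run should end near (0, 1) or (0, -1)
```

Starting exactly on the stable manifold, e.g. `gradient_descent([1.5, 0.0])`, keeps y at 0 and converges to the saddle at the origin instead. Random initialization avoids this with probability one.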

representative citing papers

Accelerated Gradient Methods for Nonconvex Optimization: Escape Trajectories From Strict Saddle Points and Convergence to Local Minima

math.OC · 2023-07-13 · unverdicted · novelty 7.0

Theoretical analysis of accelerated gradient methods showing almost-sure escape from strict saddles and linear exit times, plus a subclass achieving near-optimal convergence to local minima in convex neighborhoods of nonconvex functions.
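The escape behaviour described above can be illustrated on the same kind of toy function. This is a minimal sketch that assumes f(x, y) = x**2/2 + y**4/4 - y**2/2 (strict saddle at the origin) and Nesterov-style momentum with a constant coefficient. This is one standard accelerated scheme and not necessarily the subclass analysed in the paper. The step 0.1, momentum 0.5, exit radius 0.5, and the helper name `run` are hypothetical choices. Starting just off the saddle's stable manifold, both methods escape, and the momentum run leaves the neighbourhood in fewer iterations.

```python
import numpy as np

def grad(p):
    # Assumed toy function: f(x, y) = x**2/2 + y**4/4 - y**2/2.
    x, y = p
    return np.array([x, y**3 - y])

def run(p0, momentum, step=0.1, iters=2000, radius=0.5):
    """momentum=0 gives plain gradient descent; momentum>0 gives Nesterov's scheme.

    Returns (final iterate, first iteration at which |y| > radius, or None).
    """
    p = np.asarray(p0, dtype=float)
    p_prev = p.copy()
    exit_iter = None
    for k in range(iters):
        if exit_iter is None and abs(p[1]) > radius:
            exit_iter = k
        z = p + momentum * (p - p_prev)      # look-ahead (extrapolated) point
        p_prev, p = p, z - step * grad(z)    # gradient step taken from the look-ahead point
    return p, exit_iter

start = [1.0, 1e-8]  # tiny offset from the stable manifold y = 0
p_gd, t_gd = run(start, momentum=0.0)
p_nag, t_nag = run(start, momentum=0.5)
```

Near the saddle, the linearized y-update grows by a factor 1.1 per step for gradient descent but by about 1.19 with this momentum setting, which is why the accelerated run exits sooner. Both runs then settle at the minimizer (0, 1).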

Weight-space symmetry in deep networks gives rise to permutation saddles, connected by equal-loss valleys across the loss landscape

cs.LG · 2019-07-05 · conditional · novelty 7.0

Permutation symmetries generate permutation saddles and equal-loss valleys linking equivalent global minima, yielding a lower bound on symmetry-induced critical points.
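The permutation symmetry behind this summary can be checked numerically on a tiny network. This is a minimal sketch that assumes a one-hidden-layer tanh network with random weights and a squared-error loss on random data. All sizes and the helper names `mlp` and `permute_hidden` are hypothetical. Reordering the hidden units (rows of W1, entries of b1, columns of W2) yields a different parameter vector with exactly the same outputs and loss.

```python
import numpy as np

def mlp(params, X):
    """One-hidden-layer tanh network; X has shape (n, d_in)."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1.T + b1) @ W2.T + b2

def permute_hidden(params, perm):
    """Relabel hidden units: permute rows of W1, entries of b1, columns of W2."""
    W1, b1, W2, b2 = params
    return W1[perm], b1[perm], W2[:, perm], b2

rng = np.random.default_rng(0)
d_in, hidden, d_out, n = 3, 5, 2, 10
params = (rng.normal(size=(hidden, d_in)), rng.normal(size=hidden),
          rng.normal(size=(d_out, hidden)), rng.normal(size=d_out))
X = rng.normal(size=(n, d_in))
Y = rng.normal(size=(n, d_out))

def mse(p):
    return np.mean((mlp(p, X) - Y) ** 2)

perm = np.roll(np.arange(hidden), 1)      # a fixed non-identity permutation
permuted = permute_hidden(params, perm)   # different weights, same function
```

Any minimizer therefore comes with (up to) hidden! relabelled copies at equal loss, which is the starting point for the equal-loss valleys the summary describes.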

Combining Stochastic Adaptive Cubic Regularization with Negative Curvature for Nonconvex Optimization

math.OC · 2019-06-27 · unverdicted · novelty 7.0

Introduces the SANC algorithm combining negative curvature with stochastic adaptive cubic regularization for nonconvex optimization and claims it is the first such combination with consistent batch sizes for large-scale ML.
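Only the negative-curvature ingredient is easy to illustrate in isolation. This is a minimal, deterministic sketch and not SANC itself. The stochastic cubic-regularized subproblem and batch-size choices are not reproduced. It assumes the toy function f(x, y) = x**2/2 + y**4/4 - y**2/2 and a hypothetical helper `negative_curvature_step`. At a point where the Hessian has a negative eigenvalue, the step moves along the corresponding eigenvector, oriented so it does not increase f to first order. The step length |lambda| * step is a textbook-style choice, and the `step` scale is an illustrative assumption.

```python
import numpy as np

def f(p):
    x, y = p
    return 0.5 * x**2 + 0.25 * y**4 - 0.5 * y**2

def grad(p):
    x, y = p
    return np.array([x, y**3 - y])

def hess(p):
    x, y = p
    return np.array([[1.0, 0.0], [0.0, 3.0 * y**2 - 1.0]])

def negative_curvature_step(p, step=1.0, tol=1e-8):
    """Step along the eigenvector of the most negative Hessian eigenvalue, if one exists."""
    eigvals, eigvecs = np.linalg.eigh(hess(p))  # eigenvalues in ascending order
    lam, v = eigvals[0], eigvecs[:, 0]
    if lam >= -tol:
        return p  # no usable negative curvature at p
    if grad(p) @ v > 0:
        v = -v    # orient v so the first-order change in f is non-positive
    return p + step * abs(lam) * v

origin = np.zeros(2)                        # strict saddle: gradient is zero here
escaped = negative_curvature_step(origin)   # lands at (0, 1) or (0, -1)
```

At the saddle a gradient step does nothing because the gradient vanishes, while the curvature step still makes progress. At a minimizer such as (0, 1) the Hessian is positive definite and the function returns the point unchanged.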
