Gradient Descent Converges to Minimizers
read the original abstract
We show that gradient descent converges to a local minimizer, almost surely with random initialization. This is proved by applying the Stable Manifold Theorem from dynamical systems theory.
This paper has not been read by Pith yet.
Forward citations
Cited by 3 Pith papers
-
Accelerated Gradient Methods for Nonconvex Optimization: Escape Trajectories From Strict Saddle Points and Convergence to Local Minima
Theoretical analysis of accelerated gradient methods showing almost-sure escape from strict saddles and linear exit times, plus a subclass achieving near-optimal convergence to local minima in convex neighborhoods of ...
-
Weight-space symmetry in deep networks gives rise to permutation saddles, connected by equal-loss valleys across the loss landscape
Permutation symmetries generate permutation saddles and equal-loss valleys linking equivalent global minima, yielding a lower bound on symmetry-induced critical points.
-
Combining Stochastic Adaptive Cubic Regularization with Negative Curvature for Nonconvex Optimization
Introduces the SANC algorithm combining negative curvature with stochastic adaptive cubic regularization for nonconvex optimization and claims it is the first such combination with consistent batch sizes for large-scale ML.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.