Escaping saddles with stochastic gradients

Hadi Daneshmand, Jonas Kohler, Aurelien Lucchi, Thomas Hofmann · 2018 · cs.LG · arXiv 1803.05999

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients exhibit a strong component along these directions. Furthermore, we show that, contrary to the case of isotropic noise, this variance is proportional to the magnitude of the corresponding eigenvalues and not decreasing in the dimensionality. Based upon this observation, we propose a new assumption under which we show that the injection of explicit, isotropic noise usually applied to make gradient descent escape saddle points can successfully be replaced by a simple SGD step. Additionally, and under the same condition, we derive the first convergence rate for plain SGD to a second-order stationary point in a number of iterations that is independent of the problem dimension.
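The abstract contrasts two kinds of perturbation. For isotropic noise, the component along any fixed direction shrinks as the dimension grows. For stochastic gradients, the component along a negative-curvature direction need not shrink. The minimal Python sketch below illustrates both effects and the "SGD step instead of injected noise" idea. It is built on a hypothetical toy finite sum and is not the paper's models, assumption, or experiments. In particular, the quadratic objective, the offsets `±shift`, the step size, and every function name are illustrative assumptions. For noise ξ drawn uniformly from a sphere of radius r in R^d, symmetry gives E[(vᵀξ)²] = r²/d for any unit vector v. By contrast, each component gradient of the toy problem has a fixed component along the negative-curvature direction v = e₁, whatever the dimension. One way to phrase such a condition is a lower bound E_z[(vᵀ∇f_z(w))²] ≥ γ > 0. That phrasing is an assumption here, because the abstract does not state the condition explicitly.

```python
import math
import random


def isotropic_projection_variance(dim, radius=1.0, trials=4000, seed=0):
    """Monte Carlo estimate of E[(v^T xi)^2] for xi uniform on the sphere of the
    given radius in R^dim, with v = e_1. By symmetry the exact value is radius**2 / dim."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in g))
        # First coordinate of radius * g / ||g||, i.e. the projection onto v = e_1.
        total += (radius * g[0] / norm) ** 2
    return total / trials


def toy_problem(dim, shift=1.0):
    """Hypothetical finite sum f = mean_i f_i with
    f_i(w) = 0.5 * sum_k eigvals[k] * w[k]**2 + offsets[i] * w[0].
    eigvals[0] = -1 gives negative curvature along v = e_1. The offsets average
    to zero, so the full gradient vanishes at the saddle point w = 0.
    Note: this toy is unbounded below along e_1; it only illustrates leaving the saddle."""
    eigvals = [-1.0] + [1.0] * (dim - 1)
    offsets = [shift, -shift]
    return eigvals, offsets


def stochastic_grad(w, offset, eigvals):
    """Gradient of a single component f_i at w."""
    g = [lam * x for lam, x in zip(eigvals, w)]
    g[0] += offset
    return g


def sgd_variance_along_v(dim, shift=1.0):
    """Exact E_i[(v^T grad f_i(0))^2] with v = e_1; equals shift**2 for every dim."""
    eigvals, offsets = toy_problem(dim, shift)
    w0 = [0.0] * dim
    return sum(stochastic_grad(w0, a, eigvals)[0] ** 2 for a in offsets) / len(offsets)


def run(dim, steps, lr, use_sgd, shift=1.0, seed=0):
    """Start exactly at the saddle w = 0 and take `steps` gradient steps.
    use_sgd=False uses the full gradient (the offsets average out to 0).
    use_sgd=True samples one component per step and injects no explicit noise."""
    rng = random.Random(seed)
    eigvals, offsets = toy_problem(dim, shift)
    w = [0.0] * dim
    for _ in range(steps):
        offset = rng.choice(offsets) if use_sgd else 0.0
        g = stochastic_grad(w, offset, eigvals)
        w = [x - lr * gx for x, gx in zip(w, g)]
    return w


if __name__ == "__main__":
    for d in (4, 16, 64):
        print(f"d={d}: isotropic ~{isotropic_projection_variance(d):.4f} "
              f"(exact {1.0 / d:.4f}), stochastic gradient {sgd_variance_along_v(d):.4f}")
    print("GD  |w_1| after 100 steps:", abs(run(8, 100, 0.1, use_sgd=False)[0]))
    print("SGD |w_1| after 100 steps:", abs(run(8, 100, 0.1, use_sgd=True)[0]))
```

In this toy, full-gradient descent started exactly at the saddle never moves. Sampled component gradients give a nonzero kick along e₁, and the negative curvature then amplifies that kick. The sketch shows only the dimension-independence of that component and the escape mechanism. It does not reproduce the abstract's claim that the variance scales with the eigenvalue magnitude, and it says nothing about convergence rates.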

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Distributed Learning in Non-Convex Environments -- Part II: Polynomial Escape from Saddle-Points

cs.MA · 2019-07-03 · unverdicted · novelty 6.0

Diffusion strategy for distributed learning escapes saddle points in O(1/μ) iterations and returns approximate second-order stationary points in polynomial iterations with less restrictive noise assumptions than centralized methods.

Dimension-Free Saddle-Point Escape in Muon

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Muon achieves dimension-free saddle-point escape through non-linear spectral shaping, resolvent calculus, and structural incoherence, yielding an algebraically dimension-free escape bound.

Distributed Learning in Non-Convex Environments -- Part I: Agreement at a Linear Rate

math.OC · 2019-07-03 · unverdicted · novelty 5.0

Diffusion learning achieves linear-rate agreement around the network centroid in stochastic non-convex distributed optimization.

citing papers explorer

Distributed Learning in Non-Convex Environments -- Part II: Polynomial Escape from Saddle-Points · cs.MA · 2019-07-03 · unverdicted · none · ref 27
Dimension-Free Saddle-Point Escape in Muon · cs.LG · 2026-05-10 · unverdicted · none · ref 5
Distributed Learning in Non-Convex Environments -- Part I: Agreement at a Linear Rate · math.OC · 2019-07-03 · unverdicted · none · ref 24