Escaping Saddles with Stochastic Gradients

Aurelien Lucchi; Hadi Daneshmand; Jonas Kohler; Thomas Hofmann

arxiv: 1803.05999 · v2 · pith:WVTPM2BAnew · submitted 2018-03-15 · 💻 cs.LG · math.OC · stat.ML

classification 💻 cs.LG · math.OC · stat.ML

keywords gradients · stochastic · along · directions · isotropic · noise · under · variance

We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients exhibit a strong component along these directions. Furthermore, we show that - contrary to the case of isotropic noise - this variance is proportional to the magnitude of the corresponding eigenvalues and not decreasing in the dimensionality. Based upon this observation we propose a new assumption under which we show that the injection of explicit, isotropic noise usually applied to make gradient descent escape saddle points can successfully be replaced by a simple SGD step. Additionally - and under the same condition - we derive the first convergence rate for plain SGD to a second-order stationary point in a number of iterations that is independent of the problem dimension.
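
The abstract's contrast with isotropic noise rests on a standard fact: a noise vector drawn uniformly from the unit sphere in d dimensions has expected squared projection 1/d onto any fixed unit direction, so its component along a given negative curvature direction shrinks as the dimension grows. The NumPy sketch below checks this baseline by Monte Carlo. It illustrates the isotropic case only and does not reproduce the paper's analysis of stochastic gradients. The function name `isotropic_projection_variance`, the sample count, and the choice of the first coordinate axis as the direction `v` are assumptions made for illustration.

```python
import numpy as np


def isotropic_projection_variance(dim, num_samples=20000, seed=0):
    """Monte Carlo estimate of E[(v^T xi)^2] for xi uniform on the unit sphere.

    `v` is a hypothetical fixed unit direction standing in for a negative
    curvature eigenvector; by rotational symmetry any unit vector gives
    the same expectation.
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(dim)
    v[0] = 1.0
    # Normalizing standard Gaussian vectors yields uniform samples on the sphere.
    noise = rng.standard_normal((num_samples, dim))
    noise /= np.linalg.norm(noise, axis=1, keepdims=True)
    return float(np.mean((noise @ v) ** 2))


if __name__ == "__main__":
    for dim in (2, 10, 100):
        estimate = isotropic_projection_variance(dim)
        print(f"dim={dim:4d}  estimate={estimate:.5f}  1/dim={1.0 / dim:.5f}")
```

The estimates should track 1/d closely, which is the dimension dependence that the abstract says stochastic gradients avoid along negative curvature directions in the models studied.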

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Dimension-Free Saddle-Point Escape in Muon
cs.LG 2026-05 unverdicted novelty 6.0

Muon achieves dimension-free saddle-point escape through non-linear spectral shaping, resolvent calculus, and structural incoherence, yielding an algebraically dimension-free escape bound.

Distributed Learning in Non-Convex Environments -- Part II: Polynomial Escape from Saddle-Points
cs.MA 2019-07 unverdicted novelty 6.0

Diffusion strategy for distributed learning escapes saddle points in O(1/μ) iterations and returns approximate second-order stationary points in polynomial iterations with less restrictive noise assumptions than centr...

Distributed Learning in Non-Convex Environments -- Part I: Agreement at a Linear Rate
math.OC 2019-07 unverdicted novelty 5.0

Diffusion learning achieves linear-rate agreement around the network centroid in stochastic non-convex distributed optimization.