
arxiv: 1803.05999 · v2 · pith:WVTPM2BAnew · submitted 2018-03-15 · 💻 cs.LG · math.OC · stat.ML

Escaping Saddles with Stochastic Gradients

classification: cs.LG · math.OC · stat.ML
keywords: gradients, stochastic, along, directions, isotropic, noise, under, variance

We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients exhibit a strong component along these directions. Furthermore, we show that, contrary to the case of isotropic noise, this variance is proportional to the magnitude of the corresponding eigenvalues and does not decrease with the dimensionality. Based on this observation, we propose a new assumption under which the explicit, isotropic noise usually injected to make gradient descent escape saddle points can be replaced by a simple SGD step. Additionally, under the same condition, we derive the first convergence rate for plain SGD to a second-order stationary point in a number of iterations that is independent of the problem dimension.
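The contrast with isotropic noise can be checked numerically. The sketch below is not taken from the paper: it only illustrates the standard fact that a perturbation drawn uniformly from the unit sphere in R^d has variance 1/d along any fixed unit direction, which is presumably the dimension dependence the abstract contrasts against. The function name `isotropic_projection_variance`, the sample count, and the seed are illustrative assumptions.

```python
import math
import random


def isotropic_projection_variance(dim, num_samples=2000, seed=0):
    """Monte Carlo estimate of E[(v^T xi)^2] for xi uniform on the unit sphere in R^dim.

    The fixed unit direction v is taken to be the first coordinate axis;
    by rotational symmetry any other unit direction gives the same value.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # A normalized standard Gaussian vector is uniform on the unit sphere.
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in g))
        total += (g[0] / norm) ** 2
    return total / num_samples


if __name__ == "__main__":
    for dim in (2, 10, 100):
        estimate = isotropic_projection_variance(dim)
        print(f"dim={dim:4d}  estimated={estimate:.4f}  exact 1/dim={1.0 / dim:.4f}")
```

The estimates shrink roughly as 1/d, so isotropic noise of fixed norm contributes less and less along any single escape direction as the dimension grows. The abstract's claim is that, for the models studied, the stochastic gradient's component along negative curvature directions does not suffer from this decay.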



Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Dimension-Free Saddle-Point Escape in Muon

    cs.LG 2026-05 unverdicted novelty 6.0

    Muon achieves dimension-free saddle-point escape through non-linear spectral shaping, resolvent calculus, and structural incoherence, yielding an algebraically dimension-free escape bound.

  2. Distributed Learning in Non-Convex Environments -- Part II: Polynomial Escape from Saddle-Points

    cs.MA 2019-07 unverdicted novelty 6.0

    Diffusion strategy for distributed learning escapes saddle points in O(1/μ) iterations and returns approximate second-order stationary points in polynomial iterations with less restrictive noise assumptions than centr...

  3. Distributed Learning in Non-Convex Environments -- Part I: Agreement at a Linear Rate

    math.OC 2019-07 unverdicted novelty 5.0

    Diffusion learning achieves linear-rate agreement around the network centroid in stochastic non-convex distributed optimization.