Sharp Analysis for Nonconvex SGD Escaping from Saddle Points

Cong Fang; Tong Zhang; Zhouchen Lin

arXiv 1902.00247 v2 · pith:ZWMJJRBE · submitted 2019-02-01 · math.OC cs.CC cs.LG

classification math.OC cs.CC cs.LG

keywords epsilon, stochastic, nonconvex, analysis, gradient, optimization, algorithms, approximate

Abstract

In this paper, we give a sharp analysis for Stochastic Gradient Descent (SGD) and prove that SGD is able to efficiently escape from saddle points and find an $(\epsilon, O(\epsilon^{0.5}))$-approximate second-order stationary point in $\tilde{O}(\epsilon^{-3.5})$ stochastic gradient computations for generic nonconvex optimization problems, when the objective function satisfies gradient-Lipschitz, Hessian-Lipschitz, and dispersive noise assumptions. This result subverts the classical belief that SGD requires at least $O(\epsilon^{-4})$ stochastic gradient computations for obtaining an $(\epsilon,O(\epsilon^{0.5}))$-approximate second-order stationary point. Such SGD rate matches, up to a polylogarithmic factor of problem-dependent parameters, the rate of most accelerated nonconvex stochastic optimization algorithms that adopt additional techniques, such as Nesterov's momentum acceleration, negative curvature search, as well as quadratic and cubic regularization tricks. Our novel analysis gives new insights into nonconvex SGD and can be potentially generalized to a broad class of stochastic optimization algorithms.
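To make the stationarity notion in the abstract concrete, here is a minimal sketch, not the paper's algorithm or analysis. It assumes the common reading of an $(\epsilon, \delta)$-approximate second-order stationary point, namely $\|\nabla f(x)\| \le \epsilon$ and $\lambda_{\min}(\nabla^2 f(x)) \ge -\delta$, with problem-dependent constants omitted. The toy objective $f(x, y) = x^2/2 + y^4/4 - y^2/2$, the isotropic Gaussian gradient noise, the step size, and the helper names `grad`, `min_hessian_eig`, `is_approx_sosp`, and `run_sgd` are all illustrative assumptions that do not come from the paper.

```python
import math
import random


def grad(x, y):
    """Gradient of the toy objective f(x, y) = x**2/2 + y**4/4 - y**2/2."""
    return (x, y**3 - y)


def min_hessian_eig(x, y):
    """Smallest eigenvalue of the diagonal Hessian diag(1, 3*y**2 - 1)."""
    return min(1.0, 3.0 * y**2 - 1.0)


def is_approx_sosp(x, y, eps, delta):
    """Assumed test: gradient norm <= eps and min Hessian eigenvalue >= -delta."""
    gx, gy = grad(x, y)
    return math.hypot(gx, gy) <= eps and min_hessian_eig(x, y) >= -delta


def run_sgd(steps=2000, lr=0.05, sigma=0.1, seed=0):
    """Plain SGD started exactly at the saddle point (0, 0)."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Stochastic gradient = true gradient + Gaussian noise (an assumption
        # made for illustration; the paper states its own noise condition).
        x -= lr * (gx + sigma * rng.gauss(0.0, 1.0))
        y -= lr * (gy + sigma * rng.gauss(0.0, 1.0))
    return x, y


if __name__ == "__main__":
    x, y = run_sgd()
    print("saddle is approx SOSP:", is_approx_sosp(0.0, 0.0, 0.3, 0.3))
    print("final iterate:", (x, y))
    print("final is approx SOSP:", is_approx_sosp(x, y, 0.3, 0.3))
```

In this toy setting the origin has zero gradient but a Hessian eigenvalue of $-1$, so it fails the second-order test, while the noise in the gradient pushes the iterate off the saddle and toward one of the minima near $(0, \pm 1)$. The sketch says nothing about the $\tilde{O}(\epsilon^{-3.5})$ rate itself.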

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Accelerated Gradient Methods for Nonconvex Optimization: Escape Trajectories From Strict Saddle Points and Convergence to Local Minima
math.OC 2023-07 unverdicted novelty 7.0

Theoretical analysis of accelerated gradient methods showing almost-sure escape from strict saddles and linear exit times, plus a subclass achieving near-optimal convergence to local minima in convex neighborhoods of ...

Distributed Learning in Non-Convex Environments -- Part II: Polynomial Escape from Saddle-Points
cs.MA 2019-07 unverdicted novelty 6.0

Diffusion strategy for distributed learning escapes saddle points in O(1/μ) iterations and returns approximate second-order stationary points in polynomial iterations with less restrictive noise assumptions than centr...

Distributed Learning in Non-Convex Environments -- Part I: Agreement at a Linear Rate
math.OC 2019-07 unverdicted novelty 5.0

Diffusion learning achieves linear-rate agreement around the network centroid in stochastic non-convex distributed optimization.