Gradient Descent Can Take Exponential Time to Escape Saddle Points

Aarti Singh; Barnabas Poczos; Chi Jin; Jason D. Lee; Michael I. Jordan; Simon S. Du

arxiv: 1705.10412 · v2 · pith:WILZIHPYnew · submitted 2017-05-29 · 🧮 math.OC · cs.LG· stat.ML

Gradient Descent Can Take Exponential Time to Escape Saddle Points

Simon S. Du , Chi Jin , Jason D. Lee , Michael I. Jordan , Barnabas Poczos , Aarti Singh This is my paper

classification 🧮 math.OC cs.LGstat.ML

keywords pointssaddledescentgradienttimedownescapeexponential

0 comments

read the original abstract

Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et al., 2017] is not slowed down by saddle points - it can find an approximate local minimizer in polynomial time. This result implies that GD is inherently slower than perturbed GD, and justifies the importance of adding perturbations for efficient non-convex optimization. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Distributed Learning in Non-Convex Environments -- Part II: Polynomial Escape from Saddle-Points
cs.MA 2019-07 unverdicted novelty 6.0

Diffusion strategy for distributed learning escapes saddle points in O(1/μ) iterations and returns approximate second-order stationary points in polynomial iterations with less restrictive noise assumptions than centr...