Gradient Descent Can Take Exponential Time to Escape Saddle Points

2017 · math.OC · arXiv 1705.10412

1 Pith paper cites this work.

abstract

Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et al., 2017] is not slowed down by saddle points; it can find an approximate local minimizer in polynomial time. This result implies that GD is inherently slower than perturbed GD, and justifies the importance of adding perturbations for efficient non-convex optimization. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.
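
To make the contrast between GD and perturbed GD concrete, here is a minimal Python sketch. It is not the paper's construction: the toy function, the starting point, the step size, and the noise level are all assumptions chosen for illustration, and the noise is added at every step (in the spirit of noisy GD) rather than only when the gradient is small. The start lies exactly on the saddle's stable manifold, which is a degenerate case; the paper's result concerns natural random initializations, which this toy does not reproduce.

```python
import random

def grad(x, y):
    # Gradient of the assumed toy function
    #   f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2,
    # which has a strict saddle at (0, 0) and minima at (0, -1) and (0, 1).
    return x, y**3 - y

def descend(x, y, steps=500, lr=0.1, noise=0.0, seed=0):
    # Plain GD when noise == 0.0; otherwise each step is followed by a small
    # uniform perturbation (a simplification, not the schedule of Jin et al.).
    rng = random.Random(seed)
    for _ in range(steps):
        gx, gy = grad(x, y)
        x -= lr * gx
        y -= lr * gy
        if noise > 0.0:
            x += rng.uniform(-noise, noise)
            y += rng.uniform(-noise, noise)
    return x, y

# Start on the line y = 0, where the gradient has no y-component.
plain = descend(1.0, 0.0)                  # y stays exactly 0.0: stuck at the saddle
perturbed = descend(1.0, 0.0, noise=1e-3)  # noise pushes y off the line toward a minimum

print("plain GD:    ", plain)
print("perturbed GD:", perturbed)
```

In this sketch plain GD converges to the saddle because its y-coordinate never moves, while the perturbed run drifts off the line y = 0 and settles near one of the two minima.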

representative citing papers

Distributed Learning in Non-Convex Environments -- Part II: Polynomial Escape from Saddle-Points

cs.MA · 2019-07-03 · unverdicted · novelty 6.0

Diffusion strategy for distributed learning escapes saddle points in O(1/μ) iterations and returns approximate second-order stationary points in polynomial iterations with less restrictive noise assumptions than centralized methods.
