Diffusion strategy for distributed learning escapes saddle points in O(1/μ) iterations and returns approximate second-order stationary points in polynomial iterations with less restrictive noise assumptions than centralized methods.
Gradient Descent Can Take Exponential Time to Escape Saddle Points
1 Pith paper cite this work. Polarity classification is still indexing.
abstract
Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et al., 2017] is not slowed down by saddle points - it can find an approximate local minimizer in polynomial time. This result implies that GD is inherently slower than perturbed GD, and justifies the importance of adding perturbations for efficient non-convex optimization. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.
fields
cs.MA 1years
2019 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Distributed Learning in Non-Convex Environments -- Part II: Polynomial Escape from Saddle-Points
Diffusion strategy for distributed learning escapes saddle points in O(1/μ) iterations and returns approximate second-order stationary points in polynomial iterations with less restrictive noise assumptions than centralized methods.