
arxiv: 1706.09156 · v4 · pith:BXGUG2NDnew · submitted 2017-06-28 · 🧮 math.OC · cs.CC

Non-convex Finite-Sum Optimization Via SCSG Methods

keywords: scsg, methods, epsilon, gradient, outperforms, stochastic, finite-sum, non-convex

We develop a class of algorithms, as variants of the stochastically controlled stochastic gradient (SCSG) methods (Lei and Jordan, 2016), for the smooth non-convex finite-sum optimization problem. Assuming the smoothness of each component, the complexity of SCSG to reach a stationary point with $\mathbb{E} \|\nabla f(x)\|^{2} \le \epsilon$ is $O\left(\min\{\epsilon^{-5/3}, \epsilon^{-1}n^{2/3}\}\right)$, which strictly outperforms stochastic gradient descent. Moreover, SCSG is never worse than the state-of-the-art methods based on variance reduction, and it significantly outperforms them when the target accuracy is low. A similar acceleration is also achieved when the functions satisfy the Polyak-Łojasiewicz condition. Empirical experiments demonstrate that SCSG outperforms stochastic gradient methods on training multi-layer neural networks in terms of both training and validation loss.
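To make the stated complexity concrete, the following minimal Python sketch evaluates the bound $\min\{\epsilon^{-5/3}, \epsilon^{-1}n^{2/3}\}$ with all constants dropped. The function names are hypothetical, chosen only for illustration. The comparison rates are assumptions not given in the abstract: $\epsilon^{-2}$ is the rate commonly quoted for stochastic gradient descent under this stationarity criterion, and $n + \epsilon^{-1}n^{2/3}$ is the rate commonly quoted for non-convex SVRG-type variance-reduced methods. Equating the two terms of the minimum shows that $\epsilon^{-5/3}$ is the smaller one exactly when $\epsilon n > 1$, which is the low-accuracy regime the abstract refers to.

```python
def scsg_bound(eps: float, n: int) -> float:
    """Bound from the abstract, constants dropped: min(eps^(-5/3), eps^(-1) * n^(2/3))."""
    return min(eps ** (-5.0 / 3.0), n ** (2.0 / 3.0) / eps)


def sgd_bound(eps: float) -> float:
    """ASSUMPTION (not in the abstract): commonly quoted SGD rate, eps^(-2)."""
    return eps ** (-2.0)


def vr_bound(eps: float, n: int) -> float:
    """ASSUMPTION (not in the abstract): commonly quoted non-convex SVRG-type rate."""
    return n + n ** (2.0 / 3.0) / eps


def active_term(eps: float, n: int) -> str:
    """Name the term that attains the minimum; eps^(-5/3) wins when eps * n > 1."""
    return "eps^(-5/3)" if eps * n > 1 else "eps^(-1) n^(2/3)"


if __name__ == "__main__":
    n = 10 ** 6
    for eps in (1e-2, 1e-4, 1e-8):
        print(
            f"eps={eps:g}  SCSG~{scsg_bound(eps, n):.3g}  "
            f"SGD~{sgd_bound(eps):.3g}  VR~{vr_bound(eps, n):.3g}  "
            f"active: {active_term(eps, n)}"
        )
```

These figures are order-of-magnitude comparisons only, since the hidden constants (smoothness parameter, initial suboptimality, gradient variance) are omitted. With $n = 10^6$, the sketch illustrates that the gap to the variance-reduced rate is largest for coarse targets such as $\epsilon = 10^{-2}$ and closes once $\epsilon < 1/n$.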
