Finite-Sum Smooth Optimization with SARAH

Dzung T. Phan; Jayant R. Kalagnanam; Lam M. Nguyen; Marten van Dijk; Phuong Ha Nguyen; Tsui-Wei Weng

arxiv: 1901.07648 · v2 · pith:GA3DW32Dnew · submitted 2019-01-22 · 🧮 math.OC · cs.LG· stat.ML

Finite-Sum Smooth Optimization with SARAH

Lam M. Nguyen , Marten van Dijk , Dzung T. Phan , Phuong Ha Nguyen , Tsui-Wei Weng , Jayant R. Kalagnanam This is my paper

classification 🧮 math.OC cs.LGstat.ML

keywords epsiloncomplexityconvergenceconvexoptimizationsarahtotalalgorithm

0 comments

read the original abstract

The total complexity (measured as the total number of gradient computations) of a stochastic first-order optimization algorithm that finds a first-order stationary point of a finite-sum smooth nonconvex objective function $F(w)=\frac{1}{n} \sum_{i=1}^n f_i(w)$ has been proven to be at least $\Omega(\sqrt{n}/\epsilon)$ for $n \leq \mathcal{O}(\epsilon^{-2})$ where $\epsilon$ denotes the attained accuracy $\mathbb{E}[ \|\nabla F(\tilde{w})\|^2] \leq \epsilon$ for the outputted approximation $\tilde{w}$ (Fang et al., 2018). In this paper, we provide a convergence analysis for a slightly modified version of the SARAH algorithm (Nguyen et al., 2017a;b) and achieve total complexity that matches the lower-bound worst case complexity in (Fang et al., 2018) up to a constant factor when $n \leq \mathcal{O}(\epsilon^{-2})$ for nonconvex problems. For convex optimization, we propose SARAH++ with sublinear convergence for general convex and linear convergence for strongly convex problems; and we provide a practical version for which numerical experiments on various datasets show an improved performance.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Stochastic Composite Gradient Method with Incremental Variance Reduction
math.OC 2019-06 unverdicted novelty 6.0

Proposes an incremental variance-reduced stochastic gradient method for minimizing smooth nonconvex composite functions that achieves optimal first-order complexity rates.
Accelerating Mini-batch SARAH by Step Size Rules
cs.LG 2019-06 unverdicted novelty 4.0

MB-SARAH-RBB uses a random Barzilai-Borwein step size to accelerate mini-batch SARAH, with a linear convergence proof and improved complexity for strongly convex objectives.