Finite-Sum Smooth Optimization with SARAH
read the original abstract
The total complexity (measured as the total number of gradient computations) of a stochastic first-order optimization algorithm that finds a first-order stationary point of a finite-sum smooth nonconvex objective function $F(w)=\frac{1}{n} \sum_{i=1}^n f_i(w)$ has been proven to be at least $\Omega(\sqrt{n}/\epsilon)$ for $n \leq \mathcal{O}(\epsilon^{-2})$ where $\epsilon$ denotes the attained accuracy $\mathbb{E}[ \|\nabla F(\tilde{w})\|^2] \leq \epsilon$ for the outputted approximation $\tilde{w}$ (Fang et al., 2018). In this paper, we provide a convergence analysis for a slightly modified version of the SARAH algorithm (Nguyen et al., 2017a;b) and achieve total complexity that matches the lower-bound worst case complexity in (Fang et al., 2018) up to a constant factor when $n \leq \mathcal{O}(\epsilon^{-2})$ for nonconvex problems. For convex optimization, we propose SARAH++ with sublinear convergence for general convex and linear convergence for strongly convex problems; and we provide a practical version for which numerical experiments on various datasets show an improved performance.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
A Stochastic Composite Gradient Method with Incremental Variance Reduction
Proposes an incremental variance-reduced stochastic gradient method for minimizing smooth nonconvex composite functions that achieves optimal first-order complexity rates.
-
Accelerating Mini-batch SARAH by Step Size Rules
MB-SARAH-RBB uses a random Barzilai-Borwein step size to accelerate mini-batch SARAH, with a linear convergence proof and improved complexity for strongly convex objectives.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.