Variance Reduction for Distributed Stochastic Gradient Descent

Gavin Taylor; Soham De; Tom Goldstein

arXiv: 1512.01708 · v2 · pith:HUHHKVCQ · submitted 2015-12-05 · cs.LG · cs.DC · math.OC · stat.ML




keywords: methods, distributed, gradient, stochastic, variance, reduction, algorithms, descent


Abstract

Variance reduction (VR) methods boost the performance of stochastic gradient descent (SGD) by enabling the use of larger, constant step sizes while preserving linear convergence rates. However, current variance-reduced SGD methods require either high memory usage or an exact gradient computation (using the entire dataset) at the end of each epoch. This limits the use of VR methods in practical distributed settings. In this paper, we propose a variance reduction method, called VR-lite, that requires neither full gradient computations nor extra storage. We explore distributed synchronous and asynchronous variants that are scalable and remain stable with low communication frequency. We empirically compare both the sequential and distributed algorithms to state-of-the-art stochastic optimization methods, and find that our proposed algorithms compare favorably with other stochastic methods.
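To make the trade-off described in the abstract concrete, the sketch below implements SVRG (stochastic variance reduced gradient), one of the existing VR methods that needs an exact full-gradient pass every epoch, next to plain constant-step SGD. It is not VR-lite, whose update rule the abstract does not give. The least-squares objective, the synthetic data, the step size, and the helper names (`grad_i`, `full_grad`, `sgd`, `svrg`) are all illustrative assumptions.

```python
import math
import random

def grad_i(w, a, b):
    """Gradient of the per-example loss 0.5 * (a.w - b)**2 with respect to w."""
    r = sum(aj * wj for aj, wj in zip(a, w)) - b
    return [r * aj for aj in a]

def full_grad(w, data):
    """Exact gradient of the average loss: one pass over the entire dataset."""
    g = [0.0] * len(w)
    for a, b in data:
        for j, gj in enumerate(grad_i(w, a, b)):
            g[j] += gj / len(data)
    return g

def sgd(data, dim, step, iters, rng):
    """Plain SGD with a constant step size."""
    w = [0.0] * dim
    for _ in range(iters):
        a, b = data[rng.randrange(len(data))]
        w = [wj - step * gj for wj, gj in zip(w, grad_i(w, a, b))]
    return w

def svrg(data, dim, step, epochs, inner, rng):
    """SVRG: a full-gradient snapshot taken each epoch corrects every stochastic step."""
    w_snap = [0.0] * dim
    for _ in range(epochs):
        mu = full_grad(w_snap, data)  # the exact-gradient pass required per epoch
        w = list(w_snap)
        for _ in range(inner):
            a, b = data[rng.randrange(len(data))]
            g_w, g_s = grad_i(w, a, b), grad_i(w_snap, a, b)
            # Unbiased estimate whose variance vanishes as w and w_snap approach the optimum.
            w = [wj - step * (gw - gs + m) for wj, gw, gs, m in zip(w, g_w, g_s, mu)]
        w_snap = w  # last-iterate snapshot, a common practical choice
    return w_snap

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# Synthetic noisy least-squares problem (illustrative data, not from the paper).
rng = random.Random(0)
w_true = [2.0, -1.0]
data = []
for _ in range(50):
    a = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    b = sum(aj * wj for aj, wj in zip(a, w_true)) + 0.1 * rng.gauss(0, 1)
    data.append((a, b))

# Same number of stochastic steps for both methods (40 epochs x 100 inner steps).
w_sgd = sgd(data, dim=2, step=0.05, iters=4000, rng=rng)
w_vr = svrg(data, dim=2, step=0.05, epochs=40, inner=100, rng=rng)
print("SGD  ||grad f||:", norm(full_grad(w_sgd, data)))
print("SVRG ||grad f||:", norm(full_grad(w_vr, data)))
```

With a constant step size, plain SGD stalls at a noise floor set by the spread of per-example gradients at the optimum, while SVRG drives the full gradient toward zero. The price is the repeated `full_grad` call, an O(n) pass over all the data each epoch; this is the kind of cost the abstract identifies as limiting VR methods in distributed settings.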



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Distributed Inexact Successive Convex Approximation ADMM: Analysis-Part I
math.OC · 2019-07 · unverdicted · novelty 6.0

The paper develops two variants of a distributed inexact SCA-ADMM algorithm and proves first-order convergence rate guarantees under mild assumptions for non-convex problems with robustness to errors and delays.