A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization

Peter Richt\'arik; Zhize Li

arxiv: 2006.07013 · v1 · pith:6E27GWP3new · submitted 2020-06-12 · 🧮 math.OC · cs.DS· cs.LG· stat.ML

A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization

Zhize Li , Peter Richt\'arik This is my paper

classification 🧮 math.OC cs.DScs.LGstat.ML

keywords methodsunifiedanalysisconvergencenonconvexassumptiongradientlarge

0 comments

read the original abstract

In this paper, we study the performance of a large family of SGD variants in the smooth nonconvex regime. To this end, we propose a generic and flexible assumption capable of accurate modeling of the second moment of the stochastic gradient. Our assumption is satisfied by a large number of specific variants of SGD in the literature, including SGD with arbitrary sampling, SGD with compressed gradients, and a wide variety of variance-reduced SGD methods such as SVRG and SAGA. We provide a single convergence analysis for all methods that satisfy the proposed unified assumption, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on dedicated analyses of each variant. Moreover, our unified analysis is accurate enough to recover or improve upon the best-known convergence results of several classical methods, and also gives new convergence results for many new methods which arise as special cases. In the more general distributed/federated nonconvex optimization setup, we propose two new general algorithmic frameworks differing in whether direct gradient compression (DC) or compression of gradient differences (DIANA) is used. We show that all methods captured by these two frameworks also satisfy our unified assumption. Thus, our unified convergence analysis also captures a large variety of distributed methods utilizing compressed communication. Finally, we also provide a unified analysis for obtaining faster linear convergence rates in this nonconvex regime under the PL condition.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Boosted Stochastic Frank-Wolfe for Constrained Nonconvex Optimization
math.OC 2026-05 unverdicted novelty 7.0

A new step size rule lets boosted stochastic Frank-Wolfe match ordinary stochastic Frank-Wolfe rates on nonconvex and quasar-convex problems and deliver faster empirical convergence on sparse logistic regression and q...