Stochastic Distributed Learning with Gradient Quantization and Variance Reduction

Samuel Horv\'ath , Dmitry Kovalev , Konstantin Mishchenko , Sebastian Stich , Peter Richt\'arik

Authors on Pith no claims yet

classification 🧮 math.OC

keywords kappaomegavariancedistributedepsilonschemesarbitraryconvergence

read the original abstract

We consider distributed optimization where the objective function is spread among different devices, each sending incremental model updates to a central server. To alleviate the communication bottleneck, recent work proposed various schemes to compress (e.g.\ quantize or sparsify) the gradients, thereby introducing additional variance $\omega \geq 1$ that might slow down convergence. For strongly convex functions with condition number $\kappa$ distributed among $n$ machines, we (i) give a scheme that converges in $\mathcal{O}((\kappa + \kappa \frac{\omega}{n} + \omega)$ $\log (1/\epsilon))$ steps to a neighborhood of the optimal solution. For objective functions with a finite-sum structure, each worker having less than $m$ components, we (ii) present novel variance reduced schemes that converge in $\mathcal{O}((\kappa + \kappa \frac{\omega}{n} + \omega + m)\log(1/\epsilon))$ steps to arbitrary accuracy $\epsilon > 0$. These are the first methods that achieve linear convergence for arbitrary quantized updates. We also (iii) give analysis for the weakly convex and non-convex cases and (iv) verify in experiments that our novel variance reduced schemes are more efficient than the baselines.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FedSLoP: Memory-Efficient Federated Learning with Low-Rank Gradient Projection
cs.LG 2026-04 unverdicted novelty 5.0

FedSLoP reduces communication and memory costs in federated learning through stochastic low-rank gradient projections, with a nonconvex convergence rate of O(1/sqrt(NT)) and competitive accuracy on heterogeneous MNIST data.