On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes

Francesco Orabona; Xiaoyu Li

arXiv: 1805.08114 · v3 · pith:TMB6WMQV · submitted 2018-05-21 · 📊 stat.ML · cs.LG · math.OC

keywords: stepsizes, non-convex, stochastic, adagrad, adaptive, choice, convergence, convex

Stochastic gradient descent is the method of choice for large-scale optimization of machine learning objective functions. Yet its performance is highly variable and depends heavily on the choice of the stepsizes. This has motivated a large body of research on adaptive stepsizes. However, there is currently a gap in our theoretical understanding of these methods, especially in the non-convex setting. In this paper, we begin to close this gap: we theoretically analyze a generalized version of the AdaGrad stepsizes in both the convex and non-convex settings. We give sufficient conditions for these stepsizes to achieve almost sure asymptotic convergence of the gradients to zero, proving the first guarantee for generalized AdaGrad stepsizes in the non-convex setting. Moreover, we show that these stepsizes automatically adapt to the noise level of the stochastic gradients in both the convex and non-convex settings, interpolating between $O(1/T)$ and $O(1/\sqrt{T})$, up to logarithmic terms.
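The abstract refers to generalized AdaGrad stepsizes without spelling them out. As a rough illustration only, the sketch below assumes a single global (not per-coordinate) stepsize $\eta_t = \alpha / (\beta + \sum_{i<t} \|g_i\|^2)^{1/2+\epsilon}$ with $\alpha, \beta > 0$ and $\epsilon \ge 0$, where $g_i$ are past stochastic gradients. The function name `adagrad_norm_sgd`, its parameter defaults, and the noisy quadratic test problem are hypothetical; the paper itself should be consulted for the exact stepsize definitions and the conditions under which its guarantees hold.

```python
import random
from typing import Callable, List

Vector = List[float]


def adagrad_norm_sgd(
    grad: Callable[[Vector], Vector],
    x0: Vector,
    steps: int,
    alpha: float = 1.0,
    beta: float = 1.0,
    eps: float = 0.0,
) -> Vector:
    """SGD with a global, AdaGrad-norm-style stepsize (illustrative sketch).

    Assumed stepsize: eta_t = alpha / (beta + sum_{i<t} ||g_i||^2) ** (0.5 + eps).
    Only past gradients enter the sum, so eta_t is fixed before g_t is drawn.
    """
    x = list(x0)
    sq_sum = 0.0  # running sum of squared norms of past stochastic gradients
    for _ in range(steps):
        eta = alpha / (beta + sq_sum) ** (0.5 + eps)
        g = grad(x)  # one stochastic gradient at the current iterate
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        sq_sum += sum(gi * gi for gi in g)  # accumulate only after the step
    return x


if __name__ == "__main__":
    rng = random.Random(0)

    # Hypothetical test problem: f(x) = 0.5 * ||x||^2 with Gaussian gradient noise.
    def noisy_grad(x: Vector) -> Vector:
        return [xi + rng.gauss(0.0, 0.1) for xi in x]

    x_final = adagrad_norm_sgd(noisy_grad, [5.0, -3.0], steps=2000, beta=4.0, eps=0.1)
    print(x_final)
```

In this sketch, `sq_sum` is updated after each step, so the current gradient never influences its own stepsize. Replacing `noisy_grad` with the exact gradient `lambda x: list(x)` gives the noiseless case, which makes it easy to compare how the iterates behave as the noise level changes.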

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Adaptive Federated Optimization
cs.LG 2020-02 unverdicted novelty 6.0

Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.

Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics
cs.LG 2022-12 unverdicted novelty 2.0

A comprehensive review of deep learning techniques for computational mechanics, including LSTM for constitutive modeling, PINNs for PDE solving, optimizers, and kernel methods.