A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization

Vineet Gupta, Tomer Koren, Yoram Singer

classification cs.LG math.OC stat.ML

keywords algorithms, online, optimization, adaptive, framework, stochastic, methods, regularization

We describe a framework for deriving and analyzing online optimization algorithms that incorporate adaptive, data-dependent regularization, also termed preconditioning. Such algorithms have been proven useful in stochastic optimization by reshaping the gradients according to the geometry of the data. Our framework captures and unifies much of the existing literature on adaptive online methods, including the AdaGrad and Online Newton Step algorithms as well as their diagonal versions. As a result, we obtain new convergence proofs for these algorithms that are substantially simpler than previous analyses. Our framework also exposes the rationale for the different preconditioned updates used in common stochastic optimization methods.
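
As a rough illustration of the "reshaping the gradients" idea in the abstract, the following is a minimal sketch of a diagonal AdaGrad-style update. It is not taken from the paper: the function name `diagonal_adagrad`, the parameters `lr` and `eps`, and the badly scaled quadratic used as a test problem are all assumptions chosen for illustration.

```python
import numpy as np

def diagonal_adagrad(grad_fn, x0, lr=0.5, eps=1e-8, steps=200):
    """Illustrative diagonal AdaGrad-style loop (hypothetical helper).

    Each coordinate is rescaled by the square root of its accumulated
    squared gradients, which acts as a diagonal preconditioner.
    """
    x = np.asarray(x0, dtype=float).copy()
    accum = np.zeros_like(x)  # running sum of squared gradients per coordinate
    for _ in range(steps):
        g = grad_fn(x)
        accum += g * g
        # Preconditioned step: divide each coordinate by its own scale.
        x -= lr * g / (np.sqrt(accum) + eps)
    return x

# Assumed test problem: f(x) = 0.5 * sum(scales * x**2), with very
# different curvature along the two coordinates.
scales = np.array([100.0, 1.0])

def quad_grad(x):
    return scales * x

x_one_step = diagonal_adagrad(quad_grad, x0=[1.0, 1.0], steps=1)
x_final = diagonal_adagrad(quad_grad, x0=[1.0, 1.0])
```

On this toy problem the per-coordinate rescaling cancels the factor of 100 between the two curvatures, so both coordinates take essentially the same step despite their very different gradient magnitudes. This is one simple sense in which a preconditioner adapts to the geometry of the data.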

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Stochastic Auto-conditioned Fast Gradient Methods with Optimal Rates
math.OC 2026-04 unverdicted novelty 8.0

Stochastic AC-FGM achieves optimal O(1/√ε) iteration complexity and O(1/ε²) sample complexity while being fully adaptive to smoothness, horizon, and noise under bounded conditional variance.

A unified convergence theory for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampoo and Muo
cs.LG 2026-04 unverdicted novelty 7.0

A unified stochastic convergence theory is developed for adaptive preconditioned first-order methods including AdaGrad variants, Shampoo, and Muon in nonconvex optimization.

Last Iterate Convergence of AdaGrad-Norm for Convex Non-Smooth Optimization
math.OC 2026-04 unverdicted novelty 7.0

AdaGrad-Norm last iterate achieves O(1/N^{1/4}) suboptimality for convex non-smooth problems, with tight lower bounds.

Muon Does Not Converge on Convex Lipschitz Functions
cs.LG 2026-05 unverdicted novelty 6.0

Muon does not converge on convex Lipschitz functions regardless of learning rate, while error feedback restores theoretical convergence but degrades performance on CIFAR-10 and nanoGPT tasks.