pith. machine review for the scientific record.

arxiv: 1706.06569 · v1 · submitted 2017-06-20 · 💻 cs.LG · math.OC · stat.ML

Recognition: unknown

A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization

keywords: algorithms, online, optimization, adaptive, framework, stochastic, methods, regularization

We describe a framework for deriving and analyzing online optimization algorithms that incorporate adaptive, data-dependent regularization, also termed preconditioning. Such algorithms have been proven useful in stochastic optimization by reshaping the gradients according to the geometry of the data. Our framework captures and unifies much of the existing literature on adaptive online methods, including the AdaGrad and Online Newton Step algorithms as well as their diagonal versions. As a result, we obtain new convergence proofs for these algorithms that are substantially simpler than previous analyses. Our framework also exposes the rationale for the different preconditioned updates used in common stochastic optimization methods.
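For readers unfamiliar with the diagonal preconditioning the abstract refers to, the following is a minimal sketch of the standard diagonal AdaGrad update, in which each coordinate's step is divided by the root of its accumulated squared gradients. It is not taken from the paper's framework: the function name `adagrad_diagonal`, the step size, the `eps` constant, and the badly scaled quadratic used as a test problem are all assumptions chosen for illustration.

```python
import math

def adagrad_diagonal(grad_fn, x0, lr=0.5, eps=1e-8, steps=200):
    """Illustrative diagonal AdaGrad (hypothetical helper, not the paper's code)."""
    x = list(x0)
    g_sq_sum = [0.0] * len(x)  # running sum of squared gradients, per coordinate
    for _ in range(steps):
        g = grad_fn(x)
        for i, gi in enumerate(g):
            g_sq_sum[i] += gi * gi
            # Preconditioned step: scale by 1 / sqrt(accumulated squared gradient).
            x[i] -= lr * gi / (math.sqrt(g_sq_sum[i]) + eps)
    return x

# Assumed test problem: f(x) = 0.5 * (100 * x[0]**2 + x[1]**2),
# whose two coordinates have very different curvature.
def quad_grad(x):
    return [100.0 * x[0], x[1]]

x_final = adagrad_diagonal(quad_grad, [1.0, 1.0])
```

On this toy problem the per-coordinate scaling cancels the factor of 100 in the first coordinate, so both coordinates shrink toward zero at essentially the same rate, which is one way to picture "reshaping the gradients according to the geometry of the data".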

This paper has not been read by Pith yet.


Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Stochastic Auto-conditioned Fast Gradient Methods with Optimal Rates

    math.OC 2026-04 unverdicted novelty 8.0

    Stochastic AC-FGM achieves optimal O(1/√ε) iteration complexity and O(1/ε²) sample complexity while being fully adaptive to smoothness, horizon, and noise under bounded conditional variance.

  2. A unified convergence theory for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampoo and Muo

    cs.LG 2026-04 unverdicted novelty 7.0

    A unified stochastic convergence theory is developed for adaptive preconditioned first-order methods including AdaGrad variants, Shampoo, and Muon in nonconvex optimization.

  3. Last Iterate Convergence of AdaGrad-Norm for Convex Non-Smooth Optimization

    math.OC 2026-04 unverdicted novelty 7.0

    AdaGrad-Norm last iterate achieves O(1/N^{1/4}) suboptimality for convex non-smooth problems, with tight lower bounds.

  4. Muon Does Not Converge on Convex Lipschitz Functions

    cs.LG 2026-05 unverdicted novelty 6.0

    Muon does not converge on convex Lipschitz functions regardless of learning rate, while error feedback restores theoretical convergence but degrades performance on CIFAR-10 and nanoGPT tasks.