A unified convergence theory for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampoo and Muo

· 2026 · cs.LG · arXiv 2604.17423

abstract

A unified framework for first-order optimization algorithms for nonconvex unconstrained optimization is proposed that uses adaptively preconditioned gradients and includes popular methods such as full and diagonal AdaGrad, AdaNorm, as well as adaptive variants of Shampoo and Muon. This framework also allows combining heterogeneous geometries across different groups of variables while preserving a unified convergence analysis. A fully stochastic global rate-of-convergence analysis is conducted for all methods in the framework, with and without two types of momentum, using reasonable assumptions on the variance of the gradient oracle and without assuming bounded stochastic gradients or a sufficiently small stepsize.
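For orientation, two of the methods named in the abstract can be sketched in their commonly cited textbook forms. This is an illustrative sketch only, not the paper's framework: the function names, the `lr` and `eps` parameters, and the toy objective are assumptions introduced here, and momentum, Shampoo and Muon are left out. AdaNorm scales the whole gradient by one scalar built from accumulated squared gradient norms, while diagonal AdaGrad keeps one accumulator per coordinate.

```python
import math


def adanorm_step(x, g, acc, lr=1.0, eps=1e-8):
    """One AdaNorm (AdaGrad-Norm) step: a single scalar preconditioner."""
    # Accumulate the squared Euclidean norm of the gradient.
    acc = acc + sum(gi * gi for gi in g)
    scale = lr / math.sqrt(acc + eps)
    return [xi - scale * gi for xi, gi in zip(x, g)], acc


def diag_adagrad_step(x, g, acc, lr=1.0, eps=1e-8):
    """One diagonal AdaGrad step: one accumulator per coordinate."""
    acc = [ai + gi * gi for ai, gi in zip(acc, g)]
    x = [xi - lr * gi / math.sqrt(ai + eps) for xi, gi, ai in zip(x, g, acc)]
    return x, acc


def toy_grad(x):
    """Gradient of the assumed toy nonconvex objective sum(x^4/4 - x^2/2)."""
    return [xi ** 3 - xi for xi in x]


# Run both updates on the toy problem from the same starting point.
x_norm, acc_norm = [2.0, -0.5], 0.0
x_diag, acc_diag = [2.0, -0.5], [0.0, 0.0]
for _ in range(200):
    x_norm, acc_norm = adanorm_step(x_norm, toy_grad(x_norm), acc_norm)
    x_diag, acc_diag = diag_adagrad_step(x_diag, toy_grad(x_diag), acc_diag)
```

In both updates the effective stepsize shrinks as gradients accumulate, so no stepsize has to be tuned against a smoothness constant in advance; this is the kind of behaviour the abstract refers to when it says no small enough stepsize is assumed.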

representative citing papers

Stochastic convergence of parallel asynchronous adaptive first-order methods

cs.AI · 2026-06-01 · unverdicted · novelty 6.0

Introduces a class of asynchronous adaptive first-order methods and establishes O(1/sqrt t) convergence (up to logs) for non-convex stochastic optimization under reasonable assumptions.

