pith. machine review for the scientific record. sign in

arxiv: 2601.07326 · v2 · submitted 2026-01-12 · 🧮 math.OC · cs.LG

Recognition: unknown

Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning

Authors on Pith no claims yet
classification 🧮 math.OC cs.LG
keywords nablaconvergenceratefracshampoosqrtadamw-styleanalysis
0
0 comments X
read the original abstract

This paper studies the AdamW-style Shampoo optimizer, an effective implementation of classical Shampoo that notably won the external tuning track of the AlgoPerf neural network training algorithm competition. Our analysis unifies one-sided and two-sided preconditioning and establishes the convergence rate $\frac{1}{K}\sum_{k=1}^K E\left[\|\nabla f(X_k)\|_*\right]\leq O(\frac{\sqrt{m+n}C}{K^{1/4}})$ measured by nuclear norm, where $K$ represents the iteration number, $(m,n)$ denotes the size of matrix parameters, and $C$ matches the constant in the optimal convergence rate of SGD. Theoretically, we have $\|\nabla f(X)\|_F\leq \|\nabla f(X)\|_*\leq \sqrt{m+n}\|\nabla f(X)\|_F$, supporting that our convergence rate can be considered to be analogous to the optimal $\frac{1}{K}\sum_{k=1}^KE\left[\|\nabla f(X_k)\|_F\right]\leq O(\frac{C}{K^{1/4}})$ convergence rate of SGD in the ideal case of $\|\nabla f(X)\|_*= \Theta(\sqrt{m+n})\|\nabla f(X)\|_F$.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Convergence Rate Analysis of SOAP with Arbitrary Orthogonal Projection Matrices

    math.OC 2026-04 unverdicted novelty 7.0

    SOAP and its generalizations with arbitrary orthogonal projections converge at a provable rate when the projections are conditionally independent of the current gradient.