Shampoo: Preconditioned Stochastic Tensor Optimization

Tomer Koren; Vineet Gupta; Yoram Singer

arxiv: 1802.09568 · v2 · pith:FRKL7WXZnew · submitted 2018-02-26 · 💻 cs.LG · math.OC · stat.ML

classification 💻 cs.LG · math.OC · stat.ML

keywords: shampoo, optimization, preconditioning, stochastic, gradient, matrices, methods, preconditioned

Preconditioned gradient methods are among the most general and powerful tools in optimization. However, preconditioning requires storing and manipulating prohibitively large matrices. We describe and analyze a new structure-aware preconditioning algorithm, called Shampoo, for stochastic optimization over tensor spaces. Shampoo maintains a set of preconditioning matrices, each of which operates on a single dimension, contracting over the remaining dimensions. We establish convergence guarantees in the stochastic convex setting, the proof of which builds upon matrix trace inequalities. Our experiments with state-of-the-art deep learning models show that Shampoo is capable of converging considerably faster than commonly used optimizers. Although it involves a more complex update rule, Shampoo's runtime per step is comparable to that of simple gradient methods such as SGD, AdaGrad, and Adam.
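
To make the "one preconditioner per dimension" idea concrete, here is a minimal NumPy sketch of the matrix (order-2 tensor) case. It is an illustration under stated assumptions, not the authors' implementation: the inverse fourth-root exponents and the epsilon-scaled identity initialization follow the matrix-case update described in the full paper rather than anything stated in the abstract above, and the function names, toy objective, learning rate, and `eps` value are all hypothetical choices made for this example.

```python
import numpy as np


def inv_root(mat, p):
    """Return mat^(-1/p) for a symmetric positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # Scale each eigenvector column by eigenvalue^(-1/p), then recombine.
    return (eigvecs * eigvals ** (-1.0 / p)) @ eigvecs.T


def shampoo_matrix_step(W, G, L, R, lr):
    """One Shampoo-style step for an m x n parameter W with gradient G.

    L (m x m) and R (n x n) are the running left and right statistics.
    """
    L = L + G @ G.T  # contract the gradient over the column dimension
    R = R + G.T @ G  # contract the gradient over the row dimension
    W = W - lr * inv_root(L, 4) @ G @ inv_root(R, 4)
    return W, L, R


# Toy objective (hypothetical): minimize 0.5 * ||W - target||_F^2.
rng = np.random.default_rng(0)
m, n = 4, 3
target = rng.standard_normal((m, n))
W = np.zeros((m, n))
eps = 1e-4  # assumed initialization scale for the statistics
L, R = eps * np.eye(m), eps * np.eye(n)

initial_loss = 0.5 * np.sum((W - target) ** 2)
for _ in range(200):
    G = W - target  # exact gradient of the toy objective
    W, L, R = shampoo_matrix_step(W, G, L, R, lr=0.5)
final_loss = 0.5 * np.sum((W - target) ** 2)
```

The point of the sketch is the storage cost: a full-matrix preconditioner for an m x n parameter would need an (mn) x (mn) matrix, whereas here only an m x m and an n x n matrix are kept. Recomputing the inverse roots on every step, as done above for simplicity, is a design shortcut for readability; a practical implementation would likely amortize that cost.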

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training
cs.DC 2026-05 unverdicted novelty 6.0

Asteria is a runtime system that enables second-order optimization for LLMs by dynamically distributing optimizer state across GPU, CPU, and NVMe while using asynchronous inverse-root computations and bounded-stalenes...

Dimension-Free Saddle-Point Escape in Muon
cs.LG 2026-05 unverdicted novelty 6.0

Muon achieves dimension-free saddle-point escape through non-linear spectral shaping, resolvent calculus, and structural incoherence, yielding an algebraically dimension-free escape bound.