Pith Number

pith:Y3UJ36HW

pith:2024:Y3UJ36HWYODK3SBIVQAWN3ZXCB

not attested · not anchored · not stored · refs resolved

SOAP: Improving and Stabilizing Shampoo using Adam

David Brandfonbrener, Depen Morwani, Itai Shapira, Lucas Janson, Mujin Kwun, Nikhil Vyas, Rosie Zhao, Sham Kakade

SOAP runs Adam inside Shampoo's eigenbasis to cut large-batch iterations by over 40 percent versus AdamW.

arxiv:2409.11321 v2 · 2024-09-17 · cs.LG · cs.AI


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{Y3UJ36HWYODK3SBIVQAWN3ZXCB}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

In the large-batch regime, SOAP reduces the number of iterations by over 40% and wall-clock time by over 35% compared to AdamW, with approximately 20% improvements in both metrics compared to Shampoo.

C2 · weakest assumption

The formal equivalence between 1/2-power Shampoo and Adafactor holds only inside the current eigenbasis; the paper assumes that keeping this basis fixed for many steps does not materially degrade the preconditioning quality, an assumption validated only empirically on the tested model sizes.

C3 · one-line summary

SOAP runs Adam in the eigenbasis of Shampoo's preconditioner, cutting iterations by over 40% versus AdamW on 360M-660M language models while adding only one hyperparameter.
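The idea in these claims can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: it handles a single 2-D weight matrix, refreshes the basis with an exact eigendecomposition (`numpy.linalg.eigh`), and omits bias correction and weight decay. The class name `SoapSketch` and the parameter `precond_freq` (the basis-refresh interval, assumed here to be the one extra hyperparameter relative to Adam) are hypothetical names.

```python
import numpy as np


class SoapSketch:
    """Adam-style update carried out in the eigenbasis of Shampoo's factors (sketch)."""

    def __init__(self, shape, lr=1e-2, beta1=0.9, beta2=0.95, eps=1e-8, precond_freq=10):
        m, n = shape
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.precond_freq = precond_freq  # hypothetical name: steps between basis refreshes
        self.L = np.zeros((m, m))         # running average of G G^T (left Shampoo factor)
        self.R = np.zeros((n, n))         # running average of G^T G (right Shampoo factor)
        self.QL = np.eye(m)               # current left eigenbasis
        self.QR = np.eye(n)               # current right eigenbasis
        self.M = np.zeros(shape)          # first moment, kept in the original coordinates
        self.V = np.zeros(shape)          # second moment, kept in the rotated coordinates
        self.t = 0

    def step(self, W, G):
        b1, b2 = self.beta1, self.beta2

        # Shampoo-style statistics of the gradient.
        self.L = b2 * self.L + (1 - b2) * (G @ G.T)
        self.R = b2 * self.R + (1 - b2) * (G.T @ G)

        # Refresh the eigenbasis only every `precond_freq` steps; in between it stays fixed.
        # Assumption: exact eigh is used here for simplicity.
        if self.t % self.precond_freq == 0:
            _, self.QL = np.linalg.eigh(self.L)
            _, self.QR = np.linalg.eigh(self.R)
        self.t += 1

        # Rotate the gradient and the momentum into the eigenbasis.
        self.M = b1 * self.M + (1 - b1) * G
        G_rot = self.QL.T @ G @ self.QR
        M_rot = self.QL.T @ self.M @ self.QR

        # Adam-style second moment and normalized update in the rotated coordinates.
        self.V = b2 * self.V + (1 - b2) * G_rot**2
        N_rot = M_rot / (np.sqrt(self.V) + self.eps)

        # Rotate the update back to the original coordinates and apply it.
        return W - self.lr * (self.QL @ N_rot @ self.QR.T)


# Usage: a few steps on the toy objective 0.5 * ||W - T||^2, whose gradient is W - T.
rng = np.random.default_rng(0)
T = rng.standard_normal((4, 3))
W = np.zeros((4, 3))
opt = SoapSketch(W.shape)
for _ in range(5):
    W = opt.step(W, W - T)
```

Keeping the basis fixed between refreshes is what the weakest-assumption claim refers to: the second-moment estimate `V` is accumulated in coordinates that may be slightly stale until the next refresh.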

References

14 extracted · 14 resolved · 1 Pith anchors

[1] URL https://doi.org/10.48550/arXiv 2024 · doi:10.48550/arxiv

[3] (360m) We sweep over the cross product of best 3 learning rates and β1 ∈ {0.9, 0.95, 0.99}

[4] The last two of the sweeps did not yield any benefit for the 360m model with 2m batch size; hence we only sweep over learning rate for the 660m model with 2m batch size

[6] (360m) We sweep over the cross product of best 3 learning rates from above and ϵ_shampoo ∈ {1e−11, 1e−12, 1e−13}

[7] (360m) We sweep over the cross product of best 3 learning rates from above and β_shampoo ∈ {.9, .95, .975}

Formal links

2 machine-checked theorem links

Cited by

35 papers in Pith

HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

Coupling-Robust Accuracy in Multiphysics Physics Informed Neural Networks via Kronecker-Preconditioned Optimization

Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics

GradPower: Powering Gradients for Faster Language Model Pre-Training

Preconditioned Norms: A Unified Framework for Steepest Descent, Quasi-Newton and Adaptive Methods

Receipt and verification

First computed	2026-05-17T23:39:05.179384Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

c6e89df8f6c386adc828ac0166ef37105c60b407da96a0de22491f79bd188883

Aliases

arxiv: 2409.11321 · arxiv_version: 2409.11321v2 · doi: 10.48550/arxiv.2409.11321 · pith_short_12: Y3UJ36HWYODK · pith_short_16: Y3UJ36HWYODK3SBI · pith_short_8: Y3UJ36HW
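For this record, the 26-character identifier coincides with the first 26 characters of the RFC 4648 base32 encoding of the canonical hash, and the short aliases are prefixes of that identifier. The sketch below checks this for the values shown on this page. Whether this is the defined derivation rule for every Pith Number is an assumption; the helper name `base32_prefix` is hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "c6e89df8f6c386adc828ac0166ef37105c60b407da96a0de22491f79bd188883"
PITH_ID = "Y3UJ36HWYODK3SBIVQAWN3ZXCB"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Return the first `length` characters of the base32 encoding of a hex digest."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


# Observation for this record only (assumed, not a documented rule):
derived_id = base32_prefix(CANONICAL_HASH, 26)

# The short aliases are plain prefixes of the full identifier.
short_aliases = {n: PITH_ID[:n] for n in (8, 12, 16)}

print(derived_id == PITH_ID)
print(short_aliases)
```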

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/Y3UJ36HWYODK3SBIVQAWN3ZXCB \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c6e89df8f6c386adc828ac0166ef37105c60b407da96a0de22491f79bd188883
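The hashing step of the command above can also be run offline on a record that has already been downloaded. The sketch below applies the same canonicalization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) before hashing. The function name `canonical_sha256` is hypothetical, and the sample dict is only the `source` portion of the record, so it is not expected to reproduce the full canonical hash.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 hex digest of a record serialized as compact, key-sorted JSON."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Two dicts with the same content but different key order hash identically.
a = {"id": "2409.11321", "kind": "arxiv", "version": 2}
b = {"version": 2, "kind": "arxiv", "id": "2409.11321"}
print(canonical_sha256(a) == canonical_sha256(b))
```

Sorting keys and fixing the separators is what makes the digest reproducible across machines: any mirror that serializes the record the same way should arrive at the same bytes before hashing.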

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "10304c10072863fbd852c8c755150cdf3e5f2321c96b18b2f4e2df8a7fcfe0d2",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-09-17T16:18:05Z",
    "title_canon_sha256": "19f51ed35d18fedb5acd0bd7125373799c5f6753272115eb11a91ac74cc69dcb"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2409.11321",
    "kind": "arxiv",
    "version": 2
  }
}