Pith Number

pith:Y3UJ36HW

pith:2024:Y3UJ36HWYODK3SBIVQAWN3ZXCB
not attested · not anchored · not stored · refs resolved

SOAP: Improving and Stabilizing Shampoo using Adam

David Brandfonbrener, Depen Morwani, Itai Shapira, Lucas Janson, Mujin Kwun, Nikhil Vyas, Rosie Zhao, Sham Kakade

SOAP runs Adam inside Shampoo's eigenbasis to cut large-batch iterations by over 40 percent versus AdamW.

arxiv:2409.11321 v2 · 2024-09-17 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{Y3UJ36HWYODK3SBIVQAWN3ZXCB}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

In the large-batch regime, SOAP reduces the number of iterations by over 40% and wall-clock time by over 35% compared to AdamW, with approximately 20% improvements in both metrics compared to Shampoo.

C2 · weakest assumption

The formal equivalence between 1/2-power Shampoo and Adafactor holds only inside the current eigenbasis; the paper assumes that keeping this basis fixed for many steps does not materially degrade the preconditioning quality, an assumption validated only empirically on the tested model sizes.

C3 · one-line summary

SOAP runs Adam in the eigenbasis of Shampoo's preconditioner, cutting iterations by over 40% versus AdamW on 360M- and 660M-parameter language models while adding only one hyperparameter (the preconditioning frequency).
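C2 and C3 describe SOAP's core loop: track Shampoo's two Kronecker factors, take their eigenvectors as a basis, run Adam on the gradient rotated into that basis, and refresh the basis only periodically. Below is a minimal NumPy sketch of that loop for one 2-D weight matrix. It is an illustration under assumptions, not the authors' implementation. The class name `SOAPSketch`, the default hyperparameters, keeping the first moment in the original basis, and refreshing the basis with `numpy.linalg.eigh` every `precondition_frequency` steps are all choices made for this sketch. Between refreshes the basis stays fixed, which is the assumption C2 flags. Weight decay, 1-D parameters, and any re-projection of the second moment after a refresh are omitted.

```python
import numpy as np


class SOAPSketch:
    """Hypothetical sketch of the SOAP idea for one 2-D weight matrix (not the authors' code).

    Shampoo's Kronecker factors L ~ E[G G^T] and R ~ E[G^T G] are tracked as
    exponential moving averages. Their eigenvectors (Q_L, Q_R) define a rotated
    basis in which a plain Adam update is computed, then rotated back.
    """

    def __init__(self, shape, lr=1e-2, betas=(0.95, 0.95), shampoo_beta=0.95,
                 eps=1e-8, precondition_frequency=10):
        m, n = shape
        self.lr, self.eps = lr, eps
        self.beta1, self.beta2 = betas
        self.shampoo_beta = shampoo_beta
        self.freq = precondition_frequency  # the single extra hyperparameter
        self.L = np.zeros((m, m))
        self.R = np.zeros((n, n))
        self.Q_L = np.eye(m)
        self.Q_R = np.eye(n)
        self.exp_avg = np.zeros(shape)     # first moment, original basis
        self.exp_avg_sq = np.zeros(shape)  # second moment, rotated basis
        self.t = 0

    def step(self, W, G):
        """Return the updated weights for gradient G (W is not modified in place)."""
        self.t += 1
        # Update Shampoo's Kronecker factors with an EMA.
        b = self.shampoo_beta
        self.L = b * self.L + (1 - b) * G @ G.T
        self.R = b * self.R + (1 - b) * G.T @ G
        # Recompute the eigenbasis on the first step and every `freq` steps;
        # in between, the basis is held fixed.
        if self.t == 1 or self.t % self.freq == 0:
            self.Q_L = np.linalg.eigh(self.L)[1]
            self.Q_R = np.linalg.eigh(self.R)[1]
        # Adam moments: momentum in the original basis, second moment in the rotated one.
        self.exp_avg = self.beta1 * self.exp_avg + (1 - self.beta1) * G
        G_rot = self.Q_L.T @ G @ self.Q_R
        self.exp_avg_sq = self.beta2 * self.exp_avg_sq + (1 - self.beta2) * G_rot**2
        # Bias-corrected Adam direction computed in the rotated basis.
        m_hat = (self.Q_L.T @ self.exp_avg @ self.Q_R) / (1 - self.beta1**self.t)
        v_hat = self.exp_avg_sq / (1 - self.beta2**self.t)
        N_rot = m_hat / (np.sqrt(v_hat) + self.eps)
        # Rotate the direction back to the original basis and take the step.
        return W - self.lr * (self.Q_L @ N_rot @ self.Q_R.T)
```

With `precondition_frequency=1` the basis is recomputed every step; larger values trade basis freshness for fewer eigendecompositions, which is the cost SOAP's extra hyperparameter controls.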

References

14 extracted · 14 resolved · 1 Pith anchor

[1] URL https://doi.org/10.48550/arXiv · 2024 · doi:10.48550/arxiv
[3] (360m) We sweep over the cross product of best 3 learning rates and β1 ∈ {0.9, 0.95, 0.99}
[4] The last two sweeps did not yield any benefit for the 360m model with 2m batch size; hence we only sweep over the learning rate for the 660m model with 2m batch size
[6] (360m) We sweep over the cross product of the best 3 learning rates from above and ϵshampoo ∈ {1e−11, 1e−12, 1e−13}
[7] (360m) We sweep over the cross product of the best 3 learning rates from above and βshampoo ∈ {.9, .95, .975}

Formal links

2 machine-checked theorem links

Cited by

35 papers in Pith

Receipt and verification
First computed 2026-05-17T23:39:05.179384Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

c6e89df8f6c386adc828ac0166ef37105c60b407da96a0de22491f79bd188883

Aliases

arxiv: 2409.11321 · arxiv_version: 2409.11321v2 · doi: 10.48550/arxiv.2409.11321 · pith_short_12: Y3UJ36HWYODK · pith_short_16: Y3UJ36HWYODK3SBI · pith_short_8: Y3UJ36HW
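The identifiers on this record appear to be linked. The 26-character Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash's raw bytes, and the pith_short_8, pith_short_12 and pith_short_16 aliases match its prefixes. This relationship is inferred from the values shown on this page, not from a published Pith specification; the helper names `pith_number_from_hash` and `pith_short` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "c6e89df8f6c386adc828ac0166ef37105c60b407da96a0de22491f79bd188883"
PITH_NUMBER = "Y3UJ36HWYODK3SBIVQAWN3ZXCB"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: RFC 4648 Base32 of the raw digest bytes, truncated."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


def pith_short(pith_number: str, length: int) -> str:
    """Hypothetical helper: the pith_short_N aliases look like plain prefixes."""
    if not 1 <= length <= len(pith_number):
        raise ValueError(f"length must be in 1..{len(pith_number)}")
    return pith_number[:length]


derived = pith_number_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": pith_short(derived, n) for n in (8, 12, 16)}
print(derived)  # Y3UJ36HWYODK3SBIVQAWN3ZXCB
print(aliases)
```

If the inference holds, any mirror can recompute the Pith Number from the canonical hash alone, without trusting the page that displays it.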
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/Y3UJ36HWYODK3SBIVQAWN3ZXCB \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c6e89df8f6c386adc828ac0166ef37105c60b407da96a0de22491f79bd188883
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "10304c10072863fbd852c8c755150cdf3e5f2321c96b18b2f4e2df8a7fcfe0d2",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-09-17T16:18:05Z",
    "title_canon_sha256": "19f51ed35d18fedb5acd0bd7125373799c5f6753272115eb11a91ac74cc69dcb"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2409.11321",
    "kind": "arxiv",
    "version": 2
  }
}
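The curl pipeline above needs network access. As an offline sketch, the same canonicalization it uses (sorted keys, compact `(',', ':')` separators, `ensure_ascii=False`, UTF-8 bytes) can be applied to the record JSON pasted verbatim from this page. Assuming that JSON is the complete `canonical_record`, the printed digest should match the `# expect:` value in the pipeline; `canonical_bytes` is a hypothetical helper name.

```python
import hashlib
import json

# Canonical record JSON, pasted verbatim from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "10304c10072863fbd852c8c755150cdf3e5f2321c96b18b2f4e2df8a7fcfe0d2",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-09-17T16:18:05Z",
    "title_canon_sha256": "19f51ed35d18fedb5acd0bd7125373799c5f6753272115eb11a91ac74cc69dcb"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2409.11321",
    "kind": "arxiv",
    "version": 2
  }
}
"""


def canonical_bytes(record: dict) -> bytes:
    # Same rules as the python3 one-liner in the curl pipeline:
    # sorted keys, no whitespace between tokens, non-ASCII kept as UTF-8.
    return json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


record = json.loads(RECORD_JSON)
digest = hashlib.sha256(canonical_bytes(record)).hexdigest()
print(digest)
```

Because indentation and key order in the pasted text are discarded by `json.loads` and re-imposed by `json.dumps`, pretty-printing differences between mirrors do not affect the digest.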