pith:ZGN6K3U5
MUON+: Towards More Effective Muon via One Additional Normalization Step for LLM Pre-training
Muon+ adds one normalization step after polar orthogonalization to fix norm imbalance and improve LLM pre-training over Muon.
arxiv:2602.21545 v3 · 2026-02-25 · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{ZGN6K3U52HN3B5SHNOS5FVRPB3}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Across pre-training experiments on GPT and LLaMA models from 60M to 7B parameters, spanning both compute-optimal budgets and extended token-to-parameter ratios up to approximately 200, Muon+ consistently outperforms Muon in terms of training and validation perplexity, leading to significant overall pre-training speedup.
This rests on the premise that the post-polar norm imbalance identified in the blockwise descent analysis is the dominant practical limitation of Muon, and that the added normalization step corrects it without introducing offsetting drawbacks in other regimes or model scales.
Muon+ adds one normalization step after polar orthogonalization in the Muon optimizer, yielding lower training and validation perplexity and faster pre-training across 60M-7B models.
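The claim above can be illustrated with a minimal NumPy sketch, offered under stated assumptions rather than as the authors' method. It uses the Newton-Schulz iteration common in public Muon implementations as the approximate polar step. The record does not say which normalization Muon+ applies, so the row-wise L2 normalization here is a placeholder. The names `newton_schulz`, `normalize_rows`, and `muon_plus_direction` are hypothetical.

```python
import numpy as np


def newton_schulz(G, steps=5, eps=1e-7):
    """Approximate the polar factor of G with a quintic Newton-Schulz iteration.

    The coefficients follow common public Muon implementations; they are an
    assumption here, not something stated in this record.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)  # scale by Frobenius norm so singular values are <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # iterate on the wide orientation so X @ X.T is the smaller Gram matrix
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X.T if transposed else X


def normalize_rows(X, eps=1e-7):
    """Placeholder extra step: rescale every row to (approximately) unit L2 norm."""
    return X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)


def muon_plus_direction(momentum):
    """Hypothetical Muon+-style update direction: orthogonalize, then normalize."""
    return normalize_rows(newton_schulz(momentum))


rng = np.random.default_rng(0)
M = rng.standard_normal((4, 8))  # stand-in for a momentum matrix

O = newton_schulz(M)
D = muon_plus_direction(M)

row_norms_before = np.linalg.norm(O, axis=1)  # uneven after the approximate polar step
row_norms_after = np.linalg.norm(D, axis=1)   # close to 1 after the extra normalization
print(row_norms_before)
print(row_norms_after)
```

Comparing the two printed vectors shows what the extra step changes in this sketch: the approximate polar factor leaves rows with differing norms, and the normalization equalizes them. Whether rows, columns, or both are the right axis is left open here.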
Receipt and verification
| First computed | 2026-05-17T23:39:15.957870Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
c99be56e9dd1dbb0f6476ba5d2d62f0eec00bc1013a01f0860b0619bb38cbb6b
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/ZGN6K3U52HN3B5SHNOS5FVRPB3 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c99be56e9dd1dbb0f6476ba5d2d62f0eec00bc1013a01f0860b0619bb38cbb6b
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "f823f07862b2e54fe74fc0c27266a6718f86f5f2dbc32bfe51f802fc24c17b95",
"cross_cats_sorted": [],
"license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
"primary_cat": "cs.LG",
"submitted_at": "2026-02-25T04:04:00Z",
"title_canon_sha256": "3265ff483205087d87e92ca09328bbc5ef60fcad494a026759207b403787db37"
},
"schema_version": "1.0",
"source": {
"id": "2602.21545",
"kind": "arxiv",
"version": 3
}
}
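The hashing step of the verification one-liner can also be run offline. The sketch below reproduces the same canonicalization (sorted keys, compact separators, non-ASCII characters kept as-is, UTF-8 encoding) in a small function and applies it to the record as displayed above. The function name `canonical_sha256` is hypothetical. Whether the displayed record hashes to the published value depends on it being byte-for-byte identical to what the API returns, so the script reports the comparison rather than assuming the result.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 hex digest of a record serialized with sorted keys and no extra whitespace."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# The canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "f823f07862b2e54fe74fc0c27266a6718f86f5f2dbc32bfe51f802fc24c17b95",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-02-25T04:04:00Z",
        "title_canon_sha256": "3265ff483205087d87e92ca09328bbc5ef60fcad494a026759207b403787db37",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.21545", "kind": "arxiv", "version": 3},
}

# Published canonical hash from the "Canonical hash" section.
expected = "c99be56e9dd1dbb0f6476ba5d2d62f0eec00bc1013a01f0860b0619bb38cbb6b"

digest = canonical_sha256(record)
print(digest)
print("match" if digest == expected else "mismatch")
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields are written in the source.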