Pith Number

pith:ZGN6K3U5

pith:2026:ZGN6K3U52HN3B5SHNOS5FVRPB3
Status: not attested · not anchored · not stored · refs resolved

MUON+: Towards More Effective Muon via One Additional Normalization Step for LLM Pre-training

Liyan Tan, Ruijie Zhang, Yequan Zhao, Yupeng Su, Zhengyang Wang, Zheng Zhang, Ziyue Liu

Muon+ adds one normalization step after polar orthogonalization to fix norm imbalance and improve LLM pre-training over Muon.

arxiv:2602.21545 v3 · 2026-02-25 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{ZGN6K3U52HN3B5SHNOS5FVRPB3}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
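The record does not spell out the merge algorithm, so the following is only a hypothetical illustration of what "deterministic" means here: if every mirror replays the same events in one fixed total order, they all reach the same state regardless of the order in which events arrived. The event schema (`id`, `ts`, `field`, `value`), the last-writer-wins rule, and the function name `merge_events` are assumptions, not Pith's format, and signature checking is omitted.

```python
from typing import Any


def merge_events(record: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
    """Replay events over a canonical record in a fixed order (hypothetical schema)."""
    state = dict(record)  # copy so the canonical record itself is never mutated
    # Sorting on (timestamp, event id) gives a total order, so the replay order does
    # not depend on the order in which a mirror happened to receive the events.
    for event in sorted(events, key=lambda e: (e["ts"], e["id"])):
        # Last-writer-wins (assumed rule): a later event overwrites an earlier value.
        state[event["field"]] = event["value"]
    return state


# Hypothetical record fields and events, for illustration only.
record = {"source": "arxiv:2602.21545", "author_claim": "open"}
events = [
    {"id": "e2", "ts": 2, "field": "citations", "value": 1},
    {"id": "e1", "ts": 1, "field": "author_claim", "value": "claimed"},
]
print(merge_events(record, events) == merge_events(record, list(reversed(events))))  # True
```

Tie-breaking on the event id matters: without it, two events with equal timestamps could be applied in different orders on different mirrors.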

Claims

C1 · strongest claim

Across pre-training experiments on GPT and LLaMA models from 60M to 7B parameters, spanning both compute-optimal budgets and extended token-to-parameter ratios up to approximately 200, Muon+ consistently outperforms Muon in terms of training and validation perplexity, leading to significant overall pre-training speedup.
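To make the token-to-parameter ratio concrete, the arithmetic below converts a ratio D/N into a token budget D for the two ends of the reported size range. The value 20 tokens per parameter is the commonly cited Chinchilla rule of thumb and is only an assumption about what "compute-optimal" means here; the claim does not say which model sizes were trained at the extended ratio of about 200. The helper name `training_tokens` is illustrative.

```python
def training_tokens(n_params: float, tokens_per_param: float) -> float:
    """Token budget D implied by a token-to-parameter ratio D/N."""
    return n_params * tokens_per_param


# Endpoints of the reported size range (60M and 7B parameters).
# 20 tokens/param: assumed Chinchilla-style compute-optimal ratio; 200: extended ratio.
for n_params in (60e6, 7e9):
    for ratio in (20, 200):
        d = training_tokens(n_params, ratio)
        print(f"N = {n_params / 1e6:,.0f}M, D/N = {ratio}: D = {d / 1e9:,.1f}B tokens")
```

At a ratio of 200, a 60M-parameter model would see about 12B tokens and a 7B-parameter model about 1.4T tokens.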

C2 · weakest assumption

The assumption that the post-polar norm imbalance identified in the blockwise descent analysis is the dominant practical limitation of Muon, and that the added normalization step corrects it without introducing offsetting drawbacks in other training regimes or at other model scales.

C3 · one line summary

Muon+ adds one normalization step after polar orthogonalization in the Muon optimizer, yielding lower training and validation perplexity and faster pre-training across 60M-7B models.
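A minimal NumPy sketch of where the extra step sits in a Muon-style update. The quintic Newton-Schulz coefficients (3.4445, -4.7750, 2.0315) are the ones used in the public Muon reference implementation. The record does not say which normalization Muon+ applies, so the row-wise unit-l2 normalization in `row_normalize` is an assumption, as are the function names and hyperparameters; the sketch also omits Nesterov momentum, shape-dependent learning-rate scaling, and weight decay.

```python
import numpy as np


def newton_schulz(G: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximate the polar factor U V^T of G with Muon's quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)  # Frobenius scaling puts every singular value <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:  # work on the wide orientation so X @ X.T is the smaller Gram matrix
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X  # X <- a X + b (X X^T) X + c (X X^T)^2 X
    return X.T if transposed else X


def row_normalize(O: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """HYPOTHETICAL extra step: rescale every row of the orthogonalized update to unit l2 norm."""
    return O / (np.linalg.norm(O, axis=1, keepdims=True) + eps)


def muon_plus_step(W: np.ndarray, G: np.ndarray, M: np.ndarray,
                   lr: float = 0.02, beta: float = 0.95):
    """One sketched Muon+ update for a 2-D weight W with gradient G and momentum buffer M."""
    M = beta * M + G          # plain heavy-ball momentum (Muon itself uses Nesterov)
    O = newton_schulz(M)      # polar orthogonalization, as in Muon
    O = row_normalize(O)      # the one additional normalization (assumed row-wise here)
    return W - lr * O, M


rng = np.random.default_rng(0)
W, M = np.zeros((4, 8)), np.zeros((4, 8))
G = rng.standard_normal((4, 8))
W, M = muon_plus_step(W, G, M)
print(np.linalg.norm(W / 0.02, axis=1))  # every row of the update has (near) unit norm
```

Because Newton-Schulz only approximates the polar factor, its output's singular values (and hence row norms) spread around 1; the extra step removes that spread along whichever axis is normalized.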

References

40 extracted · 40 resolved · 14 Pith anchors

[1] GPT-4 Technical Report 2023 · arXiv:2303.08774
[2] The Polar Express: Optimal Matrix Sign Methods and Their Application to the Muon Algorithm 2025 · arXiv:2505.16932
[3] J. Bernstein. Deriving Muon. 2025
[4] F. L. Cesista, Y. Jiacheng, and K. Jordan. Squeezing 1-2
[5] Kimi-Audio Technical Report 2025 · arXiv:2504.18425

Cited by

1 paper in Pith

Receipt and verification
First computed 2026-05-17T23:39:15.957870Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

c99be56e9dd1dbb0f6476ba5d2d62f0eec00bc1013a01f0860b0619bb38cbb6b

Aliases

arxiv: 2602.21545 · arxiv_version: 2602.21545v3 · doi: 10.48550/arxiv.2602.21545 · pith_short_12: ZGN6K3U52HN3 · pith_short_16: ZGN6K3U52HN3B5SH · pith_short_8: ZGN6K3U5
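The short aliases are prefixes of the full Pith Number. On this record, the full 26-character Pith Number also equals the first 26 characters of the RFC 4648 base32 encoding of the canonical hash bytes; that relationship was checked against this record only, so treat it as an observation rather than a documented rule. The helper name `pith_number_from_hash` is illustrative.

```python
import base64


def pith_number_from_hash(canonical_hash_hex: str, length: int = 26) -> str:
    # Observed on this record (assumption for others): the Pith Number is the first
    # `length` characters of the base32 (RFC 4648) encoding of the SHA-256 digest.
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


h = "c99be56e9dd1dbb0f6476ba5d2d62f0eec00bc1013a01f0860b0619bb38cbb6b"
print(pith_number_from_hash(h))     # ZGN6K3U52HN3B5SHNOS5FVRPB3
print(pith_number_from_hash(h, 8))  # ZGN6K3U5 (pith_short_8)
```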
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/ZGN6K3U52HN3B5SHNOS5FVRPB3 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c99be56e9dd1dbb0f6476ba5d2d62f0eec00bc1013a01f0860b0619bb38cbb6b
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f823f07862b2e54fe74fc0c27266a6718f86f5f2dbc32bfe51f802fc24c17b95",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-02-25T04:04:00Z",
    "title_canon_sha256": "3265ff483205087d87e92ca09328bbc5ef60fcad494a026759207b403787db37"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.21545",
    "kind": "arxiv",
    "version": 3
  }
}
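For offline checking, the hashing step of the verification one-liner can be reproduced in plain Python: keys sorted, no whitespace between separators, non-ASCII characters kept as UTF-8. The sketch below inlines the record shown above; whether that display is byte-for-byte the `canonical_record` the API returns is an assumption, so compare the printed digest with the canonical hash rather than taking a match for granted. The function name `canonical_sha256` is illustrative.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    # Same canonicalization as the one-liner: sort_keys, compact separators, ensure_ascii=False.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


record = {
    "metadata": {
        "abstract_canon_sha256": "f823f07862b2e54fe74fc0c27266a6718f86f5f2dbc32bfe51f802fc24c17b95",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-02-25T04:04:00Z",
        "title_canon_sha256": "3265ff483205087d87e92ca09328bbc5ef60fcad494a026759207b403787db37",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.21545", "kind": "arxiv", "version": 3},
}
print(canonical_sha256(record))
```

Sorting keys is what makes the digest independent of the key order a server or mirror happens to emit.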