pith.
Pith Number

pith:FBA4ZOPE

pith:2026:FBA4ZOPERMBYKBNME7KYD25AIQ
not attested · not anchored · not stored · refs resolved

RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization

Ruochen Jin, Shenyang Deng, Shuhua Yu, Tianyu Pang, Yaoqing Yang, Zhuoli Ouyang, Zihang Liu

RMNP replaces Newton-Schulz orthogonalization with row-wise L2 normalization to match Muon performance at linear cost.
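As a rough illustration of this swap, the sketch below applies momentum and then row-wise L2 normalization to a weight matrix stored as (out, in). The function names, the heavy-ball momentum form, and the default `lr`, `beta` and `eps` values are assumptions for illustration, not the paper's published update rule.

```python
import numpy as np

def row_normalize(m: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Divide each row by its L2 norm over the input dimension (weights laid out as (out, in))."""
    return m / (np.linalg.norm(m, axis=1, keepdims=True) + eps)

def rmnp_style_step(w, grad, buf, lr=0.02, beta=0.95, eps=1e-8):
    """Hypothetical RMNP-style update: momentum, then row-wise L2 normalization.

    Assumed, not taken from the paper: heavy-ball momentum buf <- beta * buf + grad,
    no Nesterov term, no weight decay, no shape-dependent learning-rate scaling.
    """
    buf = beta * buf + grad                 # momentum accumulation, O(mn)
    update = row_normalize(buf, eps)        # stands in for Newton-Schulz, O(mn)
    return w - lr * update, buf

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16))
g = rng.standard_normal((4, 16))
w_new, buf = rmnp_style_step(w, g, np.zeros_like(w))
# Each row of the applied step has L2 norm of about lr, whatever the gradient's scale.
print(np.linalg.norm(w - w_new, axis=1) / 0.02)
```

Because every row of the update is rescaled to unit norm, each output row moves by roughly `lr` regardless of the gradient's magnitude, and the whole step is a single pass over the m×n entries.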

arxiv:2603.20527 v3 · 2026-03-20 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{FBA4ZOPERMBYKBNME7KYD25AIQ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
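The page does not spell out the merge algorithm. As a hypothetical sketch of what "deterministic" buys a mirror, the function below folds events into a state in a fixed total order, so any mirror holding the same events reaches the same result regardless of arrival order. The event fields (`ts`, `event_id`, `field`, `value`) and the last-writer-wins rule are assumptions, and signature verification is omitted.

```python
from typing import Any

def merge_events(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical last-writer-wins merge over a fixed total order of events.

    Assumed event fields (illustrative only): "ts" (ISO-8601 UTC string in one
    fixed format, so lexicographic order matches time order), "event_id"
    (unique tie-breaker), "field" and "value". A real mirror would verify
    each event's signature before merging; that step is omitted here.
    """
    state: dict[str, Any] = {}
    # Sorting by (timestamp, id) makes the result independent of arrival order.
    for event in sorted(events, key=lambda e: (e["ts"], e["event_id"])):
        state[event["field"]] = event["value"]
    return state

events = [
    {"ts": "2026-01-01T00:00:00Z", "event_id": "a", "field": "status", "value": "open"},
    {"ts": "2026-01-02T00:00:00Z", "event_id": "b", "field": "status", "value": "closed"},
    {"ts": "2026-01-02T00:00:00Z", "event_id": "c", "field": "note", "value": "x"},
]
mirror_a = merge_events(events)        # events received in one order
mirror_b = merge_events(events[::-1])  # same events, opposite order
print(mirror_a == mirror_b)
```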

Claims

C1 · strongest claim

RMNP delivers competitive optimization performance compared with Muon while substantially reducing preconditioning wall-clock time. We establish convergence guarantees for RMNP in the non-convex setting that match recent results for Muon optimizers, achieving the minimax optimal complexity.

C2 · weakest assumption

The substitution is justified by the empirically observed block-diagonal structure of the layer-wise Hessian in Transformers, together with the claim that orthogonalization and row-wise ℓ2 normalization (along the input dimension) are asymptotically equivalent for Transformers.
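A small NumPy check can illustrate why row-wise normalization can stand in for orthogonalization: when the rows of G are mutually orthogonal, G G^T is diagonal and the polar factor (G G^T)^(-1/2) G is exactly G with each row divided by its norm. The random wide-matrix case only hints at the approximate regime; it is not the paper's asymptotic argument for Transformers, and the matrix sizes are arbitrary choices.

```python
import numpy as np

def polar_factor(g: np.ndarray) -> np.ndarray:
    """Exact orthogonalization U V^T from the thin SVD (the quantity Newton-Schulz approximates)."""
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return u @ vt

def row_normalize(g: np.ndarray) -> np.ndarray:
    """Row-wise L2 normalization along the input dimension of an (out, in) matrix."""
    return g / np.linalg.norm(g, axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Exact case: mutually orthogonal rows make G G^T diagonal, so
# (G G^T)^(-1/2) G just divides each row by its norm.
q, _ = np.linalg.qr(rng.standard_normal((64, 8)))   # 64x8 with orthonormal columns
g_orth = np.diag(np.arange(1.0, 9.0)) @ q.T          # 8x64, orthogonal rows with norms 1..8
exact_gap = np.abs(polar_factor(g_orth) - row_normalize(g_orth)).max()

# Approximate case: i.i.d. Gaussian rows in a much wider input dimension are
# nearly orthogonal, so the two preconditioners nearly coincide.
g_rand = rng.standard_normal((8, 4096))
approx_gap = np.linalg.norm(polar_factor(g_rand) - row_normalize(g_rand)) / np.sqrt(8)

print(f"exact case max gap: {exact_gap:.1e}, random case relative gap: {approx_gap:.3f}")
```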

C3 · one-line summary

RMNP preconditions matrix updates via row-wise L2 normalization instead of Newton-Schulz iteration, reducing complexity to O(mn) while matching Muon's non-convex convergence rate and empirical performance.
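To make the cost gap concrete, the sketch below contrasts a Newton-Schulz orthogonalization with row-wise normalization and gives rough multiply-add counts. It uses the textbook cubic Newton-Schulz iteration rather than the tuned quintic polynomial common in Muon implementations, and the step counts and the `rough_cost` helper are illustrative assumptions.

```python
import numpy as np

def newton_schulz(g: np.ndarray, steps: int = 30) -> np.ndarray:
    """Cubic Newton-Schulz iteration converging to the polar factor U V^T of g.

    Muon implementations commonly use a tuned quintic polynomial with few steps;
    the plain cubic form is used here only because it converges exactly.
    """
    x = g / np.linalg.norm(g)               # Frobenius scaling: singular values in (0, 1]
    tall = x.shape[0] > x.shape[1]
    if tall:
        x = x.T                             # keep the Gram matrix x @ x.T as small as possible
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x   # two products, each ~min(m,n)^2 * max(m,n) mult-adds
    return x.T if tall else x

def row_normalize(g: np.ndarray) -> np.ndarray:
    """Row-wise L2 normalization: one pass over the m*n entries."""
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def rough_cost(m: int, n: int, ns_steps: int = 5) -> tuple[int, int]:
    """Rough multiply-add counts, ignoring the constants of any real kernel."""
    small, large = min(m, n), max(m, n)
    ns = ns_steps * 2 * small * small * large  # O(min(m,n)^2 max(m,n)) per step
    rn = 2 * m * n                              # square-and-sum, then divide: O(mn)
    return ns, rn

ns_cost, rn_cost = rough_cost(4096, 4096)
print(ns_cost // rn_cost)  # 20480, i.e. ns_steps * 4096
```

For a square 4096×4096 layer the ratio grows linearly with the matrix size, which is the gap the O(mn) claim refers to; actual wall-clock ratios depend on hardware and kernels.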

References

48 extracted · 48 resolved · 7 Pith anchors

[1] Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(61):2121–2159, 2011.
[2] Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning, 4(2):26–31, 2012.
[3] Adam: A Method for Stochastic Optimization. arXiv:1412.6980, 2014.
[4] Decoupled weight decay regularization. 2019.
[5] The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton. arXiv:2510.09378, 2025.

Cited by

2 papers in Pith

Receipt and verification
First computed 2026-05-18T02:45:04.674774Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

2841ccb9e48b038505ac27d581eba04438022926ba78350e921ded0a68330cb6

Aliases

arxiv: 2603.20527 · arxiv_version: 2603.20527v3 · doi: 10.48550/arxiv.2603.20527 · pith_short_12: FBA4ZOPERMBY · pith_short_16: FBA4ZOPERMBYKBNM · pith_short_8: FBA4ZOPE
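The identifiers on this page appear to be related mechanically: encoding the canonical SHA-256 hash in RFC 4648 base32 and keeping the first 26 characters reproduces the Pith Number, and the short aliases are its 8-, 12- and 16-character prefixes. This relationship is inferred from the values shown here rather than from a published Pith specification, and `pith_number_from_hash` is a hypothetical helper.

```python
import base64

def pith_number_from_hash(canonical_sha256_hex: str, length: int = 26) -> str:
    """Inferred mapping: leading base32 characters of the canonical SHA-256 digest."""
    digest = bytes.fromhex(canonical_sha256_hex)
    # RFC 4648 base32 (alphabet A-Z, 2-7); '=' padding only appears at the end, so slicing drops it.
    return base64.b32encode(digest).decode("ascii")[:length]

canonical_hash = "2841ccb9e48b038505ac27d581eba04438022926ba78350e921ded0a68330cb6"
pith = pith_number_from_hash(canonical_hash)
print(pith)                            # FBA4ZOPERMBYKBNME7KYD25AIQ
print(pith[:8], pith[:12], pith[:16])  # FBA4ZOPE FBA4ZOPERMBY FBA4ZOPERMBYKBNM
```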
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/FBA4ZOPERMBYKBNME7KYD25AIQ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2841ccb9e48b038505ac27d581eba04438022926ba78350e921ded0a68330cb6
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "e92f3bf1d9c42d393e8d8a72ba71b92e73aa30dc0aa1e26c2d7b46caa8d26031",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-20T21:55:28Z",
    "title_canon_sha256": "24d13c333a5267bdfede30bb7ec0da9457bde5b3addf8f7c03aa4b45f0c37341"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.20527",
    "kind": "arxiv",
    "version": 3
  }
}
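For an offline check without `curl` or `jq`, the canonicalization step of the verification pipeline can be reproduced in plain Python on the canonical record shown above. This assumes the JSON displayed here is the complete `canonical_record`; if so, the printed digest should equal the canonical hash listed on this page. The function name `canonical_sha256` is illustrative.

```python
import hashlib
import json

def canonical_sha256(record: dict) -> str:
    """Mirror the python3 step of the pipeline: sorted keys, no whitespace, raw UTF-8."""
    blob = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

record = {
    "metadata": {
        "abstract_canon_sha256": "e92f3bf1d9c42d393e8d8a72ba71b92e73aa30dc0aa1e26c2d7b46caa8d26031",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-03-20T21:55:28Z",
        "title_canon_sha256": "24d13c333a5267bdfede30bb7ec0da9457bde5b3addf8f7c03aa4b45f0c37341",
    },
    "schema_version": "1.0",
    "source": {"id": "2603.20527", "kind": "arxiv", "version": 3},
}
print(canonical_sha256(record))
```

Sorting keys and removing whitespace is what makes the hash independent of how a server or mirror happens to pretty-print the record.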