{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:XL4BQDF2V5BVJ23PUX3YMIOY3I","short_pith_number":"pith:XL4BQDF2","schema_version":"1.0","canonical_sha256":"baf8180cbaaf4354eb6fa5f78621d8da28d3455fd710fefd1cd6db22c9a8caae","source":{"kind":"arxiv","id":"2605.13079","version":1},"attestation_state":"computed","paper":{"title":"Spectral Flattening Is All Muon Needs: How Orthogonalization Controls Learning Rate and Convergence","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Muon orthogonalizes its momentum buffer to flatten the gradient spectrum, allowing stable learning rates scaled to the average singular value rather than the largest.","cross_cats":["cs.AI"],"primary_cat":"cs.LG","authors_text":"James Bailey, Minh-Phuc Truong, Tien-Phat Nguyen, Trung Le, Truong Nguyen, Tuc Nguyen","submitted_at":"2026-05-13T06:54:01Z","abstract_excerpt":"Muon orthogonalizes the momentum buffer before each update, replacing its singular values with ones via Newton-Schulz iterations. This simple change lets Muon tolerate far larger learning rates and converge faster than other optimizers, but why? We show that the mechanism is spectral flattening, and develop two results around it. First, we prove that Muon's maximal stable step size scales with the average singular value of the gradient rather than the largest, which bottlenecks standard gradient descent. Second, we recast Muon as a preconditioned gradient method and show, under a Kronecker-fac"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":false},"canonical_record":{"source":{"id":"2605.13079","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.LG","submitted_at":"2026-05-13T06:54:01Z","cross_cats_sorted":["cs.AI"],"title_canon_sha256":"20a75af27bd355c3ec2ba041878db76ece16dd934d3bf27eddedba55f2d63068","abstract_canon_sha256":"825f1b6a3b0309227215f80f71f1595fd868be0beea38e553c8fa38e294d3f87"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-18T03:08:58.726511Z","signature_b64":"ebO4Q/SjHX/nF0LkYeTojuyS9xH5/fod/E+RSlib1fUMSifF2SOBXMLjqURnbL82o4EXdRgZC4fvPV9qf5boAw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"baf8180cbaaf4354eb6fa5f78621d8da28d3455fd710fefd1cd6db22c9a8caae","last_reissued_at":"2026-05-18T03:08:58.726066Z","signature_status":"signed_v1","first_computed_at":"2026-05-18T03:08:58.726066Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Spectral Flattening Is All Muon Needs: How Orthogonalization Controls Learning Rate and Convergence","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Muon orthogonalizes its momentum buffer to flatten the gradient spectrum, allowing stable learning rates scaled to the average singular value rather than the largest.","cross_cats":["cs.AI"],"primary_cat":"cs.LG","authors_text":"James Bailey, Minh-Phuc Truong, Tien-Phat Nguyen, Trung Le, Truong Nguyen, Tuc Nguyen","submitted_at":"2026-05-13T06:54:01Z","abstract_excerpt":"Muon orthogonalizes the momentum buffer before each update, replacing its singular values with ones via Newton-Schulz iterations. This simple change lets Muon tolerate far larger learning rates and converge faster than other optimizers, but why? We show that the mechanism is spectral flattening, and develop two results around it. First, we prove that Muon's maximal stable step size scales with the average singular value of the gradient rather than the largest, which bottlenecks standard gradient descent. Second, we recast Muon as a preconditioned gradient method and show, under a Kronecker-fac"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We prove that Muon's maximal stable step size scales with the average singular value of the gradient rather than the largest, which bottlenecks standard gradient descent.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The improvement in effective convergence factor is shown under a Kronecker-factored curvature model for the loss landscape.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Muon achieves faster convergence and larger stable learning rates by flattening the singular value spectrum of the momentum buffer through orthogonalization, scaling step size with average rather than maximum singular values.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Muon orthogonalizes its momentum buffer to flatten the gradient spectrum, allowing stable learning rates scaled to the average singular value rather than the largest.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"e0706e3668d3d12db3e02d8c4a5cc338bc90dd20ea17a562235b87a86e53894c"},"source":{"id":"2605.13079","kind":"arxiv","version":1},"verdict":{"id":"009bef9f-4997-4403-972e-ed6eb782aa5a","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T19:50:59.028164Z","strongest_claim":"We prove that Muon's maximal stable step size scales with the average singular value of the gradient rather than the largest, which bottlenecks standard gradient descent.","one_line_summary":"Muon achieves faster convergence and larger stable learning rates by flattening the singular value spectrum of the momentum buffer through orthogonalization, scaling step size with average rather than maximum singular values.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The improvement in effective convergence factor is shown under a Kronecker-factored curvature model for the loss landscape.","pith_extraction_headline":"Muon orthogonalizes its momentum buffer to flatten the gradient spectrum, allowing stable learning rates scaled to the average singular value rather than the largest."},"references":{"count":15,"sample":[{"doi":"","year":null,"title":"Old Optimizer, New Norm: An Anthology","work_id":"c0089db6-7349-44fd-b103-0d7b36142bab","ref_index":1,"cited_arxiv_id":"2409.20325","is_internal_anchor":true},{"doi":"","year":null,"title":"Muon optimizes under spectral norm constraints","work_id":"11632e07-3f8e-4967-8533-73db12242eee","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"An exploration of non-euclidean gradient descent: Muon and its many variants.arXiv preprint arXiv:2510.09827","work_id":"3ef55068-cafa-489b-aeb3-3aac27c50399","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"arXiv preprint arXiv:2512.04299 , year =","work_id":"2b29279f-97b1-446c-a164-5217d6d534c3","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Effective quantization of muon optimizer states.arXiv preprint arXiv:2509.23106, 2025","work_id":"301183e6-ba1d-40e5-b404-32f503c85587","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":15,"snapshot_sha256":"81e9c766ecd6da76550588c7b530a42915f4bfcff1b40c91d9d304b6495b96b8","internal_anchors":3},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.13079","created_at":"2026-05-18T03:08:58.726126+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.13079v1","created_at":"2026-05-18T03:08:58.726126+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.13079","created_at":"2026-05-18T03:08:58.726126+00:00"},{"alias_kind":"pith_short_12","alias_value":"XL4BQDF2V5BV","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"XL4BQDF2V5BVJ23P","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"XL4BQDF2","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/XL4BQDF2V5BVJ23PUX3YMIOY3I","json":"https://pith.science/pith/XL4BQDF2V5BVJ23PUX3YMIOY3I.json","graph_json":"https://pith.science/api/pith-number/XL4BQDF2V5BVJ23PUX3YMIOY3I/graph.json","events_json":"https://pith.science/api/pith-number/XL4BQDF2V5BVJ23PUX3YMIOY3I/events.json","paper":"https://pith.science/paper/XL4BQDF2"},"agent_actions":{"view_html":"https://pith.science/pith/XL4BQDF2V5BVJ23PUX3YMIOY3I","download_json":"https://pith.science/pith/XL4BQDF2V5BVJ23PUX3YMIOY3I.json","view_paper":"https://pith.science/paper/XL4BQDF2","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.13079&json=true","fetch_graph":"https://pith.science/api/pith-number/XL4BQDF2V5BVJ23PUX3YMIOY3I/graph.json","fetch_events":"https://pith.science/api/pith-number/XL4BQDF2V5BVJ23PUX3YMIOY3I/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/XL4BQDF2V5BVJ23PUX3YMIOY3I/action/timestamp_anchor","attest_storage":"https://pith.science/pith/XL4BQDF2V5BVJ23PUX3YMIOY3I/action/storage_attestation","attest_author":"https://pith.science/pith/XL4BQDF2V5BVJ23PUX3YMIOY3I/action/author_attestation","sign_citation":"https://pith.science/pith/XL4BQDF2V5BVJ23PUX3YMIOY3I/action/citation_signature","submit_replication":"https://pith.science/pith/XL4BQDF2V5BVJ23PUX3YMIOY3I/action/replication_record"}},"created_at":"2026-05-18T03:08:58.726126+00:00","updated_at":"2026-05-18T03:08:58.726126+00:00"}