{"paper":{"title":"Root Mean Square Layer Normalization","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"RMSNorm delivers re-scaling invariance and comparable accuracy to LayerNorm while cutting computation by skipping mean subtraction, yielding 7-64% runtime reductions across tested models.","cross_cats":["cs.CL","stat.ML"],"primary_cat":"cs.LG","authors_text":"Biao Zhang, Rico Sennrich","submitted_at":"2019-10-16T16:44:22Z","abstract_excerpt":"Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence because of its capability in handling re-centering and re-scaling of both inputs and weight matrix. However, the computational overhead introduced by LayerNorm makes these improvements expensive and significantly slows the underlying network, e.g. RNN in particular. In this paper, we hypothesize that re-centering invariance in LayerNorm is dispensable and propose root mean square layer normalization, or RMSNorm. RMSNorm regularizes the summed inpu"},"claims":{"count":3,"items":[{"kind":"strongest_claim","text":"Extensive experiments on several tasks using diverse network architectures show that RMSNorm achieves comparable performance against LayerNorm but reduces the running time by 7%~64% on different models.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"Re-centering invariance in LayerNorm is dispensable for the stabilization and convergence benefits the method provides.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"RMSNorm delivers re-scaling invariance and comparable accuracy to LayerNorm while cutting computation by skipping mean subtraction, yielding 7-64% runtime reductions across tested models.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"}],"snapshot_sha256":"500d62aac33e3cac7d4290625de31760a2485fa97f9a7fe5b8711432f5c0f68b"},"source":{"id":"1910.07467","kind":"arxiv","version":1},"verdict":{"id":"428befb3-8470-42df-b5d6-636fd531defa","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T18:35:04.596165Z","strongest_claim":"Extensive experiments on several tasks using diverse network architectures show that RMSNorm achieves comparable performance against LayerNorm but reduces the running time by 7%~64% on different models.","one_line_summary":"RMSNorm delivers re-scaling invariance and comparable accuracy to LayerNorm while cutting computation by skipping mean subtraction, yielding 7-64% runtime reductions across tested models.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"Re-centering invariance in LayerNorm is dispensable for the stabilization and convergence benefits the method provides.","pith_extraction_headline":""},"references":{"count":37,"sample":[{"doi":"","year":2016,"title":"Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng","work_id":"7d508c46-4bf6-4a21-a869-a2047d225905","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2016,"title":"Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks","work_id":"c9de45d0-ed57-47a0-ac8d-f52a961173b4","ref_index":2,"cited_arxiv_id":"1603.01431","is_internal_anchor":true},{"doi":"","year":2016,"title":"Layer Normalization","work_id":"20a2d720-0046-4c7c-bcd6-327ec8143f69","ref_index":3,"cited_arxiv_id":"1607.06450","is_internal_anchor":true},{"doi":"","year":2014,"title":"Neural Machine Translation by Jointly Learning to Align and Translate","work_id":"d831e763-d530-4029-a65c-ac595d82cb2a","ref_index":4,"cited_arxiv_id":"1409.0473","is_internal_anchor":true},{"doi":"","year":2018,"title":"Understanding batch normalization","work_id":"e66d9f22-11a6-481d-b49f-73514654db8e","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":37,"snapshot_sha256":"820b5a6f211d3171f5b53ca84896d0d7c098f19d6e10fb9c703b7f327465d1e0","internal_anchors":16},"formal_canon":{"evidence_count":2,"snapshot_sha256":"04c43cf71823bd0f60267fd06334181c3788186a5fdf182cb393959fc49b918f"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}