MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration

Chang, D · 2026 · cs.LG · arXiv 2603.28254

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

open full Pith review browse 5 citing papers arXiv PDF

abstract

Orthogonalized-update optimizers such as Muon improve training of matrix-valued parameters, but existing extensions typically either rescale updates after orthogonalization or use heavier whitening-based preconditioners before it. We introduce {\method}, a lightweight family of pre-orthogonalization equilibration schemes for Muon with three forms: two-sided row/column normalization (RC), row normalization (R), and column normalization (C). By rebalancing the momentum matrix before finite-step Newton--Schulz orthogonalization, {\method} improves the geometry seen by orthogonalization. We show that finite-step orthogonalization is governed by the input spectrum, especially stable rank and condition number, and that row/column normalization acts as a zeroth-order surrogate for whitening. For hidden matrix weights, R is the default variant. Theoretically, {\method} (R) retains the standard $\widetilde{\mathcal O}(T^{-1/4})$ Muon-type nonconvex stationarity guarantee with decoupled weight decay and a horizon-free diminishing learning-rate schedule, and extends it to finite-step NS5 up to an explicit inexactness constant. In LLaMA2 pretraining on C4, {\method} (R) consistently outperforms Muon on 130M, 350M, and 1B models, with faster convergence and lower validation perplexity. The code is available at the \href{https://github.com/MaeChd/muon-eq}{MuonEq codebase}.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Aurora: A Leverage-Aware Spectral Optimizer

cs.LG · 2026-06-26 · unverdicted · novelty 6.0

Aurora is a leverage-aware spectral optimizer that enforces uniform row norms in matrix updates while preserving Muon's polar geometry, outperforming Muon and achieving SOTA among spectral methods on modded-nanoGPT.

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

math.OC · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

Proposes equivariant optimizer updates matched to layer symmetries for embeddings, SwiGLU MLPs, and MoE routers, with reported gains in validation loss and training stability on several language model architectures.

PolarAdamW: Disentangling Spectral Control and Schur Gauge-Equivariance in Matrix Optimisation

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

PolarAdamW disentangles spectral control from gauge-equivariance in matrix optimizers, with experiments demonstrating their distinct roles on standard versus symmetry-aware neural networks.

Does Compression Preserve Uncertainty? A Unified Benchmark for Quantized and Sparse LLMs via Conformal Prediction

cs.AI · 2026-06-01 · unverdicted · novelty 5.0

Compression of LLMs often decouples accuracy from uncertainty, with larger models absorbing the effect better and inflation occurring in a threshold-like manner.

A Note on Stability for Orthogonalized Matrix Momentum with Client Sampling

cs.LG · 2026-06-01 · unverdicted · novelty 4.0

Derives finite-round upper-tail guarantee on population-empirical gap for client-sampled orthogonalized matrix momentum under heterogeneous data, with Lipschitz condition on the orthogonalizer.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers math.OC · 2026-05-18 · unverdicted · none · ref 25 · 2 links · internal anchor
Proposes equivariant optimizer updates matched to layer symmetries for embeddings, SwiGLU MLPs, and MoE routers, with reported gains in validation loss and training stability on several language model architectures.

MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer