AMUSE is a new optimizer integrating Muon orthogonalization with Schedule-Free averaging via adaptive interpolation for schedule-free anytime training that improves Pareto frontiers on vision and LLM tasks.
arXiv preprint arXiv:2603.09697 , year=
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5roles
background 2polarities
background 2representative citing papers
Muon achieves dimension-free saddle-point escape through non-linear spectral shaping, resolvent calculus, and structural incoherence, yielding an algebraically dimension-free escape bound.
PolarAdamW disentangles spectral control from gauge-equivariance in matrix optimizers, with experiments demonstrating their distinct roles on standard versus symmetry-aware neural networks.
Pro-KLShampoo projects KL-Shampoo preconditioners to a spike-and-flat parametric form on an r-dimensional subspace and recovers the full algebraic preconditioner via orthogonalization, outperforming KL-Shampoo on GPT-2 and LLaMA pre-training scales.
citing papers explorer
-
AMUSE: Anytime Muon with Stable Gradient Evaluation
AMUSE is a new optimizer integrating Muon orthogonalization with Schedule-Free averaging via adaptive interpolation for schedule-free anytime training that improves Pareto frontiers on vision and LLM tasks.
-
Dimension-Free Saddle-Point Escape in Muon
Muon achieves dimension-free saddle-point escape through non-linear spectral shaping, resolvent calculus, and structural incoherence, yielding an algebraically dimension-free escape bound.
-
PolarAdamW: Disentangling Spectral Control and Schur Gauge-Equivariance in Matrix Optimisation
PolarAdamW disentangles spectral control from gauge-equivariance in matrix optimizers, with experiments demonstrating their distinct roles on standard versus symmetry-aware neural networks.
-
Pro-KLShampoo: Projected KL-Shampoo with Whitening Recovered by Orthogonalization
Pro-KLShampoo projects KL-Shampoo preconditioners to a spike-and-flat parametric form on an r-dimensional subspace and recovers the full algebraic preconditioner via orthogonalization, outperforming KL-Shampoo on GPT-2 and LLaMA pre-training scales.
- Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers