5 Pith papers cite this work.
Citation summary
years: 2026 (5)
roles: background (2)
polarities: background (2)
citing papers explorer
- LionMuon: Alternating Spectral and Sign Descent for Efficient Training
  LionMuon alternates Lion sign steps and Muon spectral steps with shared dual-EMA momentum to match Lion memory while outperforming both at P=2 on 124M-720M models, backed by heavy-tailed complexity bounds that predict the optimal period.
- Dimension-Free Saddle-Point Escape in Muon
  Muon achieves dimension-free saddle-point escape through non-linear spectral shaping, resolvent calculus, and structural incoherence, yielding an algebraically dimension-free escape bound.
- MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration
  MuonEq introduces pre-orthogonalization equilibration schemes that improve Muon optimizer performance during large language model pretraining.
- Can Muon Fine-tune Adam-Pretrained Models?
  Constraining fine-tuning updates with LoRA mitigates performance degradation when switching from Adam to Muon on pretrained models.
- Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers