LiMuon: Light and fast Muon optimizer for large models.arXiv preprint arXiv:2509.14562

· 2025 · arXiv 2509.14562

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

open full Pith review browse 6 citing papers arXiv PDF

citation-role summary

background 1 method 1

citation-polarity summary

background 1 use method 1

representative citing papers

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

SignSGD provably beats SGD by a factor of d under sparse noise via matched ℓ1-norm upper and lower bounds, with an equivalent result for Muon on matrices, and this predicts faster GPT-2 pretraining.

LionMuon: Alternating Spectral and Sign Descent for Efficient Training

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

LionMuon alternates Lion sign steps and Muon spectral steps with shared dual-EMA momentum to match Lion memory while outperforming both at P=2 on 124M-720M models, backed by heavy-tailed complexity bounds that predict the optimal period.

MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration

cs.LG · 2026-03-30 · unverdicted · novelty 6.0

MuonEq introduces pre-orthogonalization equilibration schemes that improve Muon optimizer performance during large language model pretraining.

MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

MiMuon is a hybrid optimizer that achieves a generalization error bound of O(1/N) independent of the small singular-value gap that limits the original Muon bound, while retaining the same O(1/T^{1/4}) convergence rate.

Communication-Efficient Gluon in Federated Learning

cs.LG · 2026-04-12 · unverdicted · novelty 5.0

Compressed Gluon variants using unbiased/contraction compressors and SARAH-style variance reduction achieve convergence guarantees and lower communication costs in federated learning under layer-wise smoothness.

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

math.OC · 2026-05-18

citing papers explorer

Showing 6 of 6 citing papers.

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds cs.LG · 2026-05-07 · unverdicted · none · ref 14 · internal anchor
SignSGD provably beats SGD by a factor of d under sparse noise via matched ℓ1-norm upper and lower bounds, with an equivalent result for Muon on matrices, and this predicts faster GPT-2 pretraining.
LionMuon: Alternating Spectral and Sign Descent for Efficient Training cs.LG · 2026-05-19 · unverdicted · none · ref 12 · internal anchor
LionMuon alternates Lion sign steps and Muon spectral steps with shared dual-EMA momentum to match Lion memory while outperforming both at P=2 on 124M-720M models, backed by heavy-tailed complexity bounds that predict the optimal period.
MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration cs.LG · 2026-03-30 · unverdicted · none · ref 33 · internal anchor
MuonEq introduces pre-orthogonalization equilibration schemes that improve Muon optimizer performance during large language model pretraining.
MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models cs.LG · 2026-05-19 · unverdicted · none · ref 12 · internal anchor
MiMuon is a hybrid optimizer that achieves a generalization error bound of O(1/N) independent of the small singular-value gap that limits the original Muon bound, while retaining the same O(1/T^{1/4}) convergence rate.
Communication-Efficient Gluon in Federated Learning cs.LG · 2026-04-12 · unverdicted · none · ref 12 · internal anchor
Compressed Gluon variants using unbiased/contraction compressors and SARAH-style variance reduction achieve convergence guarantees and lower communication costs in federated learning under layer-wise smoothness.
Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers math.OC · 2026-05-18 · unreviewed · ref 71 · internal anchor

LiMuon: Light and fast Muon optimizer for large models.arXiv preprint arXiv:2509.14562

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer