Muon-MVR2 attains the optimal anytime convergence rate of ~O(T^{-1/3}) in stochastic non-convex settings under horizon-free schedules.
Momentum-based variance reduction in non-convex sgd
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2verdicts
UNVERDICTED 2representative citing papers
MuonEq introduces pre-orthogonalization equilibration schemes that improve Muon optimizer performance during large language model pretraining.
citing papers explorer
-
On the Convergence of Muon and Beyond
Muon-MVR2 attains the optimal anytime convergence rate of ~O(T^{-1/3}) in stochastic non-convex settings under horizon-free schedules.
-
MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration
MuonEq introduces pre-orthogonalization equilibration schemes that improve Muon optimizer performance during large language model pretraining.