Muon in matrix factorization avoids saddle-to-saddle dynamics, learns top modes simultaneously, conserves sqrt(P^TP) - sqrt(Q^TQ), and reaches balanced solutions from small initialization with a two-step alignment schedule.
arXiv preprint arXiv:2003.06340
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- Muon learns balanced solutions in matrix factorization without slow saddle-to-saddle dynamics