Ky Fan Norms and Beyond: Dual Norms and Combinations for Matrix Optimization
read the original abstract
In this article, we explore the use of various matrix norms for optimizing functions of weight matrices, a crucial problem in deep learning. Moving beyond the spectral norm that underlies the Muon update, we leverage the duals of the Ky Fan norms to introduce the Fanion family of linear minimization oracle (LMO) algorithms, which are closely related to Muon, $\nu$-SAM, and Dion. Staying inside the LMO, we construct the families of F-Fanions and S-Fanions, whose updates are convex combinations of the updates of Fanions and Normalized SGD or SignSGD, respectively. The most promising algorithms in these families are F-Muon and S-Muon. By conducting an extensive empirical study of all three algorithm families across a wide range of tasks and settings, we demonstrate that F-Muon and S-Muon consistently match Muon's performance, while outperforming Muon on a synthetic smooth convex problem.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers
Proposes equivariant optimizers matched to the symmetry groups of embeddings, SwiGLU projections and MoE routers, with experiments showing consistent gains over AdamW on language model pre-training.
-
Anytime Training with Schedule-Free Spectral Optimization
SF-NorMuon is a new schedule-free spectral optimizer that closes the gap with tuned AdamW on 125M-772M parameter models across 1-8x Chinchilla horizons while providing stationarity guarantees.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.