Mano: Restriking manifold optimization for LLM training
6 Pith papers cite this work. Polarity classification is still being indexed.
Citing papers by year: 2026 (6)
Citation roles: background (4)
Citation polarities: background (4)
Citing papers
-
Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers
Proposes equivariant optimizers matched to the symmetry groups of embeddings, SwiGLU projections and MoE routers, with experiments showing consistent gains over AdamW on language model pre-training.
-
Intrinsic Muon: Spectral Optimization on Riemannian Matrix Manifolds
Intrinsic Muon provides closed-form linear maximization oracles on multiple Riemannian matrix manifolds for unitarily invariant norms, with convergence rates depending only on manifold dimension or rank.
-
Demystifying Manifold Constraints in LLM Pre-training
Manifold constraints imposed through the new MACRO optimizer independently bound activation scales and enforce rotational equilibrium in LLM pre-training. They subsume RMS normalization and decoupled weight decay, deliver competitive performance, and come with convergence guarantees.
-
Geometric Properties of the Voronoi Tessellation in Latent Semantic Manifolds of Large Language Models
Fisher-information-guided margin refinement reorganizes the Voronoi tessellation of an LLM's latent space to raise median token margins by 28% with no benchmark degradation, though gains concentrate on high-frequency structural tokens.
-
Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation
Pion is an optimizer that preserves the singular values of weight matrices in LLM training by applying orthogonal equivalence transformations.
-
MUON+: Towards More Effective Muon via One Additional Normalization Step for LLM Pre-training
Muon+ adds one normalization step after polar orthogonalization in the Muon optimizer, yielding lower training and validation perplexity and faster pre-training across 60M-7B models.
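The Pion entry above relies on a standard linear-algebra fact: an orthogonal equivalence transformation W -> U W V^T, with U and V orthogonal, leaves the singular values of W unchanged. The sketch below checks only that fact on a 2x2 example. It is not Pion's update rule, which this listing does not describe. The helper names (`rotation`, `orthogonal_equivalence`, `singular_values_2x2`) and the example matrices are hypothetical, and 2x2 matrices are used only so that singular values have a closed form without external libraries.

```python
import math


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(m):
    """Transpose a 2x2 matrix."""
    return [[m[j][i] for j in range(2)] for i in range(2)]


def rotation(theta):
    """2x2 rotation matrix: orthogonal, with determinant +1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def singular_values_2x2(m):
    """Closed-form singular values (largest first) of a 2x2 matrix.

    sigma1^2 + sigma2^2 = ||M||_F^2 and sigma1 * sigma2 = |det M|,
    so sigma1^2 and sigma2^2 are the roots of x^2 - F x + det^2 = 0.
    """
    (a, b), (c, d) = m
    fro2 = a * a + b * b + c * c + d * d
    det = a * d - b * c
    disc = math.sqrt(max(fro2 * fro2 - 4 * det * det, 0.0))
    s1 = math.sqrt((fro2 + disc) / 2)
    s2 = math.sqrt(max((fro2 - disc) / 2, 0.0))
    return s1, s2


def orthogonal_equivalence(w, u, v):
    """Apply W -> U W V^T (hypothetical helper; U and V must be orthogonal).

    Orthogonal U, V preserve the Frobenius norm, and |det| is preserved
    because |det U| = |det V| = 1, so both singular values are unchanged.
    """
    return matmul(matmul(u, w), transpose(v))


# Illustrative example only; not Pion's actual update rule.
W = [[3.0, 1.0], [0.5, 2.0]]
U, V = rotation(0.7), rotation(-1.3)
W_new = orthogonal_equivalence(W, U, V)

print("singular values before:", singular_values_2x2(W))
print("singular values after: ", singular_values_2x2(W_new))
```

The entries of W_new differ from those of W, yet both printed pairs agree up to floating-point rounding. In this sense an update built from orthogonal transformations can move a weight matrix while keeping its spectrum fixed.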