AMUSE is a new optimizer integrating Muon orthogonalization with Schedule-Free averaging via adaptive interpolation for schedule-free anytime training that improves Pareto frontiers on vision and LLM tasks.
International Conference on Learning Representations , year=
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4verdicts
UNVERDICTED 4representative citing papers
Derives ODE limits of Adam-DA showing that first- and second-order momentum parameters reverse their convergence roles in zero-sum games compared to minimization, validated on GAN experiments.
Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.
SGD stabilizes sharpness below 2/η with equilibrium gap ΔS = η β σ_u²/(4α) due to noise-enhanced stochastic self-stabilization.
citing papers explorer
-
AMUSE: Anytime Muon with Stable Gradient Evaluation
AMUSE is a new optimizer integrating Muon orthogonalization with Schedule-Free averaging via adaptive interpolation for schedule-free anytime training that improves Pareto frontiers on vision and LLM tasks.
-
Understanding Dynamics of Adam in Zero-Sum Games: An ODE Approach
Derives ODE limits of Adam-DA showing that first- and second-order momentum parameters reverse their convergence roles in zero-sum games compared to minimization, validated on GAN experiments.
-
Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits
Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.
-
SGD at the Edge of Stability: The Stochastic Sharpness Gap
SGD stabilizes sharpness below 2/η with equilibrium gap ΔS = η β σ_u²/(4α) due to noise-enhanced stochastic self-stabilization.