Connections between schedule-free optimizers, AdEMAMix, and accelerated SGD variants. arXiv preprint arXiv:2502.02431
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (4)
years: 2026 (4)
verdicts: UNVERDICTED (4)
citing papers explorer
- AMUSE: Anytime Muon with Stable Gradient Evaluation
  AMUSE is a new optimizer that integrates Muon orthogonalization with Schedule-Free averaging via adaptive interpolation, giving schedule-free anytime training that improves Pareto frontiers on vision and LLM tasks.
- Optimistic Dual Averaging Unifies Modern Optimizers
  SODA unifies several modern optimizers under optimistic dual averaging and supplies a 1/k decay wrapper that improves performance without weight decay tuning.
- Benchmarking Optimizers for MLPs in Tabular Deep Learning
  The Muon optimizer outperforms AdamW across 17 tabular datasets when training MLPs under a shared protocol.
- ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models
  ScheduleFree+ scales schedule-free learning to LLMs with fixes for large batches and models, outperforming Warmup-Stable-Decay schedules by up to 31% at 1000 tokens per parameter.