1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.LG (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Distance-Aware Muon: Adaptive Step Scaling for Normalized Optimization
Introduces Distance-Adaptive Muon, Scale-Calibrated Muon, and Distance-Free Muon with stationarity and O(1/T) objective-gap guarantees, shown to match or improve fixed-scale Muon on GPT-124M and ViT-Tiny models.