SNOO: Step-K Nesterov Outer Optimizer - The Surprising Effectiveness of Nesterov Momentum Applied to Pseudo-Gradients. arXiv preprint arXiv:2510.15830
2 Pith papers cite this work.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Optimistic Dual Averaging Unifies Modern Optimizers
SODA unifies several modern optimizers under optimistic dual averaging and supplies a 1/k decay wrapper that improves performance without weight decay tuning.
- Outer-Momentum Restarting in High-Dimensional Two-Phase Optimization
Periodic outer-momentum restarts in two-phase optimizers exploit phase cancellation in a linearized NTK model to widen stable learning-rate and momentum ranges in language-model pretraining.