Momentum SGD incurs a provable drift-amplification penalty in nonstationary stochastic optimization that makes it worse than vanilla SGD in drift-dominated regimes, confirmed by finite-time upper bounds and minimax lower bounds under gradient-variation constraints.
3 Pith papers cite this work.
Representative citing papers:
DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.
Double preconditioning (DoPr) improves downstream task performance in test-time feedback settings without consistent gains in validation loss.
Citing papers explorer:
On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization