2 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization
  Momentum SGD incurs a provable drift-amplification penalty in nonstationary stochastic optimization that makes it worse than vanilla SGD in drift-dominated regimes, confirmed by finite-time upper bounds and minimax lower bounds under gradient-variation constraints.
- Mastering Diverse Domains through World Models
  DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.