Momentum SGD incurs a provable drift-amplification penalty in nonstationary stochastic optimization that makes it worse than vanilla SGD in drift-dominated regimes, confirmed by finite-time upper bounds and minimax lower bounds under gradient-variation constraints.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: stat.ML (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
- On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization