Momentum SGD incurs a provable drift-amplification penalty in nonstationary stochastic optimization that makes it worse than vanilla SGD in drift-dominated regimes, confirmed by finite-time upper bounds and minimax lower bounds under gradient-variation constraints.
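If $\sup_{u \in S^{d-1},\, u \in \mathcal{F}} \mathbb{E}\left[\exp\left(|u^\top X|^2 / K_{\mathcal{F}}^2\right) \mid \mathcal{F}\right] \le 2$ a.s., then $\mathbb{E}[XX^\top \mid \mathcal{F}] \preceq K_{\mathcal{F}}^2 I_d$ a.s., and hence $\mathbb{E}[\|X\|_2^2 \mid \mathcal{F}] \le d K_{\mathcal{F}}^2$ a.s. Proof. We first prove the scalar case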
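As a sanity check on the quoted statement (not part of the paper's proof), the sketch below evaluates both sides in closed form for one simple case. It assumes an isotropic Gaussian $X \sim N(0, \sigma^2 I_d)$ and a trivial conditioning $\sigma$-algebra, so every projection $u^\top X$ is $N(0, \sigma^2)$. The function names and the values of `sigma` and `d` are illustrative choices, not taken from the source.

```python
import math


def gaussian_exp_sq_moment(sigma: float, K: float) -> float:
    """E[exp(Y^2 / K^2)] for Y ~ N(0, sigma^2).

    Uses the closed form (1 - 2 sigma^2 / K^2)^(-1/2), which is finite
    only when K^2 > 2 sigma^2.
    """
    ratio = 2.0 * sigma**2 / K**2
    if ratio >= 1.0:
        return math.inf
    return 1.0 / math.sqrt(1.0 - ratio)


def smallest_K_squared(sigma: float) -> float:
    """Smallest K^2 with E[exp(Y^2 / K^2)] <= 2.

    Solving (1 - 2 sigma^2 / K^2)^(-1/2) = 2 gives K^2 = 8 sigma^2 / 3.
    """
    return 8.0 * sigma**2 / 3.0


# Illustrative values (assumptions for this sketch only).
sigma = 1.5
d = 4

K2 = smallest_K_squared(sigma)
moment = gaussian_exp_sq_moment(sigma, math.sqrt(K2))

# For X ~ N(0, sigma^2 I_d): E[(u^T X)^2] = sigma^2 for every unit vector u,
# and E[||X||_2^2] = d * sigma^2.
directional_second_moment = sigma**2
expected_sq_norm = d * sigma**2

print(f"E[exp(Y^2/K^2)]  = {moment:.6f}  (hypothesis: <= 2)")
print(f"E[(u^T X)^2]     = {directional_second_moment:.4f} <= K^2   = {K2:.4f}")
print(f"E[||X||_2^2]     = {expected_sq_norm:.4f} <= d K^2 = {d * K2:.4f}")
```

In this Gaussian case the hypothesis holds with equality at $K^2 = 8\sigma^2/3$, while the conclusions hold with room to spare, since $\sigma^2 < 8\sigma^2/3$.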
1 Pith paper cites this work (field: stat.ML; year: 2026; verdict: CONDITIONAL).
Representative citing papers
- On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization