The Marginal Value of Momentum for Small Learning Rate SGD
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
Representative citing papers
Momentum SGD exhibits two distinct EoSS regimes for batch sharpness, stabilizing at 2(1-β)/η for small batches and 2(1+β)/η for large batches, aligning with linear stability thresholds.
Lower bounds establish that heavy-ball momentum extends the compute-efficient batch-size window by sqrt(kappa) over SGD in linear regression, with accelerated SGD showing spectrum-dependent CE-serial runtime tradeoffs.
Classical momentum acceleration in mini-batch SGD for quadratics is proportional to batch size up to saturation, enabling perfect parallelization under minimal noise assumptions.
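The two thresholds quoted above, 2(1-β)/η and 2(1+β)/η, can be checked numerically on a one-dimensional quadratic. The sketch below is illustrative only and is not taken from any of the citing papers. It assumes the standard heavy-ball form x_{t+1} = x_t - η·λ·x_t + β(x_t - x_{t-1}) with curvature λ, and the function names are hypothetical. It computes both thresholds and confirms, via the spectral radius of the deterministic iteration matrix, that the full-batch recursion loses stability at λ = 2(1+β)/η. The small-batch value 2(1-β)/η is only evaluated, not derived.

```python
import numpy as np


def momentum_thresholds(eta: float, beta: float) -> tuple[float, float]:
    """Return (small-batch, large-batch) values 2(1-beta)/eta and 2(1+beta)/eta."""
    return 2.0 * (1.0 - beta) / eta, 2.0 * (1.0 + beta) / eta


def heavy_ball_spectral_radius(lam: float, eta: float, beta: float) -> float:
    """Spectral radius of the heavy-ball iteration on f(x) = lam * x**2 / 2.

    Assumed update: x_{t+1} = (1 + beta - eta*lam) * x_t - beta * x_{t-1},
    written as a 2x2 linear map on the state (x_t, x_{t-1}).
    """
    iteration_matrix = np.array(
        [[1.0 + beta - eta * lam, -beta],
         [1.0, 0.0]]
    )
    return float(np.max(np.abs(np.linalg.eigvals(iteration_matrix))))


if __name__ == "__main__":
    eta, beta = 0.1, 0.9
    small, large = momentum_thresholds(eta, beta)
    print(f"2(1-beta)/eta = {small:.3f}, 2(1+beta)/eta = {large:.3f}")
    # Curvature just below and just above the large-batch threshold.
    for lam in (large - 0.1, large + 0.1):
        rho = heavy_ball_spectral_radius(lam, eta, beta)
        print(f"lambda = {lam:.1f}: spectral radius = {rho:.4f}")
```

A spectral radius below 1 means the deterministic iteration contracts, and a radius above 1 means it diverges. The ratio of the two thresholds is (1+β)/(1-β), so the gap between the regimes widens as β approaches 1.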
Citing papers explorer
Dynamics of Stochastic Momentum with Sparse Updates in High Dimensions
Characterizes high-dimensional phase structure of momentum under sparse updates via closed-form second-moment dynamics, with regimes matching SGD, unstable, or heavy-ball depending on retention-to-learning timescale ratio.