Adam achieves a high-probability convergence rate with δ^{-1/2} dependence, whereas SGD requires at least δ^{-1}. The improvement is attributed to Adam's second-moment normalization and is established via a stopping-time/martingale analysis under a bounded-variance assumption.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
- Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails