Convergence and Dynamical Behavior of the ADAM Algorithm for Nonconvex Stochastic Optimization
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Adam Converges in Nonsmooth Nonconvex Optimization
  The paper establishes the first finite-time convergence rate of 1/T^{2/13} for classical Adam (with bias correction, no extra steps) in nonsmooth nonconvex optimization under heavy-tailed noise with β1=β2.
- A Stochastic-Geometric Theory of Scaling Laws in Grokking
  A stochastic-geometric model of solution-space topology under Adam derives explicit scaling laws for grokking transition time as a function of learning rate, batch size, and L2 coefficient.
- Novel Dynamic Batch-Sensitive Adam Optimiser for Vehicular Accident Injury Severity Prediction
  DBS-Adam, which scales learning rates by batch difficulty from EMA gradient norms and loss, reaches 95.22% accuracy on Bi-LSTM accident severity prediction and shows statistically significant precision gains over AMSGrad, AdamW and AdaBound.