Convergence and Dynamical Behavior of the ADAM Algorithm for Nonconvex Stochastic Optimization

2021 · DOI 10.1137/19m1263443

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Adam Converges in Nonsmooth Nonconvex Optimization

math.OC · 2026-06-21 · unverdicted · novelty 8.0

The paper establishes the first finite-time convergence rate of 1/T^{2/13} for classical Adam (with bias correction, no extra steps) in nonsmooth nonconvex optimization under heavy-tailed noise with β1=β2.

A Stochastic--Geometric Theory of Scaling Laws in Grokking

stat.ML · 2026-06-29 · unverdicted · novelty 6.0

A stochastic-geometric model of solution-space topology under Adam derives explicit scaling laws for grokking transition time as a function of learning rate, batch size, and L2 coefficient.

Novel Dynamic Batch-Sensitive Adam Optimiser for Vehicular Accident Injury Severity Prediction

cs.LG · 2026-05-14 · unverdicted · novelty 3.0

DBS-Adam, which scales learning rates by batch difficulty from EMA gradient norms and loss, reaches 95.22% accuracy on Bi-LSTM accident severity prediction and shows statistically significant precision gains over AMSGrad, AdamW and AdaBound.

citing papers explorer

Adam Converges in Nonsmooth Nonconvex Optimization · math.OC · 2026-06-21 · unverdicted · none · ref 3
A Stochastic--Geometric Theory of Scaling Laws in Grokking · stat.ML · 2026-06-29 · unverdicted · none · ref 2
Novel Dynamic Batch-Sensitive Adam Optimiser for Vehicular Accident Injury Severity Prediction · cs.LG · 2026-05-14 · unverdicted · none · ref 36
