
arxiv: 1808.02941 · v2 · pith:PXU6XLVAnew · submitted 2018-08-08 · 💻 cs.LG · math.OC · stat.ML

On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization

classification 💻 cs.LG · math.OC · stat.ML
keywords algorithms · class · convergence · adam-type · adaptive · conditions · gradient · methods

This paper studies a class of adaptive gradient-based momentum algorithms that update the search directions and learning rates simultaneously using past gradients. This class, which we refer to as "Adam-type", includes popular algorithms such as Adam, AMSGrad, and AdaGrad. Despite their popularity in training deep neural networks, the convergence of these algorithms for solving nonconvex problems remains an open question. This paper provides a set of mild sufficient conditions that guarantee convergence of Adam-type methods. We prove that, under our derived conditions, these methods can achieve a convergence rate of order $O(\log{T}/\sqrt{T})$ for nonconvex stochastic optimization. We show the conditions are essential in the sense that violating them may make the algorithm diverge. Moreover, we propose and analyze a class of (deterministic) incremental adaptive gradient algorithms, which has the same $O(\log{T}/\sqrt{T})$ convergence rate. Our study could also be extended to a broader class of adaptive gradient methods in machine learning and optimization.
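To make "update the search directions and learning rates simultaneously using past gradients" concrete, here is a minimal pure-Python sketch of a generic Adam-type iteration. It assumes the common form $m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t$, $x_{t+1} = x_t - \alpha_t m_t / \sqrt{\hat v_t}$ with $\alpha_t = \alpha/\sqrt{t}$, where only the rule for $\hat v_t$ differs between Adam, AMSGrad, and AdaGrad. The function names, the `eps` safeguard, the hyperparameter values, and the toy objective $f(x) = x^2$ are illustrative assumptions, not taken from the abstract.

```python
import math

def second_moment(rule, v, v_hat, g, t, beta2):
    """Return updated (v, v_hat) for one coordinate; v_hat scales the step."""
    if rule == "adam":      # exponential moving average of squared gradients
        v = beta2 * v + (1.0 - beta2) * g * g
        return v, v
    if rule == "amsgrad":   # same average, but the scaling term never decreases
        v = beta2 * v + (1.0 - beta2) * g * g
        return v, max(v_hat, v)
    if rule == "adagrad":   # running mean of all squared gradients seen so far
        v = v + (g * g - v) / t
        return v, v
    raise ValueError(f"unknown rule: {rule}")

def adam_type_minimize(grad_fn, x0, rule="adam", steps=2000,
                       alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run a generic Adam-type iteration on a list of floats (illustrative only)."""
    x = list(x0)
    m = [0.0] * len(x)       # momentum: the search direction
    v = [0.0] * len(x)       # raw second-moment estimate
    v_hat = [0.0] * len(x)   # estimate actually used to scale the step
    for t in range(1, steps + 1):
        g = grad_fn(x)
        step = alpha / math.sqrt(t)  # assumed diminishing step size alpha / sqrt(t)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i], v_hat[i] = second_moment(rule, v[i], v_hat[i], g[i], t, beta2)
            x[i] -= step * m[i] / (math.sqrt(v_hat[i]) + eps)
    return x

def quadratic_grad(x):
    """Gradient of the toy objective f(x) = sum(x_i ** 2)."""
    return [2.0 * xi for xi in x]

# AdaGrad is run without momentum (beta1 = 0); the other two use beta1 = 0.9.
results = {
    rule: adam_type_minimize(quadratic_grad, [1.0], rule=rule,
                             beta1=0.0 if rule == "adagrad" else 0.9)
    for rule in ("adam", "amsgrad", "adagrad")
}
```

On this toy problem all three variants move the iterate from 1.0 toward the minimizer at 0. The sketch omits Adam's bias correction and says nothing about the sufficient conditions or the divergence examples analyzed in the paper.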



Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Adam-HNAG: A Convergent Reformulation of Adam with Accelerated Rate

    math.OC 2026-04 unverdicted novelty 8.0

    Adam-HNAG is a splitting-based reformulation of Adam that yields the first convergence proof for Adam-type methods, including accelerated rates, in convex smooth optimization.

  2. Understanding Dynamics of Adam in Zero-Sum Games: An ODE Approach

    cs.LG 2026-05 unverdicted novelty 7.0

    Derives ODE limits of Adam-DA showing that first- and second-order momentum parameters reverse their convergence roles in zero-sum games compared to minimization, validated on GAN experiments.

  3. On the Convergence of Muon and Beyond

    cs.LG 2025-09 unverdicted novelty 7.0

    Muon-MVR2 attains the optimal anytime convergence rate of $\tilde{O}(T^{-1/3})$ in stochastic non-convex settings under horizon-free schedules.

  4. Accelerated Gradient Methods for Nonconvex Optimization: Escape Trajectories From Strict Saddle Points and Convergence to Local Minima

    math.OC 2023-07 unverdicted novelty 7.0

    Theoretical analysis of accelerated gradient methods showing almost-sure escape from strict saddles and linear exit times, plus a subclass achieving near-optimal convergence to local minima in convex neighborhoods of ...

  5. Anon: Extrapolating Adaptivity Beyond SGD and Adam

    cs.AI 2026-05 unverdicted novelty 6.0

    Anon optimizer uses tunable adaptivity and incremental delay update to achieve convergence guarantees and outperform existing methods on image classification, diffusion, and language modeling tasks.

  6. Adaptive Federated Optimization

    cs.LG 2020-02 unverdicted novelty 6.0

    Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.

  7. Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics

    cs.LG 2022-12 unverdicted novelty 2.0

    A comprehensive review of deep learning techniques for computational mechanics, including LSTM for constitutive modeling, PINNs for PDE solving, optimizers, and kernel methods.