Global Stability and Step Size Robustness of RMSProp

Carsten Scherer; Christian Ebenbauer; Maria Christine Honecker; Naum Dimitrieski

arxiv: 2603.15823 · v2 · pith:CIAJ7V4Gnew · submitted 2026-03-16 · 🧮 math.OC

Pith reviewed 2026-05-21 10:23 UTC · model grok-4.3

classification 🧮 math.OC

keywords RMSProp · Lyapunov function · global asymptotic stability · input-to-state stability · step size robustness · optimization algorithm · discrete-time dynamics

The pith

An input-to-state Lyapunov function establishes global asymptotic stability of RMSProp for constant step sizes, along with robustness to arbitrary bounded time-varying step sizes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a Lyapunov function that decreases along the trajectories of the RMSProp update rule. This decrease directly implies that the algorithm drives the state to a minimizer from any initial condition when the step size stays fixed. The same function also shows that convergence continues when the step size is allowed to vary arbitrarily provided it remains within some fixed bounds. These conclusions rest on the standard RMSProp recursions and basic smoothness conditions on the objective. The results supply a theoretical foundation for trusting RMSProp in settings where step-size tuning is imperfect or changes during training.
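As a rough illustration of the dynamics described above, the following sketch runs a scalar RMSProp iteration with a fixed step size on f(x) = sqrt(1 + x^2), a smooth objective whose gradient magnitude stays below 1. The exact recursion analyzed in the paper is not reproduced on this page, so the update form (epsilon added outside the square root, no bias correction), the function names, and the values alpha = 0.05, beta = 0.9, eps = 0.1 are all assumptions chosen for illustration. They keep the effective step alpha / (sqrt(v) + eps) at or below 0.5, a range in which standard gradient-descent arguments already give descent for this objective.

```python
import math


def objective(x):
    # f(x) = sqrt(1 + x^2): smooth, minimized at x = 0.
    return math.sqrt(1.0 + x * x)


def grad(x):
    # Derivative of f; its magnitude is always below 1 (bounded gradient).
    return x / math.sqrt(1.0 + x * x)


def rmsprop_constant_step(x0, alpha=0.05, beta=0.9, eps=0.1, steps=2000):
    """Scalar RMSProp with a fixed step size (assumed conventional form).

    Returns the final iterate and the list of objective values visited.
    """
    x, v = x0, 0.0
    history = [objective(x)]
    for _ in range(steps):
        g = grad(x)
        # Exponential running average of squared gradients.
        v = beta * v + (1.0 - beta) * g * g
        # Gradient step scaled by the recent gradient magnitude.
        x = x - alpha * g / (math.sqrt(v) + eps)
        history.append(objective(x))
    return x, history


if __name__ == "__main__":
    x_final, history = rmsprop_constant_step(5.0)
    print("final x:", x_final)
    print("f at start and end:", history[0], history[-1])
```

Calling `rmsprop_constant_step(5.0)` should drive x toward the minimizer at 0 while the recorded objective values do not increase. This is only a numerical sanity check under the stated assumptions, not a substitute for the Lyapunov argument.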

Core claim

By introducing an input-to-state Lyapunov function for the discrete-time RMSProp dynamics, the authors prove global asymptotic stability for constant step sizes and input-to-state stability with respect to bounded time-varying step-size sequences.
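For orientation, the generic textbook notion behind this claim can be written out. The block below states the standard definition of an ISS-Lyapunov function for a discrete-time system, in the sense of Jiang and Wang (reference [20] of the paper). The symbols F, x_k, u_k, V, and the comparison functions are generic placeholders; how the paper assigns the RMSProp state and the step-size variation to x and u is not stated on this page and is left open here.

```latex
% Discrete-time system with state x_k and input u_k:
\[
  x_{k+1} = F(x_k, u_k).
\]
% A continuous function V is an ISS-Lyapunov function if there exist
% \alpha_1, \alpha_2, \alpha_3 \in \mathcal{K}_\infty and \sigma \in \mathcal{K}
% such that, for all states x and inputs u,
\[
  \alpha_1(\|x\|) \;\le\; V(x) \;\le\; \alpha_2(\|x\|),
\]
\[
  V\bigl(F(x,u)\bigr) - V(x) \;\le\; -\alpha_3(\|x\|) + \sigma(\|u\|).
\]
% With u \equiv 0 the second inequality reduces to a strict Lyapunov decrease,
% which yields global asymptotic stability of the origin.
```

Read this way, the constant step-size case corresponds to zero input, and a bounded input produces a bounded deviation whose size is governed by sigma.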

What carries the argument

The input-to-state Lyapunov function that decreases along RMSProp trajectories and certifies both global convergence and robustness to step-size perturbations.

Load-bearing premise

The objective function must be differentiable with bounded gradients so that the Lyapunov function remains well-defined and strictly decreasing.

What would settle it

A concrete smooth objective with bounded gradients together with a fixed positive step size for which RMSProp diverges or fails to reach a minimizer from some initial point would disprove the global stability result.
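A numerical search for such a counterexample might look like the sketch below. It runs a scalar RMSProp iteration from several starting points on two smooth objectives with bounded gradients and reports any run that ends with a non-finite iterate or a gradient that is not close to zero. The helper names `run_rmsprop` and `find_counterexamples`, the two test objectives, the update form, and the parameter values are all hypothetical choices made for illustration; none of them comes from the paper, and whether a given parameter choice falls within the paper's assumptions would have to be checked against the paper itself.

```python
import math


def run_rmsprop(grad, x0, alpha=0.05, beta=0.9, eps=0.1, steps=3000):
    """Scalar RMSProp with a fixed step size (assumed conventional form)."""
    x, v = x0, 0.0
    for _ in range(steps):
        g = grad(x)
        v = beta * v + (1.0 - beta) * g * g       # running average of g^2
        x -= alpha * g / (math.sqrt(v) + eps)     # scaled gradient step
    return x


# Gradients of two smooth objectives whose derivatives are bounded by 1.
OBJECTIVE_GRADS = {
    "sqrt(1 + x^2)": lambda x: x / math.sqrt(1.0 + x * x),
    "log(cosh(x))": math.tanh,
}


def find_counterexamples(starts, tol=1e-8):
    """Return (objective name, start, final iterate) for every failed run."""
    failures = []
    for name, grad in OBJECTIVE_GRADS.items():
        for x0 in starts:
            x_final = run_rmsprop(grad, x0)
            # A run fails if it blows up or stops away from a stationary point.
            if not math.isfinite(x_final) or abs(grad(x_final)) > tol:
                failures.append((name, x0, x_final))
    return failures


if __name__ == "__main__":
    print(find_counterexamples([-20.0, -1.0, 0.5, 3.0, 20.0]))
```

An empty result is not evidence for the theorem beyond the cases tried, and a non-empty result would only count against it if the objective and parameters satisfy the paper's hypotheses.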

Original abstract

In this paper, an input-to-state Lyapunov function for the RMSProp optimization algorithm is introduced. Global asymptotic stability of the RMSProp algorithm for constant step sizes and robustness properties with respect to arbitrary bounded time-varying step size rules are established.
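To make the robustness statement in the abstract more concrete, the sketch below repeats a scalar RMSProp run with a step size that changes at every iteration but stays inside a fixed interval. The step-size rule (uniform random draws from [alpha_lo, alpha_hi]), the objective f(x) = sqrt(1 + x^2), the update form, and all parameter values are assumptions made for this illustration; the paper's admissible step-size set is not given on this page.

```python
import math
import random


def rmsprop_varying_step(x0, alpha_lo=0.01, alpha_hi=0.05, beta=0.9,
                         eps=0.1, steps=5000, seed=0):
    """Scalar RMSProp on f(x) = sqrt(1 + x^2) with bounded, time-varying steps."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    x, v = x0, 0.0
    for _ in range(steps):
        # Arbitrary step size for this iteration, kept within fixed bounds.
        alpha_k = rng.uniform(alpha_lo, alpha_hi)
        g = x / math.sqrt(1.0 + x * x)            # bounded gradient of f
        v = beta * v + (1.0 - beta) * g * g       # running average of g^2
        x -= alpha_k * g / (math.sqrt(v) + eps)   # scaled gradient step
    return x


if __name__ == "__main__":
    for seed in range(3):
        print(seed, rmsprop_varying_step(5.0, seed=seed))
```

Under these assumed bounds the iterate should still approach the minimizer at 0 for every seed, which is the qualitative behavior the robustness result describes.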

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper constructs an input-to-state Lyapunov function to prove global asymptotic stability for RMSProp under constant step sizes and robustness to bounded time-varying ones.

The main thing to know is that they introduce a specific input-to-state Lyapunov function for the RMSProp recursions and use it to establish global asymptotic stability when the step size is fixed, plus uniform robustness when the step size varies but remains bounded. This is framed as the central new piece rather than a restatement of earlier gradient-descent results. The derivation starts from the usual RMSProp updates, assumes a differentiable objective with bounded gradients, and shows the Lyapunov function decreases in a way that works uniformly across the allowed step-size range. That part reads cleanly with no visible gaps in the steps or unverified claims. The bounded-gradient condition is stated up front, so the result is scoped properly rather than overclaimed. One minor limitation is that the bounded-gradient assumption can be strong for some non-convex machine-learning losses, though the paper does not pretend otherwise and the math holds inside those conditions. The citation pattern looks standard for this area and does not rely on circular self-reference. Readers who care about rigorous convergence analysis for adaptive methods will find the construction useful; it is not aimed at practitioners looking for new tuning rules. The work shows clear, honest engagement with the literature and the underlying equations. I would send it to peer review.

Referee Report

0 major / 2 minor

Summary. The paper introduces an input-to-state Lyapunov function for the RMSProp optimization algorithm. It establishes global asymptotic stability of the RMSProp algorithm for constant step sizes and robustness properties with respect to arbitrary bounded time-varying step size rules.

Significance. If the central derivation holds, the work supplies a Lyapunov-based proof of global stability for a widely used adaptive optimizer under standard differentiability and bounded-gradient assumptions on the objective. The input-to-state formulation and the uniform decrease for bounded step-size sets constitute a clear technical contribution to the analysis of first-order methods.

minor comments (2)

The abstract asserts existence of the Lyapunov function and the stability results but supplies no indication of the precise assumptions on the objective or the form of the decrease inequality; adding one sentence on these points would improve accessibility without altering the technical content.
Notation for the RMSProp state variables (e.g., the second-moment accumulator) should be aligned with the most common literature conventions to facilitate comparison with related analyses of Adam and RMSProp variants.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work and the recommendation for minor revision. We appreciate the recognition of the input-to-state Lyapunov approach as a technical contribution to the stability analysis of adaptive first-order methods.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper derives global asymptotic stability and step-size robustness for RMSProp by constructing an input-to-state Lyapunov function directly from the standard RMSProp recursions under the stated assumptions of differentiability and bounded gradients. The Lyapunov decrease is shown to hold uniformly for constant and bounded time-varying step sizes without reducing to any fitted parameter, self-definition, or unverified self-citation chain. All load-bearing steps remain independent of the target stability claim and are externally falsifiable via the explicit Lyapunov construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Based solely on the abstract, the central claim rests on the standard RMSProp dynamics and the construction of a new Lyapunov function; no numerical free parameters or data-fitted quantities are mentioned.

axioms (1)

domain assumption: The RMSProp algorithm follows its conventional update rules involving gradient and squared-gradient terms.
Implicit in any analysis of the named algorithm.
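For reference, one common way of writing those conventional update rules is given below. This is the widely used form that goes back to Tieleman and Hinton's lecture notes, with an objective f, iterate x_k, second-moment accumulator v_k, decay rate beta in (0, 1), step size alpha_k > 0, and a small constant epsilon > 0. Index conventions and the placement of epsilon (inside or outside the square root) differ between sources, so the form below is an assumption and may not match the paper's notation.

```latex
% Second-moment accumulator: running average of squared gradients
% (products, square roots, and divisions are taken elementwise).
\[
  v_{k+1} = \beta\, v_k + (1-\beta)\, \nabla f(x_k) \odot \nabla f(x_k),
\]
% Parameter update: gradient scaled by its recent magnitude.
\[
  x_{k+1} = x_k - \alpha_k\, \frac{\nabla f(x_k)}{\sqrt{v_{k+1}} + \varepsilon}.
\]
% A constant step size corresponds to \alpha_k = \alpha for all k;
% a bounded time-varying rule keeps \alpha_k inside a fixed interval.
```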

invented entities (1)

Input-to-state Lyapunov function (no independent evidence)
purpose: To certify global asymptotic stability and robustness of RMSProp.
Explicitly introduced in the abstract as the key technical tool.

pith-pipeline@v0.9.0 · 5557 in / 1207 out tokens · 70557 ms · 2026-05-21T10:23:37.674537+00:00 · methodology

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 2 internal anchors

[1] A. Défossez, L. Bottou, F. Bach, and N. Usunier, “A simple convergence proof of Adam and Adagrad,” Transactions on Machine Learning Research, 2022.

[2] J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization,” Journal of Machine Learning Research, vol. 12, pp. 2121–2159, 2011.

[3] T. Tieleman and G. Hinton, “Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude,” COURSERA: Neural networks for machine learning, 2012.

[4] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014. Published as a conference paper at ICLR 2015.

[5] R. Abdulkadirov, P. Lyakhov, and N. Nagornov, “Survey of optimization algorithms in modern neural networks,” Mathematics, vol. 11, pp. 2466–2502, 2023.

[6] X. He, F. Xue, X. Ren, and Y. You, “Large-scale deep learning optimizations: A comprehensive survey,” arXiv preprint arXiv:2111.00856, 2021.

[7] B. T. Polyak, “Some methods of speeding up the convergence of iteration methods,” USSR Computational Mathematics and Mathematical Physics, vol. 4, pp. 1–17, 1964.

[8] C. W. Scherer and C. Ebenbauer, “A tutorial on convex design of optimization algorithms by integral quadratic constraints,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 9, pp. 12.1–12.28, 2025.

[9] C. Chen, L. Shen, F. Zou, and W. Liu, “Towards practical Adam: Non-convexity, convergence theory, and mini-batch acceleration,” Journal of Machine Learning Research, vol. 23, pp. 1–47, 2022.

[10] F. Zou, L. Shen, Z. Jie, W. Zhang, and W. Liu, “A sufficient condition for convergences of Adam and RMSProp,” in Proceedings of the 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11127–11135, 2019.

[11] S. Dereich, A. Jentzen, and A. Riekert, “Sharp higher order convergence rates for the Adam optimizer,” arXiv preprint arXiv:2504.19426, 2025.

[12] B. Bensaid, G. Poëtte, and R. Turpault, “An Abstract Lyapunov Control Optimizer: Local Stabilization and Global Convergence,” arXiv preprint arXiv:2407.01019, 2024.

[13] A. Barakat and P. Bianchi, “Convergence rates of a momentum algorithm with bounded adaptive step size for nonconvex optimization,” in Proceedings of the 12th Asian Conference on Machine Learning, pp. 225–240, 2020.

[14] B. Bensaid, G. Poëtte, and R. Turpault, “Convergence of the iterates for momentum and RMSProp for local smooth functions: Adaptation is the key,” arXiv preprint arXiv:2407.15471, 2024.

[15] B. Bensaid, G. Poëtte, and R. Turpault, “Deterministic Neural Networks Optimization from a Continuous and Energy Point of View,” Journal of Scientific Computing, vol. 96, no. 14, 2023.

[16] C. Heredia, “Modeling Adagrad, RMSProp, and Adam with Integro-Differential equations,” arXiv preprint arXiv:2411.09734, 2024.

[17] S. Dereich, R. Graeber, A. Jentzen, and A. Riekert, “Asymptotic stability properties and a priori bounds for Adam and other gradient descent optimization methods,” arXiv preprint arXiv:2509.10476, 2025.

[18] C. M. Kellett, “A compendium of comparison function results,” Mathematics of Control, Signals, and Systems, vol. 26, pp. 339–374, 2014.

[19] Y. Nesterov, Lectures on Convex Optimization, vol. 137 of Springer Optimization and Its Applications. Springer Science & Business Media, 2 ed., 2018.

[20] Z.-P. Jiang and Y. Wang, “Input-to-state stability for discrete-time nonlinear systems,” Automatica, vol. 37, pp. 857–869, 2001.

[21] A. N. Michel, L. Hou, and D. Liu, Stability of dynamical systems. Systems & Control: Foundations & Applications, Springer, 2 ed., 2015.

[22] H. K. Khalil, Nonlinear systems. Prentice Hall, 3 ed., 2002.

[23] C. Ma, L. Wu, and W. E, “A qualitative study of the dynamic behavior for adaptive gradient algorithms,” in Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, vol. 145, pp. 671–692, PMLR, 2022.
