pith. sign in

arxiv: 1907.02710 · v1 · pith:A2COBMZLnew · submitted 2019-07-05 · 🧮 math.OC

Nesterov's acceleration and Polyak's heavy ball method in continuous time: convergence rate analysis under geometric conditions and perturbations

Pith reviewed 2026-05-25 02:26 UTC · model grok-4.3

classification 🧮 math.OC
keywords inertial gradient descentNesterov accelerationheavy ball methodcontinuous time ODEŁojasiewicz propertyconvergence ratesperturbationsconvex optimization
0
0 comments X

The pith

Inertial gradient descent ODEs achieve new asymptotic convergence rates on convex functions satisfying local Łojasiewicz properties when perturbations meet integrability conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper analyzes a family of second-order ordinary differential equations that model inertial gradient descent methods, including the continuous-time versions of Nesterov's acceleration and Polyak's heavy ball method. It derives new asymptotic bounds for the difference between the function value along solutions and its minimum, assuming the objective is convex and obeys local geometric conditions such as the Łojasiewicz property, together with suitable integrability on a perturbation term that represents gradient errors. A sympathetic reader would care because these bounds jointly account for geometry and perturbations for the first time in this setting, offering insight into how these methods behave under inexact gradients, as occurs in stochastic optimization.

Core claim

The central claim is that for convex functions F satisfying local Łojasiewicz properties, and under integrability conditions on the perturbation g, the solutions x(t) of the perturbed inertial ODEs satisfy new asymptotic bounds on F(x(t)) - F(x*), and this holds for various damping parameters including non-vanishing ones, providing the first joint analysis of geometric properties and perturbations for this family of ODEs.

What carries the argument

The family of second-order ODEs with inertia (damping parameter) and additive perturbation term g(t), whose solutions are analyzed for convergence rates using the local Łojasiewicz geometry of F.

Load-bearing premise

The objective function must satisfy local Łojasiewicz geometric properties and the perturbation must obey the required integrability conditions.

What would settle it

Observe whether the claimed asymptotic decay rate of F(x(t)) - F(x*) holds for a convex function that violates the local Łojasiewicz property while the perturbation g satisfies the integrability conditions.

read the original abstract

In this article a family of second order ODEs associated to inertial gradient descend is studied. These ODEs are widely used to build trajectories converging to a minimizer $x^*$ of a function $F$, possibly convex. This family includes the continuous version of the Nesterov inertial scheme and the continuous heavy ball method. Several damping parameters, not necessarily vanishing, and a perturbation term $g$ are thus considered. The damping parameter is linked to the inertia of the associated inertial scheme and the perturbation term $g$ is linked to the error that can be done on the gradient of the function $F$. This article presents new asymptotic bounds on $F(x(t))-F(x^*)$ where $x$ is a solution of the ODE, when $F$ is convex and satisfies local geometrical properties such as {\L}ojasiewicz properties and under integrability conditions on $g$. Even if geometrical properties and perturbations were already studied for most ODEs of these families, it is the first time they are jointly studied. All these results give an insight on the behavior of these inertial and perturbed algorithms if $F$ satisfies some {\L}ojasiewicz properties especially in the setting of stochastic algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript analyzes a family of second-order ODEs modeling inertial gradient descent schemes (continuous Nesterov acceleration and Polyak heavy-ball dynamics) that incorporate a damping parameter and an additive perturbation term g. Under the joint hypotheses that F is convex and obeys local Łojasiewicz geometry while g satisfies suitable integrability conditions, the paper derives new asymptotic bounds on the objective gap F(x(t))−F(x*). The work emphasizes that this is the first joint treatment of the geometric and perturbation assumptions.

Significance. If the stated bounds are valid, the results supply concrete convergence-rate information for inertial flows under local geometric structure and noise, directly relevant to the analysis of stochastic first-order methods. The explicit joint treatment of Łojasiewicz geometry with integrability on g fills a documented gap left by earlier separate analyses of each hypothesis.

minor comments (3)
  1. [Introduction] §1 (Introduction): the precise differential equation satisfied by the family of flows is stated only after several paragraphs; moving the ODE display to the first paragraph would improve readability.
  2. [Introduction] The statement that the joint geometric-plus-perturbation analysis is new would be strengthened by a short table or paragraph contrasting the hypotheses used in the cited prior works on geometry alone versus perturbation alone.
  3. Notation for the damping function and the perturbation integrability class is introduced without a consolidated table; a short notation summary would aid cross-referencing in the proofs.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper derives new asymptotic bounds on F(x(t))−F(x*) for the family of inertial ODEs (Nesterov and heavy-ball) under the joint hypotheses of convexity of F, local Łojasiewicz geometry on F, and integrability conditions on the perturbation g. These hypotheses are used directly as the closing assumptions for the estimates; no step reduces a claimed prediction to a fitted input by construction, no self-citation is load-bearing for a uniqueness claim, and no ansatz is smuggled via prior work. The derivation chain is therefore self-contained against the stated external conditions and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

Analysis rests on standard domain assumptions of convexity and Łojasiewicz geometry plus an integrability condition on the perturbation; no free parameters or invented entities are introduced in the abstract.

axioms (3)
  • domain assumption F is convex
    Required for the stated asymptotic bounds (abstract).
  • domain assumption F satisfies local Łojasiewicz properties
    Central geometric assumption invoked for the new bounds (abstract).
  • domain assumption g satisfies integrability conditions
    Necessary for the perturbation analysis (abstract).

pith-pipeline@v0.9.0 · 5772 in / 1179 out tokens · 37612 ms · 2026-05-25T02:26:50.930101+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 3 canonical work pages · 2 internal anchors

  1. [1]

    Apidopoulos, J.-F

    V. Apidopoulos, J.-F. Aujol, Ch. Dossal, and A. Rondepie rre. Convergence rates of an inertial gradient descent algorithm under growth and flatness conditions. pre print, December 2018

  2. [2]

    Attouch and J

    H. Attouch and J. Bolte. On the convergence of the proxima l algorithm for nonsmooth functions involving analytic features. Mathematical Programming, 116(1):5–16, 2009

  3. [3]

    Attouch and A

    H. Attouch and A. Cabot. Asymptotic stabilization of ine rtial gradient dynamics with time-dependent viscosity. Journal of Differential Equations , 263(9):5412–5458, 2017

  4. [4]

    Attouch and A

    H. Attouch and A. Cabot. Convergence rates of inertial fo rward-backward algorithms. SIAM Journal on Optimization, 28(1):849–874, 2018

  5. [5]

    Fast inertial dynamics and FISTA algorithms in convex optimization. Perturbation aspects

    H. Attouch and Z. Chbani. Fast inertial dynamics and FIST A algorithms in convex optimization. Pertur- bation aspects. arXiv preprint arXiv:1507.01367 , 2015

  6. [6]

    Attouch, Z

    H. Attouch, Z. Chbani, J. Peypouquet, and P. Redont. Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Mathematical Programming, 168(1-2):123–175, 2018

  7. [7]

    Attouch, Z

    H. Attouch, Z. Chbani, and H. Riahi. Rate of convergence o f the Nesterov accelerated gradient method in the subcritical case α ⩽ 3. ESAIM: COCV , 2019

  8. [8]

    Attouch and J

    H. Attouch and J. Peypouquet. The rate of convergence of N esterov’s accelerated forward-backward method is actually faster than 1 k2 . SIAM Journal of Optimization , 26(3):1824–1834, 2016

  9. [9]

    Aujol and C

    J-F. Aujol and C. Dossal. Optimal rate of convergence of a n ODE associated to the fast gradient descent schemes for b > 0. Hal Preprint hal-01547251 , June 2017

  10. [10]

    Optimal convergence rates for Nesterov acceleration

    J.-F. Aujol, C. Dossal, and A. Rondepierre. Optimal con vergence rates for Nesterov acceleration. arXiv preprint arXiv:1805.05719, 2018

  11. [11]

    Balti and R

    M. Balti and R. May. Asymptotic for the perturbed heavy b all system with vanishing damping term. Evolution Equations & Control Theory , 6(2):177–186, 2017

  12. [12]

    Bégout, J

    P. Bégout, J. Bolte, and M. A. Jendoubi. On damped second order gradients systems. Journal of Differential Equation, 259(9):3315–3143, 2015

  13. [13]

    Bolte, T.P

    J. Bolte, T.P. Nguyen, J. Peypouquet, and B.W. Suter. Fr om error bounds to the complexity of first-order descent methods for convex functions. Mathematical Programming, 165(2):471–507, 2017

  14. [14]

    H. Brezis. Opérateurs maximaux monotones et semi-groupes de contracti ons dans les espaces de Hilbert , volume 5. Elsevier, 1973

  15. [15]

    Cabot, H

    A. Cabot, H. Engler, and S. Gadat. On the long time behavi or of second order differential equations with asymptotically small dissipation. Transactions of the American Mathematical Society , 361(11):5983–6017, 2009

  16. [16]

    Cabot, H

    A. Cabot, H. Engler, S. Gadat, et al. Second-order differ ential equations with asymptotically small dissi- pation and piecewise flat potentials. Electronic Journal of Differential Equation , 17:33–38, 2009

  17. [17]

    fast iterative shrinkage/thresholding algorithm

    A. Chambolle and C. Dossal. On the convergence of the ite rates of the “fast iterative shrinkage/thresholding algorithm”. Journal of Optimization Theory and Applications , 166(3):968–982, 2015

  18. [18]

    Garrigos, L

    G. Garrigos, L. Rosasco, and S. Villa. Convergence of th e forward-backward algorithm: Beyond the worst case with the help of geometry. arXiv preprint arXiv:1703.09477 , 2017

  19. [19]

    On a second order dissipative ode in hilbert space with an integrable source term

    Alain Haraux and Mohamed Ali Jendoubi. On a second order dissipative ode in hilbert space with an integrable source term. Acta Mathematica Scientia , 32(1):155 – 163, 2012. Mathematics Dedicated to professor Constantine M. Dafermos on the occasion of his 70t h birthday

  20. [20]

    M. A. Jendoubi and R. May. Asymptotics for a second-orde r differential equation with nonautonomous damping and an integrable source term. Applicable Analysis, 94(2):435–443, 2015. 21

  21. [21]

    Łojasiewicz

    S. Łojasiewicz. Une propriété topologique des sous-en sembles analytiques réels. In Les Équations aux Dérivées Partielles (Paris, 1962) , pages 87–89. Éditions du Centre National de la Recherche Sc ientifique, Paris, 1963

  22. [22]

    Łojasiewicz

    S. Łojasiewicz. Sur la géométrie semi- et sous-analyti que. Annales de l’Institut Fourier. Université de Grenoble, 43(5):1575–1595, 1993

  23. [23]

    R. May. Asymptotic for a second order evolution equatio n with convex potential and vanishing damping term. Turkish Journal of Mathematics , 41:681–685, 2017

  24. [24]

    Polyak and P

    B. Polyak and P. Shcherbakov. Lyapunov functions: An op timization theory perspective. IF AC- PapersOnLine, 50(1):7456–7461, 2017

  25. [25]

    W. Su, S. Boyd, and E. J. Candes. A differential equation f or modeling Nesterov’s accelerated gradient method: theory and insights. Journal of Machine Learning Research , 17(153):1–43, 2016. 22