
arxiv: 2604.06651 · v1 · submitted 2026-04-08 · 🧮 math.OC

Nesterov Flow May Travel Infinitely Long to Converge to a Minimizer

Pith reviewed 2026-05-10 18:19 UTC · model grok-4.3

classification 🧮 math.OC
keywords Nesterov flow · convex potential · infinite arc length · pointwise convergence · rectifiability · accelerated gradient · continuous-time dynamics

The pith

Nesterov flow can converge to a minimizer while traveling infinite path length.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether pointwise convergence of the Nesterov flow to a minimizer always comes with finite path length. It answers no by constructing an explicit differentiable convex potential in two dimensions where the flow approaches the minimum yet the total distance traveled grows without bound. This matters because it shows that convergence in position does not control the total variation of the trajectory for the continuous version of accelerated gradient methods. Readers will see that rectifiability is a strictly stronger property than point convergence in these dynamics.

Core claim

There exists a differentiable convex potential in R^2 for which the Nesterov flow converges to its minimizer but still accumulates infinite path length.
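
Written out as formulas, the claim reads roughly as follows. This is a sketch in notation chosen here for illustration: the ODE is the one quoted in the referee summary and the Figure 1 caption, while the symbol x^\star for the minimizer and the expression of path length as the integral of the speed are assumptions of this restatement, not notation confirmed from the paper.

```latex
% Nesterov ODE for a convex differentiable potential f : R^2 -> R
\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\bigl(X(t)\bigr) = 0, \qquad t > 0.

% Claimed behaviour: pointwise convergence without rectifiability
\lim_{t \to \infty} X(t) = x^\star \in \operatorname*{argmin} f,
\qquad
\int_{0}^{\infty} \bigl\lVert \dot{X}(t) \bigr\rVert \, dt = \infty .
```

The first limit is point convergence; the divergent integral is the failure of finite path length.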

What carries the argument

A specially constructed differentiable convex potential in R^2 defines a Nesterov ODE whose solution converges pointwise but has infinite arc length.

Load-bearing premise

The potential is convex and differentiable everywhere so that the Nesterov ODE is well-defined and the trajectory converges pointwise.

What would settle it

Numerical integration or an analytic proof showing that the constructed potential actually produces a trajectory of finite arc length.
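
A numerical check of this kind could be set up along the following lines. This is a minimal sketch, not the paper's experiment: the potential f_{ε,a} is not given in closed form on this page, so the code uses the stand-in f(x) = ½‖x‖² (for which the path length is expected to stay bounded), and the names `grad_f`, `rhs`, and `integrate_nesterov`, the RK4 scheme, the step size, and the small-time start at `t0` are all illustrative assumptions.

```python
import numpy as np


def grad_f(x):
    # ASSUMPTION: stand-in potential f(x) = 0.5 * ||x||^2, NOT the paper's f_{eps,a}.
    return x


def rhs(t, state, grad):
    # First-order form of X'' + (3/t) X' + grad f(X) = 0 with state = (x, v).
    n = state.size // 2
    x, v = state[:n], state[n:]
    return np.concatenate([v, -(3.0 / t) * v - grad(x)])


def integrate_nesterov(grad, x0, t_end, dt=2e-3, t0=1e-3):
    """Integrate the Nesterov ODE with classical RK4.

    Returns the final position and the length of the polygonal path
    through the computed points (an approximation of the arc length).
    """
    x0 = np.asarray(x0, dtype=float)
    g0 = grad(x0)
    # The coefficient 3/t is singular at t = 0, so start at t0 > 0 using the
    # small-time expansion X(t) ~ x0 - (t^2 / 8) grad f(x0), X'(t) ~ -(t / 4) grad f(x0).
    state = np.concatenate([x0 - (t0 ** 2 / 8.0) * g0, -(t0 / 4.0) * g0])
    n = x0.size
    length = np.linalg.norm(state[:n] - x0)
    t = t0
    n_steps = int(round((t_end - t0) / dt))
    for _ in range(n_steps):
        k1 = rhs(t, state, grad)
        k2 = rhs(t + dt / 2, state + dt / 2 * k1, grad)
        k3 = rhs(t + dt / 2, state + dt / 2 * k2, grad)
        k4 = rhs(t + dt, state + dt * k3, grad)
        new_state = state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        # Accumulate the length of the segment between consecutive points.
        length += np.linalg.norm(new_state[:n] - state[:n])
        state = new_state
        t += dt
    return state[:n], length
```

Calling `integrate_nesterov(grad_f, [2.0, 1.0], t_end)` for increasing `t_end` and watching whether the returned length levels off gives a rough indication of finite or growing path length. A finite-horizon simulation can only suggest a trend; it cannot by itself establish that the length is infinite or finite.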

Figures

Figures reproduced from arXiv: 2604.06651 by Ernest K. Ryu.

Figure 1: Illustration of the potential f_{ε,a}. The function is radial near the origin, while the one-sided quadratic perturbation εΨ_a(x_1), whose purpose is to generate positive angular momentum, becomes inactive once the trajectory is sufficiently close to the origin. Let {X(t)}_{t≥0} be the solution to Ẍ(t) + (3/t)Ẋ(t) + ∇f_{ε,a}(X(t)) = 0, t > 0, X(0) = (2a, a), Ẋ(0) = 0. Since the minimizer is unique, the convergenc…
Figure 2: Numerical simulation of the pathological Nesterov trajectory with … [image: figures/full_fig_p004_2.png]
read the original abstract

Recent work has established that the trajectory of the Nesterov ODE, the continuous-time model of Nesterov's accelerated gradient method, exhibits point convergence towards a minimizer of a convex potential. A natural next question is whether this point convergence can be upgraded to rectifiability, namely whether the convergent orbit has finite path length. This work provides the answer in the negative by constructing a differentiable convex potential in $\mathbb{R}^2$ for which the flow converges to its minimizer but still accumulates infinite path length. All proofs of this work are due entirely to an internal model at OpenAI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper constructs a differentiable convex potential f: R^2 → R such that the Nesterov ODE ẍ + (3/t)ẋ + ∇f(x) = 0 admits a trajectory that converges pointwise to a minimizer of f but has infinite arc length. All proofs are attributed to an internal OpenAI model.

Significance. If the construction is valid and the ODE trajectory is well-defined, the result shows that pointwise convergence of Nesterov flow does not imply rectifiability (finite path length). This would be a notable negative result for the regularity of continuous-time accelerated gradient dynamics on convex problems, distinguishing them from standard gradient flow.

major comments (2)
  1. [Abstract / construction of the potential] The central claim requires a classical solution to the Nesterov system ẋ = v, v̇ = -(3/t)v - ∇f(x) that is defined for all t > 0, converges pointwise, and has infinite length. However, the manuscript only assumes f is differentiable and convex, which guarantees existence of ∇f but not its continuity. If the constructed f has discontinuous gradient (possible for merely differentiable convex functions), the vector field is not continuous and Peano existence or Picard-Lindelöf uniqueness may fail along the purported orbit. This is load-bearing for the existence of the claimed trajectory.
  2. [Abstract] The manuscript states that 'all proofs of this work are due entirely to an internal model at OpenAI' with no explicit construction, no verification steps, and no external reproducibility provided. Without an explicit formula for f or a human-readable proof sketch, it is impossible to check whether the potential is indeed convex and differentiable everywhere, whether the ODE is well-posed, or whether the infinite-length property holds.
minor comments (1)
  1. [Abstract] The abstract refers to 'the Nesterov flow' and 'the Nesterov ODE' without writing the equation; including the explicit form ẍ + (3/t)ẋ + ∇f(x) = 0 would improve clarity.
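
The well-posedness question raised in major comment 1 can be made concrete by writing the system in first-order form and noting which regularity of ∇f each classical theorem needs. This is a sketch of the standard statements, not something taken from the paper. A standard result in convex analysis (see Rockafellar's Convex Analysis, Section 25) says that a convex function differentiable on all of R^n is automatically continuously differentiable, so the continuity requirement is met by any differentiable convex f. The part that remains to be checked for a given construction is the local Lipschitz property needed for uniqueness.

```latex
% First-order form of the Nesterov ODE, valid for t > 0
\dot{x}(t) = v(t), \qquad
\dot{v}(t) = -\frac{3}{t}\,v(t) - \nabla f\bigl(x(t)\bigr), \qquad t > 0.

% \nabla f continuous         =>  local existence for t > 0   (Peano)
% \nabla f locally Lipschitz  =>  local uniqueness for t > 0  (Picard-Lindel\"of)
```

Neither theorem applies directly at t = 0, where the coefficient 3/t is singular; the initial condition there needs a separate argument.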

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their careful reading and for identifying key issues concerning the well-posedness of the ODE and the reproducibility of the construction. We respond to each major comment below.

read point-by-point responses
  1. Referee: [Abstract / construction of the potential] The central claim requires a classical solution to the Nesterov system ẋ = v, v̇ = -(3/t)v - ∇f(x) that is defined for all t > 0, converges pointwise, and has infinite length. However, the manuscript only assumes f is differentiable and convex, which guarantees existence of ∇f but not its continuity. If the constructed f has discontinuous gradient (possible for merely differentiable convex functions), the vector field is not continuous and Peano existence or Picard-Lindelöf uniqueness may fail along the purported orbit. This is load-bearing for the existence of the claimed trajectory.

    Authors: We agree that continuity of ∇f is necessary for the vector field to be continuous and for the existence of a unique classical solution via the Picard-Lindelöf theorem. The construction generated by the model yields a potential f that is in fact C¹ (hence ∇f continuous), which ensures local Lipschitz continuity of the right-hand side and well-posedness of the trajectory for all t > 0. We will revise the manuscript to state explicitly that the constructed f is C¹ and to include a short argument confirming that the ODE admits a classical solution along the claimed orbit. revision: yes

  2. Referee: [Abstract] The manuscript states that 'all proofs of this work are due entirely to an internal model at OpenAI' with no explicit construction, no verification steps, and no external reproducibility provided. Without an explicit formula for f or a human-readable proof sketch, it is impossible to check whether the potential is indeed convex and differentiable everywhere, whether the ODE is well-posed, or whether the infinite-length property holds.

    Authors: We acknowledge that reliance on an internal model without an accompanying explicit formula or human-readable proof sketch makes independent verification difficult. The model supplied both the specific potential and the verification of its properties, but we currently lack a closed-form expression or step-by-step derivation that can be checked without access to the model. In the revised manuscript we will extract and present as much concrete detail from the model output as possible (including the form of f and the key steps establishing convexity, differentiability, and infinite length) to improve checkability, while noting the origin of the construction. revision: partial

standing simulated objections not resolved
  • Full external reproducibility of the explicit potential and proof, which remain internal to the OpenAI model and cannot be supplied in human-generated form.

Circularity Check

0 steps flagged

Existence construction is self-contained with no reduction to inputs

full rationale

The paper's central result is an explicit construction of a differentiable convex potential in R^2 for which the Nesterov ODE trajectory converges pointwise to the minimizer yet has infinite arc length. This is established directly by verifying the properties of the constructed function against the ODE definition, without any fitted parameters, self-referential definitions, or load-bearing self-citations that would make the claim tautological. Background citations to prior point-convergence results are external and non-circular; the new negative result on rectifiability stands independently.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The result depends on the existence of one specially chosen convex function whose properties are asserted by construction; no free parameters, new entities, or nonstandard axioms are introduced beyond ordinary convex analysis and ODE theory.

axioms (1)
  • domain assumption: The Nesterov ODE is well-posed and the flow converges pointwise for the constructed potential.
    Required for the trajectory to be defined and to reach the minimizer.

pith-pipeline@v0.9.0 · 5388 in / 1086 out tokens · 59865 ms · 2026-05-10T18:19:36.171420+00:00 · methodology


