pith. sign in

arxiv: 2606.30050 · v1 · pith:65Q55CFTnew · submitted 2026-06-29 · 🧮 math.OC

A Restart-Free Accelerated Algorithm for Non-Convex Minimization: Continuous and Discrete Analysis

Pith reviewed 2026-06-30 05:22 UTC · model grok-4.3

classification 🧮 math.OC
keywords nonconvex optimizationfirst-order methodsrestart-free algorithmsaccelerated convergencecontinuous-time analysisperformance estimation problemadaptive Lipschitz estimationordinary differential equations
0
0 comments X

The pith

Two new first-order methods reach an ε-approximate stationary point for nonconvex problems in O(ε^{-7/4}) evaluations without restarts or prior knowledge of ε.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces two algorithms for minimizing nonconvex functions whose gradients and Hessians are Lipschitz continuous. Both reach an approximate first-order stationary point using O(ε^{-7/4}) function and gradient evaluations, and neither requires ε as an input parameter or any restart mechanism. The methods arise from discretizing a new continuous-time ordinary differential equation model, with convergence proved in parallel for the continuous and discrete versions inside the Performance Estimation Problem framework. The first algorithm keeps its last iterate differentiable with respect to the starting point; the second estimates the Lipschitz constants on the fly.

Core claim

By discretizing a newly proposed continuous-time ODE model, the authors obtain two restart-free first-order methods that attain the O(ε^{-7/4}) complexity for finding an ε-approximate first-order stationary point of a nonconvex function with Lipschitz gradient and Hessian, without using ε as an input parameter and without any restart mechanism. The continuous- and discrete-time analyses proceed in parallel under the Performance Estimation Problem framework. One version yields a simple implementation whose final iterate remains differentiable with respect to the initial point; the other version estimates the Lipschitz constants adaptively and therefore requires no a priori knowledge of those

What carries the argument

A newly introduced continuous-time ordinary differential equation model whose discretizations produce the algorithms and whose convergence properties transfer to the discrete setting under the Performance Estimation Problem framework.

If this is right

  • The first algorithm admits a simple implementation in which the last iterate is differentiable with respect to the initial point.
  • The second algorithm estimates the required Lipschitz constants during execution and therefore needs no a priori knowledge of them.
  • Both algorithms avoid restart mechanisms while matching the best known complexity for this problem class.
  • The continuous and discrete analyses are carried out in parallel inside the same Performance Estimation Problem framework.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Because the final iterate of the first algorithm remains differentiable, the method could be embedded inside larger differentiable pipelines such as hyperparameter optimization or end-to-end learning.
  • The restart-free design may reduce sensitivity to the choice of restart frequency on problems where the landscape changes across scales.
  • The parallel continuous-discrete analysis suggests that similar ODE models could be discretized for other nonconvex problem classes while preserving the PEP-based rate.
  • Adaptive estimation of Lipschitz constants without restarts may improve practical robustness on functions where the local smoothness varies.

Load-bearing premise

The convergence guarantees proved for the continuous ODE model carry over to its discretizations inside the Performance Estimation Problem framework without extra error terms that would worsen the rate.

What would settle it

A concrete nonconvex function with known Lipschitz constants on which the proposed discrete algorithm requires more than O(ε^{-7/4}) evaluations to reach an ε-stationary point, or a counter-example showing that the PEP analysis introduces a discretization error that degrades the rate below O(ε^{-7/4}).

read the original abstract

We propose two novel first-order methods for minimizing nonconvex functions with Lipschitz-continuous gradients and Hessians. These algorithms attain an $\varepsilon$-approximate first-order stationary point in $\mathrm{O}(\varepsilon^{-7/4})$ function and gradient evaluations, without using $\varepsilon$ as an input parameter. While existing methods rely on restart mechanisms to achieve this complexity, our methods do not. Consequently, the first algorithm enjoys a simple implementation, making its last iterate differentiable with respect to the initial point. By estimating the Lipschitz constants adaptively, we develop the second algorithm that does not require prior knowledge of the constants. This algorithm exhibits better numerical performance than existing parameter-free methods for certain problems, which can be attributed to its restart-free design. Both algorithms are derived by discretizing a newly introduced continuous-time model represented by an ordinary differential equation, and their continuous- and discrete-time convergence analyses proceed in a parallel manner under the Performance Estimation Problem framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes two novel restart-free first-order methods for non-convex minimization of functions with Lipschitz gradients and Hessians. Derived by discretizing a new continuous-time ODE, the methods achieve an O(ε^{-7/4}) complexity bound for ε-approximate first-order stationary points without restarts or knowledge of ε in advance. One method is simple with a differentiable last iterate; the second adaptively estimates the Lipschitz constants. Convergence analyses for both the continuous ODE and discrete algorithms are carried out in parallel under the Performance Estimation Problem (PEP) framework.

Significance. If the central claims hold, the work is significant for providing restart-free accelerated rates in non-convex optimization, simplifying implementation compared to restart-based methods and enabling better practical performance in the adaptive variant. The parallel continuous-discrete PEP analysis is a methodological contribution, and the differentiability property supports applications such as bilevel optimization. The adaptive algorithm's lack of prior parameter knowledge addresses a common practical limitation.

major comments (2)
  1. [§5] §5 (discretization and PEP transfer): the analysis transfers the O(ε^{-7/4}) rate from the continuous ODE to the discrete algorithm via PEP but does not explicitly bound or cancel discretization error terms; without such bounds the rate preservation for the discrete iterates is not guaranteed and requires additional justification.
  2. [§6.2] §6.2 (adaptive Lipschitz estimation): the second algorithm's adaptive estimation procedure is claimed to preserve the O(ε^{-7/4}) bound on function/gradient evaluations, yet the number of extra evaluations required for the estimation is not bounded in the PEP formulation, which could inflate the total complexity.
minor comments (2)
  1. [Notation] Notation for the new ODE (Eq. (3) or equivalent) uses several auxiliary functions whose definitions are spread across sections; a consolidated table of symbols would improve readability.
  2. [Numerical experiments] Figure 2 (numerical comparison) lacks error bars or multiple random seeds, making it difficult to assess whether the reported advantage over restart-based baselines is statistically consistent.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on our manuscript. We address the two major comments below, proposing revisions to strengthen the presentation and analysis.

read point-by-point responses
  1. Referee: [§5] §5 (discretization and PEP transfer): the analysis transfers the O(ε^{-7/4}) rate from the continuous ODE to the discrete algorithm via PEP but does not explicitly bound or cancel discretization error terms; without such bounds the rate preservation for the discrete iterates is not guaranteed and requires additional justification.

    Authors: We appreciate the referee pointing out the need for clearer justification. The PEP analysis is formulated directly on the discrete algorithm's update rules (in parallel with the continuous ODE), so that the rate is obtained without relying on a transfer from the continuous-time bound. Nevertheless, to make this explicit and address the concern about discretization error terms, we will revise §5 to include a dedicated paragraph explaining why no additional error bounds are required in the PEP setup and how the discrete dynamics are captured exactly. revision: yes

  2. Referee: [§6.2] §6.2 (adaptive Lipschitz estimation): the second algorithm's adaptive estimation procedure is claimed to preserve the O(ε^{-7/4}) bound on function/gradient evaluations, yet the number of extra evaluations required for the estimation is not bounded in the PEP formulation, which could inflate the total complexity.

    Authors: We thank the referee for this observation. The adaptive estimation steps are part of the algorithm, but we agree that their contribution to the total number of evaluations must be bounded explicitly within the PEP framework. We will revise §6.2 to augment the PEP formulation with the estimation procedure and derive a bound showing that the extra evaluations are absorbed into the overall O(ε^{-7/4}) complexity without inflation. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper introduces a new continuous-time ODE and derives discrete algorithms by discretization, with both continuous and discrete analyses conducted in parallel under the PEP framework. No equations or steps are quoted that reduce the claimed O(ε^{-7/4}) rate to a fitted parameter, self-defined quantity, or self-citation chain by construction. The abstract presents the ODE as newly introduced rather than reverse-engineered from the target rate, and the PEP application is described as transferring properties without additional degrading terms. This leaves the central claim with independent analytical content, consistent with a self-contained derivation against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review performed on abstract only; no explicit free parameters, axioms beyond the stated Lipschitz assumptions, or invented entities beyond the new ODE are visible.

axioms (1)
  • domain assumption Objective functions have Lipschitz-continuous gradients and Hessians
    Required for the stated complexity and stated in the abstract.
invented entities (1)
  • New continuous-time ODE model no independent evidence
    purpose: To serve as the source from which both discrete algorithms are derived
    Described as newly introduced in the abstract

pith-pipeline@v0.9.1-grok · 5691 in / 1286 out tokens · 46667 ms · 2026-06-30T05:22:09.073520+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

65 extracted references · 32 canonical work pages · 2 internal anchors

  1. [1]

    Springer Optimization and Its Applications, vol

    Nesterov, Y .: Lectures on Convex Optimization. Springer Optimization and Its Applications, vol. 137, p. 589. Springer, Cham (2018). https://doi.org/10.1007/ 978-3-319-91578-4

  2. [2]

    In: Proceedings of the 39th Inter- national Conference on Machine Learning, vol

    Li, H., Lin, Z.: Restarted nonconvex accelerated gradient descent: No more poly- logarithmic factor in the o(ε−7/4) complexity. In: Proceedings of the 39th Inter- national Conference on Machine Learning, vol. 162, pp. 12901–12916 (2022). https://proceedings.mlr.press/v162/li22o.html

  3. [3]

    Marumo, N., Takeda, A.: Parameter-free accelerated gradient descent for noncon- vex minimization. SIAM J. Optim. 34(2), 2093–2120 (2024) https://doi.org/10.1137/ 22M1540934

  4. [4]

    Marumo, N., Takeda, A.: Universal heavy-ball method for nonconvex optimization under H ¨older continuous Hessians. Math. Program. 212(1-2), 147–175 (2025) https: //doi.org/10.1007/s10107-024-02100-4 46

  5. [5]

    Sharp First-Order Lower Bounds for Higher-Order Smooth Nonconvex Optimization

    Zhou, D.: Sharp First-Order Lower Bounds for Higher-Order Smooth Nonconvex Optimization. https://arxiv.org/abs/2606.05438

  6. [6]

    Cartis, C., Gould, N.I.M., Toint, P..L.: On the complexity of steepest descent, Newton’s and regularized Newton’s methods for nonconvex unconstrained optimization problems. SIAM J. Optim. 20(6), 2833–2852 (2010) https://doi.org/10.1137/090774100

  7. [7]

    Carmon, Y ., Duchi, J.C., Hinder, O., Sidford, A.: Lower bounds for finding sta- tionary points I. Math. Program. 184(1-2), 71–120 (2020) https://doi.org/10.1007/ s10107-019-01406-y

  8. [8]

    Ochs, P., Chen, Y ., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for nonconvex optimization. SIAM J. Imaging Sci. 7(2), 1388–1419 (2014) https://doi.org/10.1137/ 130942954

  9. [9]

    O’Neill, M., Wright, S.J.: Behavior of accelerated gradient methods near critical points of nonconvex functions. Math. Program. 176(1-2), 403–427 (2019) https://doi.org/10. 1007/s10107-018-1340-y

  10. [10]

    In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp

    Agarwal, N., Allen-Zhu, Z., Bullins, B., Hazan, E., Ma, T.: Finding approximate local minima faster than gradient descent. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1195–1199 (2017). https://doi.org/10.1145/ 3055399.3055464

  11. [11]

    Carmon, Y ., Duchi, J.C., Hinder, O., Sidford, A.: Accelerated methods for noncon- vex optimization. SIAM J. Optim. 28(2), 1751–1772 (2018) https://doi.org/10.1137/ 17M1114296

  12. [12]

    Royer, C.W., O’Neill, M., Wright, S.J.: A Newton-CG algorithm with complexity guarantees for smooth unconstrained optimization. Math. Program. 180(1-2), 451–488 (2020) https://doi.org/10.1007/s10107-019-01362-7

  13. [13]

    Royer, C.W., Wright, S.J.: Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization. SIAM J. Optim. 28(2), 1448–1477 (2018) https: //doi.org/10.1137/17M1134329

  14. [14]

    NEON+: Accelerated Gradient Methods for Extracting Negative Curvature for Non-Convex Optimization

    Xu, Y ., Jin, R., Yang, T.: Neon+: Accelerated gradient methods for extracting negative curvature for non-convex optimization. arXiv preprint arXiv:1712.01033 (2017)

  15. [15]

    In: Advances in Neural Information Processing Systems, vol

    Allen-Zhu, Z., Li, Y .: Neon2: Finding local minima via first-order oracles. In: Advances in Neural Information Processing Systems, vol. 31 (2018)

  16. [16]

    In: Proceedings of the 31st Conference On Learning The- ory, vol

    Jin, C., Netrapalli, P., Jordan, M.I.: Accelerated gradient descent escapes saddle points faster than gradient descent. In: Proceedings of the 31st Conference On Learning The- ory, vol. 75, pp. 1042–1085 (2018). https://proceedings.mlr.press/v75/jin18a.html

  17. [17]

    Convex until proven guilty

    Carmon, Y ., Duchi, J.C., Hinder, O., Sidford, A.: “Convex until proven guilty”: 47 Dimension-free acceleration of gradient descent on non-convex functions. In: Proceed- ings of the 34th International Conference on Machine Learning, vol. 70, pp. 654–663 (2017). https://proceedings.mlr.press/v70/carmon17a.html

  18. [18]

    In: Pro- ceedings of the 57th Annual ACM Symposium on Theory of Computing, pp

    Jiang, R., Mokhtari, A., Patitucci, F.: Improved complexity for smooth nonconvex optimization: A two-level online learning approach with quasi-newton methods. In: Pro- ceedings of the 57th Annual ACM Symposium on Theory of Computing, pp. 2225–2236 (2025). https://doi.org/10.1145/3717823.3718308

  19. [19]

    Carmon, Y ., Duchi, J.C., Hinder, O., Sidford, A.: Lower bounds for finding stationary points II: first-order methods. Math. Program.185(1-2), 315–355 (2021) https://doi.org/ 10.1007/s10107-019-01431-x

  20. [20]

    In: Advances in Neural Information Processing Systems, vol

    Andrychowicz, M., Denil, M., G ´omez, S., Hoffman, M.W., Pfau, D., Schaul, T., Shillingford, B., Freitas, N.: Learning to learn by gradient descent by gradient descent. In: Advances in Neural Information Processing Systems, vol. 29 (2016)

  21. [21]

    In: Proceedings of the 34th International Conference on Machine Learn- ing, vol

    Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th International Conference on Machine Learn- ing, vol. 70, pp. 1126–1135 (2017).https://proceedings.mlr.press/v70/finn17a.html

  22. [22]

    In: International Conference on Learning Representations (2017)

    Metz, L., Poole, B., Pfau, D., Sohl-Dickstein, J.: Unrolled generative adversar- ial networks. In: International Conference on Learning Representations (2017). https://openreview.net/forum?id=BydrOIcle

  23. [23]

    In: Proceedings of the 27th International Conference on International Conference on Machine Learning, pp

    Gregor, K., LeCun, Y .: Learning fast approximations of sparse coding. In: Proceedings of the 27th International Conference on International Conference on Machine Learning, pp. 399–406 (2010)

  24. [24]

    In: Advances in Neural Information Processing Systems, vol

    Yang, Y ., Sun, J., Li, H., Xu, Z.: Deep ADMM-Net for compressive sensing mri. In: Advances in Neural Information Processing Systems, vol. 29 (2016)

  25. [25]

    USSR Comput

    Polyak, B.T.: Some methods of speeding up the convergence of iterative methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964) https://doi.org/10.1016/0041-5553(64) 90137-5

  26. [26]

    Su, W., Boyd, S., Cand `es, E.J.: A differential equation for modeling Nesterov’s accel- erated gradient method: theory and insights. J. Mach. Learn. Res. 17(153), 1–43 (2016)

  27. [27]

    Nesterov, Y .E.: A method for solving the convex programming problem with conver- gence rate O(1/k2). Dokl. Akad. Nauk SSSR 269(3), 543–547 (1983)

  28. [28]

    In: Advances in Neural Information Processing Systems, vol

    Krichene, W., Bayen, A., Bartlett, P.L.: Accelerated mirror descent in continuous and discrete time. In: Advances in Neural Information Processing Systems, vol. 28 (2015)

  29. [29]

    Shi, B., Du, S.S., Jordan, M.I., Su, W.J.: Understanding the acceleration phenomenon 48 via high-resolution differential equations. Math. Program. 195(1-2), 79–148 (2022) https://doi.org/10.1007/s10107-021-01681-8

  30. [30]

    Wilson, A.C., Recht, B., Jordan, M.I.: A Lyapunov analysis of accelerated methods in optimization. J. Mach. Learn. Res. 22(113), 1–34 (2021)

  31. [31]

    arXiv preprint arXiv:2406.06100 (2024)

    Okamura, K., Marumo, N., Takeda, A.: Primitive heavy-ball dynamics achieves o(ε−7/4) convergence for nonconvex optimization. arXiv preprint arXiv:2406.06100 (2024)

  32. [32]

    In: Advances in Neural Information Processing Systems, vol

    Ushiyama, K., Sato, S., Matsuo, T.: A unified discretization framework for differential equation approach with Lyapunov arguments for convex optimiza- tion. In: Advances in Neural Information Processing Systems, vol. 37 (2023). https://openreview.net/forum?id=8YN62t19AW

  33. [33]

    Foundations and Trends in Optimization 5(1-2), 1–245 (2021) https://doi.org/10.1561/2400000036

    d’Aspremont, A., Scieur, D., Taylor, A.: Acceleration methods. Foundations and Trends in Optimization 5(1-2), 1–245 (2021) https://doi.org/10.1561/2400000036

  34. [34]

    In: International Conference on Machine Learning (2018)

    Taylor, A., Van Scoy, B., Lessard, L.: Lyapunov functions for first-order methods: Tight automated convergence guarantees. In: International Conference on Machine Learning (2018). https://proceedings.mlr.press/v80/taylor18a.html

  35. [35]

    In: Proceedings of the Thirty- Second Conference on Learning Theory, vol

    Taylor, A., Bach, F.: Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. In: Proceedings of the Thirty- Second Conference on Learning Theory, vol. 99, pp. 2934–2992 (2019). https://proceedings.mlr.press/v99/taylor19a.html

  36. [36]

    Upadhyaya, M., Banert, S., Taylor, A.B., Giselsson, P.: Automated tight Lyapunov anal- ysis for first-order methods. Math. Program. 209(1-2), 133–170 (2025) https://doi.org/ 10.1007/s10107-024-02061-8

  37. [37]

    In: International Conference on Machine Learning, pp

    Suh, J.J., Roh, G., Ryu, E.K.: Continuous-time analysis of accelerated gra- dient methods via conservation laws in dilated coordinate systems. In: International Conference on Machine Learning, pp. 20640–20667 (2022). https://proceedings.mlr.press/v162/suh22a.html

  38. [38]

    Moucer, C., Taylor, A., Bach, F.: A systematic approach to Lyapunov analyses of continuous-time models in convex optimization. SIAM J. Optim. 33(3), 1558–1586 (2023) https://doi.org/10.1137/22M1498486

  39. [39]

    JSIAM Letters 16, 29–32 (2024) https://doi.org/10.14495/jsiaml.16.29

    Kamijima, T., Sato, S., Ushiyama, K., Matsuo, T., Tanaka, K.: Analysis of continuous dynamical system models with hessians derived from optimization methods. JSIAM Letters 16, 29–32 (2024) https://doi.org/10.14495/jsiaml.16.29

  40. [40]

    Drori, Y ., Teboulle, M.: Performance of first-order methods for smooth convex mini- mization: a novel approach. Math. Program. 145(1-2, Ser. A), 451–482 (2014) https: //doi.org/10.1007/s10107-013-0653-0 49

  41. [41]

    Taylor, A.B., Hendrickx, J.M., Glineur, F.: Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Program. 161(1-2), 307– 345 (2017) https://doi.org/10.1007/s10107-016-1009-3

  42. [42]

    In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp

    Taylor, A.B., Hendrickx, J.M., Glineur, F.: Performance estimation toolbox (PESTO): Automated worst-case analysis of first-order optimization methods. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 1278–1283 (2017). https://doi. org/10.1109/CDC.2017.8263832

  43. [43]

    arXiv preprint arXiv:2201.04040 (2022)

    Goujaud, B., Moucer, C., Glineur, F., Hendrickx, J., Taylor, A., Dieuleveut, A.: PEPit: computer-assisted worst-case analyses of first-order optimization methods in Python. arXiv preprint arXiv:2201.04040 (2022)

  44. [44]

    Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex mini- mization. Math. Program. 159(1-2, Ser. A), 81–107 (2016) https://doi.org/10.1007/ s10107-015-0949-3

  45. [45]

    Taylor, A., Drori, Y .: An optimal gradient method for smooth strongly convex minimiza- tion. Math. Program., 1–38 (2022) https://doi.org/10.1007/s10107-022-01839-y

  46. [46]

    Jang, U., Gupta, S.D., Ryu, E.K.: Computer-assisted design of accelerated compos- ite optimization methods: OptISTA. Math. Program. in press (2025) https://doi.org/10. 1007/s10107-025-02258-5

  47. [47]

    Drori, Y ., Teboulle, M.: An optimal variant of Kelley’s cutting-plane method. Math. Program. 160(1-2), 321–351 (2016) https://doi.org/10.1007/s10107-016-0985-7

  48. [48]

    Kim, D.: Accelerated proximal point method for maximally monotone operators. Math. Program. 190(1-2), 57–87 (2021) https://doi.org/10.1007/s10107-021-01643-0

  49. [49]

    Lieder, F.: On the convergence rate of the Halpern-iteration. Optim. Lett.15(2), 405–418 (2021) https://doi.org/10.1007/s11590-020-01617-9

  50. [50]

    In: Proceedings of the 39th International Conference on Machine Learning, vol

    Park, J., Ryu, E.K.: Exact optimal accelerated complexity for fixed-point iterations. In: Proceedings of the 39th International Conference on Machine Learning, vol. 162, pp. 17420–17457 (2022). https://proceedings.mlr.press/v162/park22c.html

  51. [51]

    Kim, D., Fessler, J.A.: Optimizing the efficiency of first-order methods for decreas- ing the gradient of smooth convex functions. J. Optim. Theory Appl. 188(1), 192–219 (2021) https://doi.org/10.1007/s10957-020-01770-2

  52. [52]

    In: Proceedings of the 38th International Conference on Machine Learning, vol

    Yoon, T., Ryu, E.K.: Accelerated algorithms for smooth convex-concave minimax problems with o(1/k2) rate on squared gradient norm. In: Proceedings of the 38th International Conference on Machine Learning, vol. 139, pp. 12098–12109 (2021). https://proceedings.mlr.press/v139/yoon21d.html

  53. [53]

    Kim, D., Fessler, J.A.: Another look at the fast iterative shrinkage/thresholding 50 algorithm (FISTA). SIAM J. Optim. 28(1), 223–250 (2018) https://doi.org/10.1137/ 16M108940X

  54. [54]

    Klerk, E., Glineur, F.c., Taylor, A.B.: On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optim. Lett.11(7), 1185–1199 (2017) https://doi.org/10.1007/s11590-016-1087-4

  55. [55]

    In: Proceedings of the 37th International Conference on Machine Learning, vol

    Drori, Y ., Shamir, O.: The complexity of finding stationary points with stochastic gradi- ent descent. In: Proceedings of the 37th International Conference on Machine Learning, vol. 119, pp. 2658–2667 (2020). https://proceedings.mlr.press/v119/drori20a.html

  56. [56]

    Das Gupta, S., Freund, R.M., Sun, X.A., Taylor, A.: Nonlinear conjugate gradient methods: worst-case convergence rates via computer-assisted analyses. Math. Program. 213(1-2), 1–49 (2025) https://doi.org/10.1007/s10107-024-02127-7

  57. [57]

    Dragomir, R.-A., Taylor, A.B., d’Aspremont, A., Bolte, J.: Optimal complexity and certification of Bregman first-order methods. Math. Program. 194(1-2), 41–83 (2022) https://doi.org/10.1007/s10107-021-01618-1

  58. [58]

    Ryu, E.K., Taylor, A.B., Bergeling, C., Giselsson, P.: Operator splitting performance estimation: tight contraction factors and optimal parameter selection. SIAM J. Optim. 30(3), 2251–2271 (2020) https://doi.org/10.1137/19M1304854

  59. [59]

    Klerk, E., Glineur, F.c., Taylor, A.B.: Worst-case convergence analysis of inexact gra- dient and Newton methods through semidefinite programming performance estimation. SIAM J. Optim. 30(3), 2053–2082 (2020) https://doi.org/10.1137/19M1281368

  60. [60]

    Barr ´e, M., Taylor, A.B., Bach, F.: Principled analyses and design of first-order methods with inexact proximal operators. Math. Program. 201(1-2), 185–230 (2023) https://doi. org/10.1007/s10107-022-01903-7

  61. [61]

    Das Gupta, S., Van Parys, B.P.G., Ryu, E.K.: Branch-and-bound performance esti- mation programming: a unified methodology for constructing optimal optimiza- tion methods. Math. Program. 204(1-2), 567–639 (2024) https://doi.org/10.1007/ s10107-023-01973-1

  62. [62]

    In: Advances in Neural Information Processing Systems, vol

    Kim, J., Yang, I.: Convergence analysis of ODE models for accelerated first-order meth- ods via positive semidefinite kernels. In: Advances in Neural Information Processing Systems, vol. 37 (2023). https://openreview.net/forum?id=XFE6zpevLc

  63. [63]

    METR 2024-02 (2024)

    Ushiyama, K., Sato, S., Matsuo, T.: Deriving optimal rates of continuous-time acceler- ated first-order methods via performance estimation problems. METR 2024-02 (2024)

  64. [64]

    arXiv preprint arXiv:2504.17193 (2025) 51

    Bot ¸, R.I., Dao, M.N., Liu, T., Lourenc ¸o, B.F., Marumo, N.: On the equiva- lence of a hessian-free inequality and lipschitz continuous hessian. arXiv preprint arXiv:2504.17193 (2025) 51

  65. [65]

    International Journal of Mathematical Modelling and Numerical Optimisation 4(2), 150–194 (2013) 52

    Jamil, M., Yang, X.-S.: A literature survey of benchmark functions for global opti- misation problems. International Journal of Mathematical Modelling and Numerical Optimisation 4(2), 150–194 (2013) 52