pith. sign in

arxiv: 2604.03124 · v1 · submitted 2026-04-03 · 🧮 math.OC

Swarm-Based Inertial Methods for Optimization

Pith reviewed 2026-05-13 18:29 UTC · model grok-4.3

classification 🧮 math.OC
keywords swarm optimizationinertial dynamicsenergy dissipationconvergence ratesglobal minimizationHessian-based dampingOnsager principle
0
0 comments X

The pith

Swarm inertial dynamics with gradient-Hessian friction achieve an O(1/δ(t)) bound on the gap to the global minimum.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces swarm-based inertial methods as coupled dissipative inertial dynamical systems derived from the generalized Onsager principle. It identifies the friction operator and the scaling of the objective function as the governing elements of relaxation over the energy landscape. A new underdamped system is proposed whose damping incorporates both gradient and Hessian information so that acceleration or damping adjusts to agent trajectories and landscape curvature. Under suitable conditions the system obeys an energy dissipation law that supplies an asymptotic upper bound O(1/δ(t)) on the gap between the objective value and its global minimum. Structure-preserving discretizations are built that inherit both the discrete energy law and the same O(1/δ_k) rate, and numerical tests confirm the rates on convex problems while showing stable global-minimum success on nonconvex benchmarks.

Core claim

The central claim is that, under suitable conditions, the proposed underdamped inertial dynamics satisfies an energy dissipation law from which an upper bound O(1/δ(t)) follows for the asymptotic decay of the gap between the objective function and its global minimum; the same rate O(1/δ_k) is retained by the structure-preserving discretizations constructed for the system.

What carries the argument

The underdamped inertial dynamics whose friction operator incorporates both gradient and Hessian information, which enforces the energy dissipation law and thereby produces the stated convergence bound.

If this is right

  • The energy dissipation law directly implies the O(1/δ(t)) upper bound on the objective gap.
  • Structure-preserving discretizations inherit both discrete energy dissipation and the O(1/δ_k) convergence estimate.
  • Numerical algorithms built from the dynamics exhibit faster practical convergence than the proven O(1/δ_k) guarantee on convex problems.
  • On nonconvex benchmark problems the methods reach the global minimum with high success rates and more stable energy decay than swarm gradient descent or Nesterov methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same friction construction could be applied to other inertial schemes that already use curvature information, potentially tightening their existing rates.
  • Because the swarm is coupled through the shared energy law, the framework may improve exploration in high-dimensional landscapes where single-particle inertial methods stagnate.
  • The rate O(1/δ(t)) suggests testing the method on problems where δ(t) grows slowly, such as ill-conditioned or high-dimensional convex objectives, to check practical scaling.

Load-bearing premise

The energy dissipation law holds only when unspecified suitable conditions are satisfied by the friction operator derived from the generalized Onsager principle.

What would settle it

A convex quadratic test case in which the function-value gap fails to decay at rate O(1/δ(t)) or faster while the friction operator is applied as defined would falsify the asymptotic bound.

Figures

Figures reproduced from arXiv: 2604.03124 by Kunhui Luan, Qi Wang, Qiyu Wu.

Figure 3
Figure 3. Figure 3: These results provide a comprehensive view of the methods’ accuracy, convergence behavior, and [PITH_FULL_IMAGE:figures/full_fig_p020_3.png] view at source ↗
Figure 1
Figure 1. Figure 1: Comparison of F − F ∗ across methods for the 10D Rotated Hyper-Ellipsoid Function (h = 1/16). Each subplot is one method (left-to-right, top-to-bottom). The potential energy decays monotonically during the iteration for all methods except for the Nesterov method, which exhibits some slight oscillations. The GD method converges in two steps to the minimum value of F(x ∗ ). 20 [PITH_FULL_IMAGE:figures/full_… view at source ↗
Figure 2
Figure 2. Figure 2: Local convergence rate exponent pk across methods (10D Rotated Hyper-Ellipsoid Function, h = 1/16). The first iteration is initialized by x1 = x0 − 10−4∇F(x0), and the subsequent iterations follow the corresponding numerical schemes. All inertial methods demonstrate a uniform plateau except for the Nesterov method, which shows violent oscillations during the iteration. (a) FD (b) IMEX-RB (c) FB (d) Semi (e… view at source ↗
Figure 3
Figure 3. Figure 3: Comparison of E k across methods for the 10D Rotated Hyper-Ellipsoid Function (h = 1/16). Each subplot is one method (left-to-right, top-to-bottom). The discrete energy oscillates wildly when using the Nesterov method. In Tables 1–4, we present the average convergence rate, ¯p, for the Rotated Hyper-Ellipsoid function in different dimensions. Numerical results on convergence rates of other test functions a… view at source ↗
Figure 4
Figure 4. Figure 4: 1D test functions (B = 0). The Ackley function is smooth and symmetric, with a global minimum at the origin. It has a deep and narrow central valley surrounded by small oscillations caused by the cos term. These oscillations decrease away from the origin, and the function becomes nearly flat in the outer region. This combination of flat regions and a sharp valley makes the problem challenging for optimizat… view at source ↗
Figure 5
Figure 5. Figure 5: Comparison of ∥vk∥2 across methods for the 10D Rotated Hyper-Ellipsoid Function (h = 1/16). Each subplot is one method (left-to-right, top-to-bottom). C.2 All tables of convergence rates on higher-dimensional testing functions [PITH_FULL_IMAGE:figures/full_fig_p038_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Comparison of E k across methods for the 1D Ackley Function with B = 0. Each subplot is one method (left-to-right, top-to-bottom). 41 [PITH_FULL_IMAGE:figures/full_fig_p041_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison of E k across methods for the 2D Ackley Function with B = 0. Each subplot is one method (left-to-right, top-to-bottom). (a) SB-FB (b) SB-IMEX-RB (c) SB-Semi (d) SB-IPAHD (e) SB-FD (f) SB-NM (g) SB-GD [PITH_FULL_IMAGE:figures/full_fig_p042_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Comparison of E k across methods for the 10D Ackley Function with B = 0. Each subplot is one method (left-to-right, top-to-bottom). 42 [PITH_FULL_IMAGE:figures/full_fig_p042_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Comparison of E k across methods for the 1D Rastrigin Function with B = 0. Each subplot is one method (left-to-right, top-to-bottom). (a) SB-FB (b) SB-IMEX-RB (c) SB-Semi (d) SB-IPAHD (e) SB-FD (f) SB-NM (g) SB-GD [PITH_FULL_IMAGE:figures/full_fig_p043_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Comparison of E k across methods for the 2D Rastrigin Function with B = 0. Each subplot is one method (left-to-right, top-to-bottom). 43 [PITH_FULL_IMAGE:figures/full_fig_p043_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Comparison of E k across methods for the 10D Rastrigin Function with B = 0. Each subplot is one method (left-to-right, top-to-bottom). 44 [PITH_FULL_IMAGE:figures/full_fig_p044_11.png] view at source ↗
read the original abstract

We introduce a new class of swarm-based inertial methods (SBIMs) for global minimization, formulated as coupled dissipative inertial dynamical systems derived from the generalized Onsager principle. The proposed framework identifies the friction operator and the scaling of the potential energy, namely the objective function to be minimized, as the key ingredients governing relaxation dynamics over the energy landscape. Within this framework, we propose a new underdamped inertial dynamics whose damping mechanisms incorporate both gradient and Hessian information, allowing the system to adjust damping or acceleration according to the agent trajectories and the curvature of the landscape. Under suitable conditions, we prove that the underdamped system satisfies an energy dissipation law, from which we establish an upper bound on the asymptotic decay rate of the gap between the objective function and its global minimum, given by $O(1/\delta(t))$ (defined in \S 3). We further construct structure-preserving discretizations that retain both discrete energy dissipation and the convergence rate estimate, $O(1/\delta_k)$ (defined in \S3). In addition, we present several other efficient numerical algorithms for the dynamical system. Numerical experiments for all proposed algorithms validate the theory on convex test problems and demonstrate convergence rates in function values that are substantially faster than the theoretical guarantees ($O(1/\delta_k)$). On nonconvex benchmark problems, the proposed methods achieve high success rates in reaching the global minimum, and exhibit more stable energy decay than swarm-based gradient descent and Nesterov methods. Overall, this work provides a systematic framework for the construction and analysis of SBIMs from an energy-dissipative perspective.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces swarm-based inertial methods (SBIMs) as coupled dissipative inertial dynamical systems derived from the generalized Onsager principle. It proposes an underdamped inertial dynamics whose friction operator incorporates both gradient and Hessian information, asserts an energy dissipation law under suitable conditions from which an O(1/δ(t)) upper bound on the objective gap follows, constructs structure-preserving discretizations that retain discrete energy dissipation and the O(1/δ_k) rate, and supplies additional numerical algorithms. Numerical experiments on convex problems show rates faster than the theoretical bound, while nonconvex benchmarks exhibit high success rates in reaching global minima and more stable energy decay than swarm gradient descent or Nesterov methods.

Significance. If the energy dissipation law holds under the stated conditions, the work supplies a systematic energy-based framework for designing and analyzing inertial swarm methods with provable asymptotic rates and structure-preserving discretizations. The numerical validation on both convex and nonconvex problems, together with the explicit construction of discretizations that inherit the continuous dissipation property, constitutes a concrete contribution to the literature on inertial optimization methods.

major comments (2)
  1. [Abstract and §3] Abstract and §3: The energy dissipation law for the underdamped system is stated to hold 'under suitable conditions' on the friction operator, yet these conditions (sign/definiteness requirements, growth conditions on the potential, or restrictions on swarm coupling) are never enumerated. Without them the derivation of the O(1/δ(t)) gap bound cannot be verified and the central convergence claim remains unconfirmed.
  2. [§4] §4 (discretization section): The claim that the proposed structure-preserving schemes retain both discrete energy dissipation and the O(1/δ_k) rate estimate is asserted without an explicit error analysis relating the discretization step to the continuous dissipation inequality; this step is load-bearing for the discrete convergence guarantee.
minor comments (2)
  1. [§3] The definition and dependence of δ(t) (and δ_k) on time or other parameters should be stated explicitly at first use in §3 to avoid ambiguity in the rate statements.
  2. Notation for the swarm coupling terms and the precise form of the friction operator could be collected in a single preliminary section for easier reference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. The comments highlight important points for improving clarity and rigor, which we address point by point below. We will revise the manuscript to incorporate these suggestions.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3: The energy dissipation law for the underdamped system is stated to hold 'under suitable conditions' on the friction operator, yet these conditions (sign/definiteness requirements, growth conditions on the potential, or restrictions on swarm coupling) are never enumerated. Without them the derivation of the O(1/δ(t)) gap bound cannot be verified and the central convergence claim remains unconfirmed.

    Authors: We agree that the conditions must be explicitly enumerated to allow verification of the energy dissipation law and the O(1/δ(t)) bound. In the revised manuscript, we will insert a new subsection at the beginning of §3 that lists all required assumptions on the friction operator in full detail, including sign and definiteness conditions, growth restrictions on the potential, and any constraints on the swarm coupling. With these stated, the derivation of the dissipation law and the subsequent gap bound will be fully verifiable. revision: yes

  2. Referee: [§4] §4 (discretization section): The claim that the proposed structure-preserving schemes retain both discrete energy dissipation and the O(1/δ_k) rate estimate is asserted without an explicit error analysis relating the discretization step to the continuous dissipation inequality; this step is load-bearing for the discrete convergence guarantee.

    Authors: We acknowledge that an explicit error analysis is needed to rigorously justify that the discrete schemes inherit the continuous dissipation property and the O(1/δ_k) rate. In the revised §4 we will add a dedicated error analysis subsection that relates the local truncation error of the discretization to the continuous energy dissipation inequality, showing that the discrete energy decay holds up to higher-order terms that vanish asymptotically and therefore preserve the O(1/δ_k) rate. revision: yes

Circularity Check

0 steps flagged

No significant circularity: energy dissipation law and O(1/δ(t)) bound derived from external generalized Onsager principle

full rationale

The paper formulates SBIMs as dissipative inertial systems derived from the generalized Onsager principle (external to this work). It then proves an energy dissipation law under suitable conditions on the friction operator and establishes the O(1/δ(t)) gap bound directly from that law. No self-definitional steps, fitted parameters renamed as predictions, or load-bearing self-citations appear in the derivation chain. The 'suitable conditions' are a standard hypothesis list rather than a circular reduction; the central claims remain independent of the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on the generalized Onsager principle as the source of the dissipative structure; no free parameters or new entities are introduced in the abstract.

axioms (1)
  • domain assumption Generalized Onsager principle governs the dissipative inertial dynamics
    Invoked to formulate the coupled swarm system and identify friction operator and potential scaling as key ingredients.

pith-pipeline@v0.9.0 · 5587 in / 1165 out tokens · 30178 ms · 2026-05-13T18:29:43.000254+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages

  1. [1]

    Frontiers and progress of current soft matter research

    Qi Wang. Frontiers and progress of current soft matter research. In Xiang you Liu, editor,Chapter 3: Generalized Onsager Principle and its Applications, pages 101–132. Springer Nature, New York, 2020

  2. [2]

    Swarm-based gradient descent method for non- convex optimization.Communications of the American Mathematical Society, 4(17):787–822, 2024

    Jingcheng Lu, Eitan Tadmor, and Anil Zengino˘ glu. Swarm-based gradient descent method for non- convex optimization.Communications of the American Mathematical Society, 4(17):787–822, 2024

  3. [3]

    First-order optimization algorithms via inertial systems with hessian driven damping.Mathematical Programming, 193(1):113–155, 2022

    Hedy Attouch, Zaki Chbani, Jalal Fadili, and Hassan Riahi. First-order optimization algorithms via inertial systems with hessian driven damping.Mathematical Programming, 193(1):113–155, 2022

  4. [4]

    Particle swarm optimization

    James Kennedy and Russell Eberhart. Particle swarm optimization. InProceedings of ICNN’95- international conference on neural networks, volume 4, pages 1942–1948. ieee, 1995

  5. [5]

    Parameter selection in particle swarm optimization

    Yuhui Shi and Russell C Eberhart. Parameter selection in particle swarm optimization. InInternational conference on evolutionary programming, pages 591–600. Springer, 1998

  6. [6]

    Empirical study of particle swarm optimization

    Yuhui Shi and Russell C Eberhart. Empirical study of particle swarm optimization. InProceedings of the 1999 congress on evolutionary computation-CEC99 (Cat. No. 99TH8406), volume 3, pages 1945–1950. IEEE, 1999

  7. [7]

    The particle swarm-explosion, stability, and convergence in a multidimensional complex space.IEEE transactions on Evolutionary Computation, 6(1):58–73, 2002

    Maurice Clerc and James Kennedy. The particle swarm-explosion, stability, and convergence in a multidimensional complex space.IEEE transactions on Evolutionary Computation, 6(1):58–73, 2002

  8. [8]

    Tracking and optimizing dynamic systems with particle swarms

    Russell C Eberhart and Yuhui Shi. Tracking and optimizing dynamic systems with particle swarms. In Proceedings of the 2001 congress on evolutionary computation (IEEE Cat. No. 01TH8546), volume 1, pages 94–100. IEEE, 2001. 31

  9. [9]

    Defining a standard for particle swarm optimization

    Daniel Bratton and James Kennedy. Defining a standard for particle swarm optimization. In2007 IEEE swarm intelligence symposium, pages 120–127. IEEE, 2007

  10. [10]

    A simple diversity guided particle swarm optimization

    Millie Pant, T Radha, and Ved Pal Singh. A simple diversity guided particle swarm optimization. In 2007 IEEE Congress on Evolutionary Computation, pages 3294–3299. IEEE, 2007

  11. [11]

    A diversity-guided hybrid particle swarm optimization based on gradient search

    Fei Han and Qing Liu. A diversity-guided hybrid particle swarm optimization based on gradient search. Neurocomputing, 137:234–240, 2014

  12. [12]

    The fully informed particle swarm: simpler, maybe better.IEEE transactions on evolutionary computation, 8(3):204–210, 2004

    Rui Mendes, James Kennedy, and Jos´ e Neves. The fully informed particle swarm: simpler, maybe better.IEEE transactions on evolutionary computation, 8(3):204–210, 2004

  13. [13]

    Dynamic multi-swarm particle swarm op- timizer

    Jane-Jing Liang and Ponnuthurai Nagaratnam Suganthan. Dynamic multi-swarm particle swarm op- timizer. InProceedings 2005 IEEE Swarm Intelligence Symposium, 2005. SIS 2005., pages 124–129. IEEE, 2005

  14. [14]

    Adaptively choosing niching parameters in a pso

    Stefan Bird and Xiaodong Li. Adaptively choosing niching parameters in a pso. InProceedings of the 8th annual conference on Genetic and evolutionary computation, pages 3–10, 2006

  15. [15]

    Population structure and particle swarm performance

    James Kennedy and Rui Mendes. Population structure and particle swarm performance. InProceedings of the 2002 Congress on Evolutionary Computation. CEC’02 (Cat. No. 02TH8600), volume 2, pages 1671–1676. IEEE, 2002

  16. [16]

    Using selection to improve particle swarm optimization

    Peter J Angeline. Using selection to improve particle swarm optimization. In1998 IEEE Interna- tional Conference on Evolutionary Computation Proceedings. IEEE World Congress on Computational Intelligence (Cat. No. 98TH8360), pages 84–89. IEEE, 1998

  17. [17]

    Hybrid particle swarm optimiser with breeding and subpopulations

    Morten Lovbjerg, Thomas Kiel Rasmussen, Thiemo Krink, et al. Hybrid particle swarm optimiser with breeding and subpopulations. InProceedings of the genetic and evolutionary computation conference, volume 2001, pages 469–476. San Francisco, USA, 2001

  18. [18]

    Hybrid pso and ga for global maximization.Int

    Kandasamy Premalatha and AM Natarajan. Hybrid pso and ga for global maximization.Int. J. Open Problems Compt. Math, 2(4):597–608, 2009

  19. [19]

    Particle swarm optimization with recombi- nation and dynamic linkage discovery.IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 37(6):1460–1470, 2007

    Ying-Ping Chen, Wen-Chih Peng, and Ming-Chung Jian. Particle swarm optimization with recombi- nation and dynamic linkage discovery.IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 37(6):1460–1470, 2007

  20. [20]

    Clerc and J

    M. Clerc and J. Kennedy. The particle swarm - explosion, stability, and convergence in a multidi- mensional complex space.IEEE Transactions on Evolutionary Computation, 6(1):58–73, 2002. doi: 10.1109/4235.985692

  21. [21]

    On convergence analysis of particle swarm optimization algorithm.Jour- nal of Computational and Applied Mathematics, 333:65–73, 2018

    Gang Xu and Guosong Yu. On convergence analysis of particle swarm optimization algorithm.Jour- nal of Computational and Applied Mathematics, 333:65–73, 2018. ISSN 0377-0427. doi: https:// doi.org/10.1016/j.cam.2017.10.026. URLhttps://www.sciencedirect.com/science/article/pii/ S0377042717305277

  22. [22]

    Huang, J

    H. Huang, J. Qiu, and K. Riedl. On the global convergence of particle swarm optimization methods. Applied Mathematics and Optimization, 88:30, 2023. doi: 10.1007/s00245-023-09983-3. URLhttps: //doi.org/10.1007/s00245-023-09983-3

  23. [23]

    Swarm-based gradient descent meets simulated annealing.SIAM Journal on Numerical Analysis, 62(6):2745–2781, 2024

    Zhiyan Ding, Martin Guerra, Qin Li, and Eitan Tadmor. Swarm-based gradient descent meets simulated annealing.SIAM Journal on Numerical Analysis, 62(6):2745–2781, 2024. doi: 10.1137/24M1657808. URLhttps://doi.org/10.1137/24M1657808

  24. [24]

    Swarm-based optimization with random descent.Acta Applicandae Mathematicae, 190(1):2, 2024

    Eitan Tadmor and Anil Zengino˘ glu. Swarm-based optimization with random descent.Acta Applicandae Mathematicae, 190(1):2, 2024

  25. [25]

    Some methods of speeding up the convergence of iteration methods.Ussr computational mathematics and mathematical physics, 4(5):1–17, 1964

    Boris T Polyak. Some methods of speeding up the convergence of iteration methods.Ussr computational mathematics and mathematical physics, 4(5):1–17, 1964. 32

  26. [26]

    Nesterov

    Y.E. Nesterov. A method for solving the convex programming problem with convergence rateo(1/k 2). Doklady Akademii Nauk SSSR, 269(3):543–547, 1983

  27. [27]

    The heavy ball with friction method, i

    Hedy Attouch, Xavier Goudou, and Patrick Redont. The heavy ball with friction method, i. the contin- uous dynamical system: global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system.Communications in Contemporary Mathematics, 2(01):1–34, 2000

  28. [28]

    A differential equation for modeling nesterov’s accelerated gradient method: Theory and insights.Journal of Machine Learning Research, 17(153): 1–43, 2016

    Weijie Su, Stephen Boyd, and Emmanuel J Candes. A differential equation for modeling nesterov’s accelerated gradient method: Theory and insights.Journal of Machine Learning Research, 17(153): 1–43, 2016

  29. [29]

    Fast convergence of inertial dynam- ics and algorithms with asymptotic vanishing viscosity.Mathematical Programming, 168(1):123–175, 2018

    Hedy Attouch, Zaki Chbani, Juan Peypouquet, and Patrick Redont. Fast convergence of inertial dynam- ics and algorithms with asymptotic vanishing viscosity.Mathematical Programming, 168(1):123–175, 2018

  30. [30]

    A second-order gradient-like dis- sipative dynamical system with hessian-driven damping.: Application to optimization and mechanics

    Felipe Alvarez, Hedy Attouch, J´ erˆ ome Bolte, and Patrick Redont. A second-order gradient-like dis- sipative dynamical system with hessian-driven damping.: Application to optimization and mechanics. Journal de math´ ematiques pures et appliqu´ ees, 81(8):747–779, 2002

  31. [31]

    Fast optimization via inertial dynamics with closed-loop damping.Journal of the European Mathematical Society, 25(5):1985–2056, 2022

    Hedy Attouch, Radu Ioan Bot ¸, and Ern¨ o Robert Csetnek. Fast optimization via inertial dynamics with closed-loop damping.Journal of the European Mathematical Society, 25(5):1985–2056, 2022

  32. [32]

    Newton-like inertial dynamics and proximal algorithms gov- erned by maximally monotone operators.SIAM Journal on Optimization, 30(4):3252–3283, 2020

    Hedy Attouch and Szil´ ard Csaba L´ aszl´ o. Newton-like inertial dynamics and proximal algorithms gov- erned by maximally monotone operators.SIAM Journal on Optimization, 30(4):3252–3283, 2020

  33. [33]

    Fast convergence of inertial dynamics with hessian-driven damping under geometry assumptions.Applied Mathematics & Optimization, 88(3):81, 2023

    Jean-Fran¸ cois Aujol, Charles Dossal, Van Hao Ho` ang, Hippolyte Labarri` ere, and Aude Rondepierre. Fast convergence of inertial dynamics with hessian-driven damping under geometry assumptions.Applied Mathematics & Optimization, 88(3):81, 2023

  34. [34]

    Optimization with momentum: Dynamical, control- theoretic, and symplectic perspectives.Journal of Machine Learning Research, 22(73):1–50, 2021

    Michael Muehlebach and Michael I Jordan. Optimization with momentum: Dynamical, control- theoretic, and symplectic perspectives.Journal of Machine Learning Research, 22(73):1–50, 2021

  35. [35]

    Acceleration via symplectic discretization of high-resolution differential equations.Advances in Neural Information Processing Systems, 32, 2019

    Bin Shi, Simon S Du, Weijie Su, and Michael I Jordan. Acceleration via symplectic discretization of high-resolution differential equations.Advances in Neural Information Processing Systems, 32, 2019

  36. [36]

    Regularized nonlinear acceleration.Advances In Neural Information Processing Systems, 29, 2016

    Damien Scieur, Alexandre d’Aspremont, and Francis Bach. Regularized nonlinear acceleration.Advances In Neural Information Processing Systems, 29, 2016

  37. [37]

    Analysis of optimiza- tion algorithms via integral quadratic constraints: Nonstrongly convex problems.SIAM Journal on Optimization, 28(3):2654–2689, 2018

    Mahyar Fazlyab, Alejandro Ribeiro, Manfred Morari, and Victor M Preciado. Analysis of optimiza- tion algorithms via integral quadratic constraints: Nonstrongly convex problems.SIAM Journal on Optimization, 28(3):2654–2689, 2018

  38. [38]

    Optimal convergence rates for nesterov acceleration.SIAM Journal on Optimization, 29(4):3131–3153, 2019

    Jean-Francois Aujol, Charles Dossal, and Aude Rondepierre. Optimal convergence rates for nesterov acceleration.SIAM Journal on Optimization, 29(4):3131–3153, 2019

  39. [39]

    Imex-rb: A self-adaptive imex time integration scheme exploiting the rb method.arXiv preprint arXiv:2506.16470, 2025

    Micol Bassanini, Simone Deparis, Francesco Sala, and Riccardo Tenderini. Imex-rb: A self-adaptive imex time integration scheme exploiting the rb method.arXiv preprint arXiv:2506.16470, 2025

  40. [40]

    Springer, 2006

    Jorge Nocedal and Stephen J Wright.Numerical optimization. Springer, 2006

  41. [41]

    Quantitative spectral gaps for hypoelliptic stochastic differential equations with small noise.Probab

    Jacob Bedrossian and Kyle Liss. Quantitative spectral gaps for hypoelliptic stochastic differential equations with small noise.Probab. Math. Phys., 2(3):477–532, 2021. ISSN 2690-0998,2690-1005. doi: 10.2140/pmp.2021.2.477. URLhttps://doi.org/10.2140/pmp.2021.2.477. 33 Appendix A Proof of the SBIM Nesterov Scheme Proof of Theorem 2.1 Proof.By the definitio...