On The Mathematics of the Natural Physics of Optimization

I. M. Ross

arxiv: 2604.17645 · v1 · submitted 2026-04-19 · 🧮 math.OC · cs.AI · cs.LG · cs.NA · math-ph · math.MP · math.NA

Pith reviewed 2026-05-10 05:12 UTC · model grok-4.3

classification 🧮 math.OC · cs.AI · cs.LG · cs.NA · math-ph · math.MP · math.NA

keywords optimization algorithms · optimal control · Pontryagin minimum principle · KKT conditions · vector fields · Hamilton-Jacobi inequality · inverse optimization · Lyapunov function

The pith

Equating optimal-control transversality conditions with generalized KKT conditions turns the data of an optimization problem into a natural vector field over a hidden space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that optimization algorithms follow universal non-Newtonian dynamics derived from physics-like laws. Equating the terminal conditions of an optimal control problem to the generalized KKT conditions of any constrained optimization problem turns the problem data into a vector field that fills a hidden space with optimality information. A Pontryagin-type minimum principle then acts at a distance to convert local moves into global results through a Hamilton-Jacobi inequality. Algorithms emerge when control jumps dissipate quantized energy measured by a search Lyapunov function. This single framework accounts for many existing methods and produces new ones.

Core claim

By equating the terminal transversality conditions of an optimal control problem to the generalized Karush/John-Kuhn-Tucker conditions of an optimization problem, the data functions of a given constrained optimization problem generate a natural vector field that permeates an entire hidden space with information on the optimality conditions. An action-at-a-distance operation via a Pontryagin-type minimum principle produces a local action to deliver a globalized result by way of a Hamilton-Jacobi inequality. An inverse-optimal algorithm is generated by performing control jumps that dissipate quantized energy defined by a search Lyapunov function.

What carries the argument

The equivalence between terminal transversality conditions and generalized KKT conditions that creates the natural vector field and permits application of the Pontryagin minimum principle.
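
To make the claimed equivalence concrete, the two textbook conditions can be set side by side. This is only a sketch under assumptions that are not taken from the paper: a smooth problem with objective $f$, inequality constraints $g$, and equality constraints $h$, and a Mayer-type optimal control problem with terminal cost $E$ and terminal constraint $e$. The multiplier symbols $\nu_0$, $\mu$, $\nu$ and the costate $\lambda$ are illustrative, and sign conventions vary between texts.

```latex
% Fritz John / KKT stationarity for  min f(x)  s.t.  g(x) <= 0,  h(x) = 0
\nu_0 \nabla f(x^*) + \sum_i \mu_i \nabla g_i(x^*) + \sum_j \nu_j \nabla h_j(x^*) = 0,
\qquad \mu_i \ge 0, \quad \mu_i\, g_i(x^*) = 0 .

% Terminal transversality for the Mayer problem
%   minimize E(x(t_f))  subject to  \dot{x} = F(x,u),  e(x(t_f)) = 0
\lambda(t_f) = \nu_0 \nabla E(x_f) + \sum_j \nu_j \nabla e_j(x_f) .
```

With the identification $E = f$ and $e = h$ (inequalities are handled the same way, with sign and complementarity conditions on their multipliers), the right-hand side of the transversality condition is the gradient of the Lagrangian, so driving $\lambda(t_f)$ to zero reproduces the stationarity condition. Whether the paper uses exactly this identification cannot be established from the abstract.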

If this is right

Many known optimization algorithms can be derived and explained as special cases of the same vector-field dynamics.
New inverse-optimal algorithms arise by selecting different quantized energy dissipation rules.
The Hamilton-Jacobi inequality supplies a certificate that local control jumps achieve global optimality.
The hidden-space vector field encodes all optimality information without needing explicit search over the original feasible set.
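
The first claim in this list can be illustrated with a standard ODE viewpoint: one set of dynamics, x' = u, with different feedback laws for u. The sketch below is not the paper's construction. The quadratic test problem, the function names, and the choice of the optimality gap as a stand-in for the search Lyapunov function are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical test problem: a strongly convex quadratic f(x) = 0.5 x^T Q x - b^T x.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ Q @ x - b @ x

def grad(x):
    return Q @ x - b

x_star = np.linalg.solve(Q, b)  # unique minimizer, where grad(x) = 0

def lyapunov(x):
    # Stand-in for a search Lyapunov function: the optimality gap f(x) - f(x*).
    return f(x) - f(x_star)

def run(control, step, iters=50):
    # Forward-Euler discretization of x' = u with the feedback law u = control(x).
    x = np.array([4.0, -3.0])
    values = [lyapunov(x)]
    for _ in range(iters):
        x = x + step * control(x)
        values.append(lyapunov(x))
    return x, values

# Two control laws acting on the same dynamics x' = u.
x_gd, v_gd = run(lambda x: -grad(x), step=0.2)                      # gradient descent
x_nt, v_nt = run(lambda x: -np.linalg.solve(Q, grad(x)), step=1.0)  # Newton's method
```

Both runs end at the same minimizer and the recorded Lyapunov values do not increase along either trajectory, which is the sense in which two familiar methods can be read as control laws on one system.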

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same construction might be applied to discrete or combinatorial problems by replacing continuous control with jump dynamics on a suitable lattice.
If the vector field can be computed explicitly, it offers a way to visualize or approximate the entire optimality landscape before any iteration begins.
The approach suggests treating algorithm design as choosing a control law rather than tuning heuristic parameters.

Load-bearing premise

The terminal transversality conditions of an optimal control problem can be directly equated to the generalized Karush/John-Kuhn-Tucker conditions to generate a meaningful natural vector field and inverse-optimal algorithms for arbitrary optimization problems.

What would settle it

Construct the vector field for a simple nonlinear program with known KKT points and check whether the minimum-principle jumps fail to reach those points or violate the original constraints.
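
A check of this kind could be set up as follows. The paper's vector field is not given in the abstract, so this sketch swaps in a well-known stand-in: the primal-dual gradient flow (Arrow-Hurwicz type) on an equality-constrained quadratic program. The problem data, the starting point, the step size, and the names `field`, `z_kkt`, and `reached_kkt` are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical test problem: min 0.5 x^T Q x + c^T x  subject to  A x = b.
Q = np.array([[2.0, 0.0], [0.0, 1.0]])
c = np.array([-2.0, -5.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# Known KKT point from the linear KKT system [Q A^T; A 0] [x; nu] = [-c; b].
K = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
z_kkt = np.linalg.solve(K, np.concatenate([-c, b]))

def field(z):
    # Stand-in vector field: descent in x and ascent in nu on the Lagrangian.
    x, nu = z[:2], z[2:]
    dx = -(Q @ x + c + A.T @ nu)
    dnu = A @ x - b
    return np.concatenate([dx, dnu])

# Forward-Euler integration from an infeasible starting point.
z = np.array([3.0, 3.0, 0.0])
h = 0.01
for _ in range(5000):
    z = z + h * field(z)

reached_kkt = np.allclose(z, z_kkt, atol=1e-8)   # did the flow reach the KKT point?
residual = float(abs(A @ z[:2] - b)[0])          # constraint violation at the end
```

For the paper's framework, the same two diagnostics (distance to the known KKT point and final constraint residual) would be applied to its minimum-principle jumps in place of `field`.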

Original abstract

A number of optimization algorithms have been inspired by the physics of Newtonian motion. Here, we ask the question: do algorithms themselves obey some "natural laws of motion," and can they be derived by an application of these laws? We explore this question by positing the theory that optimization algorithms may be considered as some manifestation of hidden algorithm primitives that obey certain universal non-Newtonian dynamics. This natural physics of optimization is developed by equating the terminal transversality conditions of an optimal control problem to the generalized Karush/John-Kuhn-Tucker conditions of an optimization problem. Through this equivalence formulation, the data functions of a given constrained optimization problem generate a natural vector field that permeates an entire hidden space with information on the optimality conditions. An "action-at-a-distance" operation via a Pontryagin-type minimum principle produces a local action to deliver a globalized result by way of a Hamilton-Jacobi inequality. An inverse-optimal algorithm is generated by performing control jumps that dissipate quantized "energy" defined by a search Lyapunov function. Illustrative applications of the proposed theory show that a large number of algorithms can be generated and explained in terms of the new mathematical physics of optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Ross frames optimization as obeying natural non-Newtonian dynamics by equating transversality conditions to the KKT conditions, but the mapping lacks a clear general derivation and risks circularity.

The core claim is that optimization algorithms can be derived from a physics-like equivalence between optimal control transversality conditions and generalized KKT conditions. This is supposed to produce a natural vector field in a hidden space, which then yields algorithms through a Pontryagin-style minimum principle and quantized energy dissipation via a Lyapunov function. The abstract presents this as a way to generate and explain many existing methods under one framework.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a theory of the 'natural physics of optimization' in which algorithms are manifestations of hidden algorithm primitives obeying universal non-Newtonian dynamics. The central construction equates the terminal transversality conditions of an optimal control problem (with free terminal state) to the generalized Karush/John-Kuhn-Tucker stationarity conditions of a constrained optimization problem. This equivalence is asserted to generate a natural vector field over a hidden space; a Pontryagin-type minimum principle combined with a Hamilton-Jacobi inequality then yields inverse-optimal algorithms realized by control jumps that dissipate quantized energy levels defined from a search Lyapunov function. Illustrative applications are said to show that a large number of existing algorithms can be generated and explained within the framework.

Significance. If a structure-preserving, non-circular mapping between transversality and generalized KKT conditions can be established for arbitrary nonlinear problems and if the resulting vector field reproduces known convergence rates without extra Lyapunov assumptions, the work could supply a unifying optimal-control lens on algorithm design. The explicit use of quantized energy dissipation and Hamilton-Jacobi inequalities offers a potentially falsifiable route to new algorithms, but the abstract supplies no derivations, counter-examples, or independent benchmarks that would allow assessment of whether the framework adds predictive power beyond re-description of optimality conditions.

major comments (3)

Abstract: the asserted equivalence between terminal transversality conditions of an optimal-control problem and generalized KKT conditions is presented without an explicit structure-preserving mapping or derivation for general nonlinear objectives and constraints; without this step the subsequent natural vector field and inverse-optimal construction risk being circular, as both the control embedding and the quantized energy are defined from the same stationarity conditions the theory seeks to explain.
Abstract: the claim that 'data functions generate a natural vector field that permeates an entire hidden space' is not accompanied by a concrete construction, coordinate chart, or verification that the vector field reproduces standard convergence behavior (e.g., linear rates for strongly convex problems) without additional assumptions on the search Lyapunov function.
Abstract: the 'action-at-a-distance' operation via a Pontryagin-type minimum principle plus Hamilton-Jacobi inequality is invoked to produce local actions that deliver globalized results, yet no explicit statement of the resulting Hamilton-Jacobi inequality or the quantization rule for energy levels is supplied, preventing verification that the procedure is independent of the target algorithm.
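
The benchmark named in the second comment, a linear rate on a strongly convex problem, is easy to state as a numerical check. The sketch below uses plain gradient descent on a diagonal quadratic, not any algorithm from the paper. The values of `mu` and `L`, the starting point, and the iteration count are assumptions for illustration.

```python
import numpy as np

# Hypothetical strongly convex quadratic f(x) = 0.5 x^T Q x with minimizer x* = 0.
mu, L = 1.0, 10.0                  # smallest and largest eigenvalues of Q
Q = np.diag([mu, L])

alpha = 2.0 / (L + mu)             # classical step size for quadratics
rho = (L - mu) / (L + mu)          # textbook contraction factor per step

x = np.array([1.0, 1.0])
errors = [np.linalg.norm(x)]
for _ in range(30):
    x = x - alpha * (Q @ x)        # gradient step, since grad f(x) = Q x
    errors.append(np.linalg.norm(x))

# Observed per-step ratios of the distance to the minimizer.
ratios = [e1 / e0 for e0, e1 in zip(errors, errors[1:])]
```

On this problem every entry of `ratios` matches `rho` up to rounding. An algorithm generated by the framework would pass the referee's test if its observed ratios met the same bound without extra conditions on the Lyapunov function.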

minor comments (2)

Abstract: the phrase 'hidden algorithm primitives' is introduced without a formal definition or relation to existing concepts such as state-space embeddings or lifted dynamical systems.
Abstract: the manuscript would benefit from a single concrete example (even a low-dimensional quadratic program) showing the explicit mapping from KKT conditions to the vector field and the first control jump, to make the central construction accessible.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. We address each major comment point by point below, providing clarifications drawn from the full development in the paper while indicating the revisions we will make to improve the abstract's explicitness.

Referee: Abstract: the asserted equivalence between terminal transversality conditions of an optimal-control problem and generalized KKT conditions is presented without an explicit structure-preserving mapping or derivation for general nonlinear objectives and constraints; without this step the subsequent natural vector field and inverse-optimal construction risk being circular, as both the control embedding and the quantized energy are defined from the same stationarity conditions the theory seeks to explain.

Authors: The full manuscript constructs the equivalence via a structure-preserving mapping that directly equates the terminal transversality conditions (for free terminal state) to the generalized KKT stationarity conditions for arbitrary nonlinear objectives and constraints. This mapping generates the natural vector field from the problem data functions independently of the subsequent control embedding; the quantized energy is then defined separately from a search Lyapunov function to drive the jumps. The construction is therefore not circular. We agree, however, that the abstract presents the equivalence concisely without outlining the mapping. We will revise the abstract to include a brief statement of the mapping and derivation for general nonlinear cases, with a pointer to the detailed development in the body.

revision: yes

Referee: Abstract: the claim that 'data functions generate a natural vector field that permeates an entire hidden space' is not accompanied by a concrete construction, coordinate chart, or verification that the vector field reproduces standard convergence behavior (e.g., linear rates for strongly convex problems) without additional assumptions on the search Lyapunov function.

Authors: The manuscript supplies the concrete construction of the vector field by embedding the optimization data functions into the hidden space through the transversality-KKT equivalence, together with verification that it recovers standard convergence rates (including linear rates for strongly convex problems) using only the baseline search Lyapunov function and no extra assumptions. The abstract summarizes the claim without these details. We will revise the abstract to note the construction method and the verification via illustrative applications, making the independence from additional Lyapunov assumptions explicit.

revision: yes

Referee: Abstract: the 'action-at-a-distance' operation via a Pontryagin-type minimum principle plus Hamilton-Jacobi inequality is invoked to produce local actions that deliver globalized results, yet no explicit statement of the resulting Hamilton-Jacobi inequality or the quantization rule for energy levels is supplied, preventing verification that the procedure is independent of the target algorithm.

Authors: The manuscript derives the Hamilton-Jacobi inequality from the Pontryagin minimum principle applied to the hidden-space dynamics and defines the quantization rule as discrete decrements of the Lyapunov energy levels realized by the control jumps. The resulting procedure is general and independent of any particular target algorithm, as shown by the generation of multiple distinct algorithms from the same principles. We acknowledge that the abstract invokes these elements without stating the inequality or rule explicitly. We will revise the abstract to include concise statements of both, enabling direct verification of generality.

revision: yes

Circularity Check

2 steps flagged

Equating transversality conditions to generalized KKT forms the foundational equivalence without independent mapping

specific steps

self-definitional [Abstract]
"This natural physics of optimization is developed by equating the terminal transversality conditions of an optimal control problem to the generalized Karush/John-Kuhn-Tucker conditions of an optimization problem. Through this equivalence formulation, the data functions of a given constrained optimization problem generate a natural vector field that permeates an entire hidden space with information on the optimality conditions."

The 'natural vector field' and 'natural physics' are defined by the act of equating the two sets of optimality conditions; the subsequent claims that this field 'permeates' the space with optimality information and enables inverse-optimal algorithms via Pontryagin/Hamilton-Jacobi therefore rest on the same KKT data that any correct algorithm must already satisfy, rendering the derivation equivalent to its input by construction.

fitted input called prediction [Abstract]
"An inverse-optimal algorithm is generated by performing control jumps that dissipate quantized 'energy' defined by a search Lyapunov function."

The search Lyapunov function is the standard descent function whose decrease encodes progress toward KKT satisfaction; defining quantized energy dissipation in terms of this function and then presenting the resulting jumps as a 'prediction' of algorithm behavior forces the output to reproduce known convergence properties of the original optimization problem.
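
The point about descent functions can be made concrete with a familiar rule that already enforces a guaranteed decrement per step. The sketch below uses Armijo backtracking as a stand-in for an energy-dissipation rule. It is not the paper's quantization, and the function `V`, its gradient, the constants `c`, `shrink`, and `alpha0`, and the starting point are assumptions for illustration.

```python
import numpy as np

def V(x):
    # Hypothetical search Lyapunov function: the objective itself, minimum 0 at the origin.
    return x[0]**4 + x[0]**2 + 2.0 * x[1]**2

def grad_V(x):
    return np.array([4.0 * x[0]**3 + 2.0 * x[0], 4.0 * x[1]])

def armijo_step(x, c=1e-4, shrink=0.5, alpha0=1.0):
    # Shrink the step until V drops by at least c * alpha * ||grad||^2.
    g = grad_V(x)
    alpha = alpha0
    while V(x - alpha * g) > V(x) - c * alpha * (g @ g):
        alpha *= shrink
    return x - alpha * g, alpha

x = np.array([2.0, -1.5])
history = [V(x)]
for _ in range(60):
    x, _ = armijo_step(x)
    history.append(V(x))
```

Because every accepted step must pass the decrease test, `history` is non-increasing by construction. That is the circularity concern in miniature: the monotone decrease is built into the acceptance rule, so observing it says nothing new about the algorithm.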

full rationale

The paper's central construction begins by positing an equivalence between terminal transversality conditions (from an optimal control formulation) and generalized KKT conditions of the target optimization problem. This equivalence is used to define the 'natural vector field' and the subsequent Pontryagin minimum principle plus Hamilton-Jacobi inequality that generate inverse-optimal algorithms. Because the equivalence is asserted rather than derived from a structure-preserving embedding that holds for arbitrary nonlinear problems, the resulting 'natural physics' and quantized energy dissipation are built directly from the same stationarity conditions the algorithms are designed to satisfy. This reduces the claimed first-principles derivation to a control-theoretic re-description of KKT stationarity, with the Lyapunov function and control jumps inheriting the same optimality information. No external benchmark or falsifiable prediction independent of the KKT input is exhibited in the provided text, producing partial circularity.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 2 invented entities

The theory rests on the posited equivalence between transversality and KKT conditions, the existence of a hidden space vector field, and the definition of quantized energy via a search Lyapunov function; these are introduced without independent derivation in the abstract.

free parameters (1)

quantized energy levels
Defined via the search Lyapunov function and control jumps; specific discretization appears chosen to recover known algorithms.

axioms (2)

domain assumption: Terminal transversality conditions of an optimal control problem are equivalent to generalized KKT conditions of a constrained optimization problem
Invoked as the foundation that generates the natural vector field.

domain assumption: Pontryagin-type minimum principle applies in the hidden space to produce local actions from global information
Used to convert the vector field into algorithmic steps.

invented entities (2)

hidden algorithm primitives (no independent evidence)
purpose: Manifestations of optimization algorithms that obey universal non-Newtonian dynamics
Introduced to frame algorithms as physical objects; no independent evidence provided.

natural vector field (no independent evidence)
purpose: Permeates hidden space with optimality information generated from problem data functions
Core construct arising from the equivalence; no external verification shown.

pith-pipeline@v0.9.0 · 5515 in / 1640 out tokens · 42144 ms · 2026-05-10T05:12:44.421785+00:00 · methodology


work page 1944

[18] [18]

Lemar ´echal, Cauchy and the gradient method, Documenta Mathematica, Extra V olume: ISMP

C. Lemar ´echal, Cauchy and the gradient method, Documenta Mathematica, Extra V olume: ISMP. (2012), 251–254

work page 2012

[19] [19]

P. T. Boggs, The solution of nonlinear system of equations byA-stable integration techniques, SIAM J. Numer. Anal. 8 (1971), 767–785

work page 1971

[20] [20]

B. T. Polyak, Some methods of speeding up the convergence of iteration methods, USSR Computational Math. and Math. Phys., 4/5 (1964) 1–17 (Translated by H. F. Cleaves)

work page 1964

[21] [21]

A. A. Brown, M. C. Bartholomew-Biggs, Some effective methods for unconstrained optimization based on the solution of systems of ordinary differential equations, J. optimization theory and applications, 62/2 (1989) 211–224

work page 1989

[22] [22]

Yamashita, A differential equation approach to nonlinear programming, Mathematical Programming, 18 (1980), 155–168

H. Yamashita, A differential equation approach to nonlinear programming, Mathematical Programming, 18 (1980), 155–168

work page 1980

[23] [23]

D. M. Murray, S. J. Yakowitz, The application of optimal control methodology to nonlinear programming problems. Math. Programming, 21/3 (1981), 331–347

work page 1981

[24] [24]

A. A. Brown, M. C. Bartholomew-Biggs, ODE versus SQP methods for constrained optimization, J. opti- mization theory and applications, 62/3 (1989) 371–386

work page 1989

[25] [25]

Evtushenko, V .G

Yu.G. Evtushenko, V .G. Zhadan, Stable barrier-projection and barrier-Newton methods in nonlinear program- ming, Optim. Methods Software, 3 (1994), 237–256. THE NATURAL PHYSICS OF OPTIMIZATION 25

work page 1994

[26] [26]

Bhaya, E

A. Bhaya, E. Kaszkurewicz, Control Perspectives on Numerical Algorithms and Matrix Problems, Advances in Design and Control, SIAM, Philadelphia, PA, 2006

work page 2006

[27] [27]

L. Zhou, Y . Wu, L. Zhang, G. Zhang, Convergence analysis of a differential equation approach for solving nonlinear programming problems, Appl. Math. Comput., 184 (2007), 789–797

work page 2007

[28] [28]

Karafyllis, M

I. Karafyllis, M. Krstic, Global dynamical solvers for nonlinear programming problems, SIAM J. Control and Optimization, 55/2 (2017), 1302–1331

work page 2017

[29] [29]

W. Su, S. Boyd, E. J. Candes, A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights, J. Mach. Learn. Res. 17 (2016) 1–43

work page 2016

[30] [30]

Wibisono, A

A. Wibisono, A. C. Wilson, M. I. Jordan, A variational perspective on accelerated methods in optimization, Proc. of the National Academy of Sciences 113.47 (2016): E7351-E7358

work page 2016

[31] [31]

Lessard, B

L. Lessard, B. Recht, A. Packard, Analysis and design of optimization algorithms via integral quadratic constraints, SIAM Journal on Optimization, 26/1 (2016) 57–95

work page 2016

[32] [32]

Orecchia, The approximate duality gap technique: A unified theory of first-order methods, SIAM Journal on Optimization 29.1 (2019), 660-689

J, Diakonikolas, L. Orecchia, The approximate duality gap technique: A unified theory of first-order methods, SIAM Journal on Optimization 29.1 (2019), 660-689

work page 2019

[33] [33]

A. C. Wilson, B. Recht, M. I. Jordan, A Lyapunov analysis of accelerated methods in optimization, J. of Machine Learning Research 22 (2021), 1–34

work page 2021

[34] [34]

M. Even, R. Berthier, F. Bach, N. Flammarion, H. Hendrikx, P. Gaillard, L. Massouli ´e, A. Taylor, A con- tinuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip, Proc. NeurIPS 2149 (2021), 28054–28066

work page 2021

[35] [35]

Diakonikolas, M

J. Diakonikolas, M. I. Jordan, Generalized momentum-based methods: A Hamiltonian perspective, SIAM J. on Optimization 31.1 (2021), 915–944

work page 2021

[36] [36]

Guilherme, M

F. Guilherme, M. I. Jordan, R. Vidal, On dissipative symplectic integration with applications to gradient-based optimization, J. of Statistical Mechanics: Theory and Experiment, 2021.4 (2021), 043402

work page 2021

[37] [37]

B. Shi, S. S. Du, M. I. Jordan, W. J. Su, Understanding the acceleration phenomenon via high-resolution differential equations, Math. Prog. 195 (2022), 79–148

work page 2022

[38] [38]

Bengio, Y

Y . Bengio, Y . LeCun, G. Hinton, Deep learning, Nature, 521 (2015), 436–444

work page 2015

[39] [39]

Bottou, F

L. Bottou, F. E. Curtis, J. Nocedal, Optimization methods for large-scale machine learning, SIAM Review, 60/2 (2018), 223–311

work page 2018

[40] [40]

Yu. E. Nesterov, A method of solving a convex programming problem with convergence rateO(1/k2), Soviet Math. Dokl., 27/2 (1983) 371–376 (Translated by A. Rosa)

work page 1983

[41] [41]

I. M. Ross, A Primer on Pontryagin’s Principle in Optimal Control, Collegiate Publishers, San Francisco, 2015

work page 2015

[42] [42]

I. M. Ross, An optimal control theory for accelerated optimization, doi = 10.48550/arxiv. 1902.09004, https://arxiv.org/abs/1902.09004

work page internal anchor Pith review doi:10.48550/arxiv 1902

[43] [43]

E. D. Sontag, Mathematical Control Theory: Deterministic Finite Dimensional Systems, Springer, 1998

work page 1998

[44] [44]

Benzi, G

M. Benzi, G. H. Golub, J. Liesen, Numerical solution of saddle point problems, Acta Numerica (2005) 1–137

work page 2005

[45] [45]

Benzi, V

M. Benzi, V . Simoncini, On the eigenvalues of a class of saddle point matrices, Numer. Math. 103, (2006) 173–196

work page 2006

[46] [46]

Liesen, B

J. Liesen, B. N. Parlett, On nonsymmetric saddle point matrices that allow conjugate gradient iterations, Numerische Mathematik (2008) 605–624

work page 2008

[47] [47]

W. M. Haddad, A. L’Afflitto, Finite-time partial stability and stabilization, and optimal feedback control, J. Franklin Institute, 352 (2015), 2329–2357

work page 2015

[48] [48]

A. L. Zuev, Stabilization of non-autonomous systems with respect to a part of the variables by means of controlled Lyapunov functions, J. Automation and Information Sciences, 32/10 (2000), 18–25

work page 2000

[49] [49]

V . I. V orotnikov, Partial stability, stabilization and control: some recent results, 15th IFAC World Congress, Barcelona, Spain, July 2002

work page 2002

[50] [50]

Chellabonia, W

V . Chellabonia, W. M. Haddad, A unification between partial stability and stability theory for time-varying systems, IEEE Control Systems Magazine, December (2002), 66–75

work page 2002

[51] [51]

Jammazi, Continuous and discontinuous homogeneous feedbacks finite-time partially stabilizing control- lable multichained systems, SIAM J

C. Jammazi, Continuous and discontinuous homogeneous feedbacks finite-time partially stabilizing control- lable multichained systems, SIAM J. Control Optim. 52/1 (2014) 520–544

work page 2014

[52] [52]

Clarke, Discontinuous feedback and nonlinear systems, Proc

F. Clarke, Discontinuous feedback and nonlinear systems, Proc. IFAC conference on nonlinear control (NOL- COS), Bologna (2010) 1–29. 26 I. M. ROSS

work page 2010

[53] [53]

Osinenko, P

P. Osinenko, P. Schmidt, S. Streif, Nonsmooth stabilization and its computational aspects, IFAC PapersOn- Line, 53-2 (2020) 6370–6377

work page 2020

[54] [54]

S. P. Bhat, D. S. Bernstein, Finite-time stability of continuous autonomous systems, SIAM J. Control Optim., 38/3 (2000) 751–766

work page 2000

[55] [55]

Polyakov, Discontinuous Lyapunov functions for nonasymptotic stability analysis, Proc

A. Polyakov, Discontinuous Lyapunov functions for nonasymptotic stability analysis, Proc. 19th World Con- gress, IFAC, Cape Town, South Africa (2014) 5455–5460

work page 2014

[56] [56]

S. R. Bernfeld, V . Lakshmikantham, Practical stability and Lyapunov functions, T ˆohoku Math. Journ. 32 (1980), 607–613

work page 1980

[57] [57]

A. A. Martynyuk, On practical stability and optimal stabilization of controlled motion, Banach Center Publi- cations 14.1 (1985), 383–400

work page 1985

[58] [58]

Hairer, S

E. Hairer, S. P. Nørsett, G. Wanner, Solving Ordinary Differential Equations I: Nonstiff Problems, Springer- Verlag, 1993

work page 1993

[59] [59]

Y . Liu, C. Lageman, B. D.O. Anderson, G. Shi, An Arrow-Hurwicz-Uzawa type flow as least squares solver for network linear equations, Automatic, 100 (2019), 187–193

work page 2019

[60] [60]

Feijer, F

D. Feijer, F. Paganini, Stability of primal-dual gradient dynamics and applications to network optimization, 46 (2010) 1974–1981

work page 2010

[61] [61]

B. He, S. Xu, X. Yuan, On convergence of the Arrow-Hurwicz method for saddle point problems, J. Mathe- matical Imaging and Vision 64 (2022), 662–671

work page 2022

[62] [62]

W. C. Davidon, Variable metric method for minimization, SIAM J. optimization, 1/1 (1991), 1–17 (originally published as Argonne National Laboratory Research and Development Report 5990, May 1959; revised November 1959)

work page 1991

[63] [63]

Moulay, V

E. Moulay, V . L´echapp´e, F. Plestan, Properties of the sign gradient descent algorithms, Information Sciences, 492 (2019), 29–39

work page 2019

[64] [64]

Bernstein, Yu-X

J. Bernstein, Yu-X. Wang, K. Azizzadenesheli, A. Anandkumar, Compression by the signs: distributed learn- ing is a two-way street, 6th International Conference on Learning Representations, (2018), 1–6

work page 2018

[65] [65]

Pandey, M

M. Pandey, M. Fernandez, F. Gentile, F. et al., The transformational role of GPU computing and deep learning in drug discovery, Nat. Mach. Intell. 4 (2022) 211–221

work page 2022

[66] [66]

Nutini, M

J. Nutini, M. Schmidt, I. H. Laradji, M. Friedlander, H. Koepke, Coordinate descent converges faster with the Gauss-Southwell rule than random selection, Proc. 32nd Inter. Conf. Machine Learning, Lille, France, 37 (2015) 1632–1641

work page 2015

[67] [67]

Wolfson-Pou, E

J. Wolfson-Pou, E. Chow, Distributed Southwell: An iterative method with low communication costs. 2017, Proc. SC17, Denver, CO, USA, November 12-17, 2017

work page 2017

[68] [68]

Q. T. Dinh, M. Diehl, Local convergence of sequential convex programming for nonconvex optimization, Recent Advances in Optimization and its Applications in Engineering, Springer, Berlin, Heidelberg, 2010

work page 2010

[69] [69]

Messerer, K

F. Messerer, K. Baumg ¨artner, M. Diehl, Survey of sequential convex programming and generalized Gauss- Newton methods, ESAIM: ProcS 71 (2021), 64–88

work page 2021

[70] [70]

Kheirandishfard, F

M. Kheirandishfard, F. Zohrizadeh, S. R. Alimo, F. Kamangar, R. Madani, Sequential convex programming revisited, Proc. 60th IEEE CDC, 2021, 3137–3142,

work page 2021

[71] [71]

B. S. Mordukhovich, R. T. Rockafellar, Second-order subdifferential calculus with applications to tilt stability in optimization, SIAM J. Optim. 22/3 (2012), 953–986

work page 2012

[72] [72]

R. T. Rockafellar, R. J.-B. Wets, Variational Analysis, Grundlehren Math. Wiss. 317, Springer, Berlin, 2009

work page 2009

[73] [73]

W. P. Schleich, D. M. Greenberger, D. H. Kobe, M. O. Scully, Schr ¨odinger equation revisited, Proc. Natl. Acad. Sci. U.S.A., 110 (2013), 5374–5379

work page 2013

[74] [74]

J. H. Field, Derivation of the Schr ¨odinger equation from the Hamilton-Jacobi equation in Feynman’s path integral formulation of quantum mechanics, Eur. J. Phys. 32 (2011) 63–87

work page 2011