arxiv: 2511.20370 · v2 · submitted 2025-11-25 · 🧮 math.OC · math.DS

Nonlinearly preconditioned gradient flows

Konstantinos Oikonomidis, Alexander Bodard, Jan Quan, Panagiotis Patrinos

Pith reviewed 2026-05-17 04:39 UTC · model grok-4.3

classification 🧮 math.OC math.DS

keywords: nonlinearly preconditioned gradient methods, continuous-time gradient flows, Bregman divergence, mirror descent duality, optimal control formulation, gradient dominance, Lyapunov convergence, non-Euclidean optimization

The pith

A nonlinearly preconditioned gradient flow solves an infinite-horizon optimal control problem with the Bregman divergence as its value function.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines a continuous-time dynamical system that arises as the limit of nonlinearly preconditioned gradient methods. It establishes global existence of solutions and Lyapunov-based convergence under mild assumptions. For convex costs the flow shows sublinear decay in the geometry induced by a reference function, while a generalized gradient-dominance condition produces exponential convergence. By revealing a duality with mirror descent the authors prove that the flow solves an infinite-horizon optimal control problem whose value function is the Bregman divergence generated by the cost. The results give a unified view of these methods and link them to existing continuous-time models in non-Euclidean optimization.

Core claim

The nonlinearly preconditioned gradient flow is the continuous-time limit of a broad class of nonlinearly preconditioned gradient methods. Under mild assumptions global solutions exist and Lyapunov analysis yields convergence guarantees. For convex costs the flow exhibits sublinear decay in the geometry induced by a reference function. A generalized gradient-dominance condition implies exponential convergence. A duality connection with mirror descent shows that the flow solves an infinite-horizon optimal-control problem whose value function is the Bregman divergence generated by the cost. This clarifies the structure and optimization behavior of the flows and connects them to known continuous-time models in non-Euclidean optimization.
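
To make the object concrete, the following is a minimal sketch of what such a flow could look like. It assumes the form common in the related literature on nonlinearly preconditioned and dual-space preconditioned gradient methods; the symbols f (cost), \phi (reference function), \phi^* (its convex conjugate), and \gamma (step size) are assumptions introduced here, and the paper's exact scaling and hypotheses may differ.

```latex
% Assumed discrete method (step size \gamma > 0) and its formal limit as \gamma \to 0:
x^{k+1} = x^k - \gamma \, \nabla\phi^*\!\big(\nabla f(x^k)\big)
\quad\longrightarrow\quad
\dot{x}(t) = -\nabla\phi^*\!\big(\nabla f(x(t))\big).

% Decrease of the cost along the flow, assuming \phi^* is convex and
% differentiable with \nabla\phi^*(0) = 0:
\frac{d}{dt} f(x(t))
  = \big\langle \nabla f(x(t)),\, \dot{x}(t) \big\rangle
  = -\big\langle \nabla f(x(t)),\, \nabla\phi^*\!\big(\nabla f(x(t))\big) \big\rangle
  \le 0.
```

The inequality follows from monotonicity of \nabla\phi^*, which gives \langle y, \nabla\phi^*(y) \rangle \ge 0 whenever \nabla\phi^*(0) = 0. With \phi = \tfrac12\|\cdot\|^2 the map \nabla\phi^* is the identity and the sketch reduces to the standard gradient flow.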

What carries the argument

The duality connection with mirror descent that establishes the nonlinearly preconditioned gradient flow as the solution to an infinite-horizon optimal control problem whose value function is the Bregman divergence generated by the cost.
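
For readers who want the two ingredients written out, here is a short sketch of the standard definitions. The symbols h (mirror map), g (objective of the mirror flow), and \phi (reference function) are assumptions introduced here. The last display is one plausible reading of the duality under the assumed flow form \dot{x} = -\nabla\phi^*(\nabla f(x)), not a statement taken from the paper.

```latex
% Bregman divergence generated by a differentiable convex function f:
D_f(x, y) = f(x) - f(y) - \langle \nabla f(y),\, x - y \rangle \;\ge\; 0.

% Continuous-time mirror descent with mirror map h applied to an objective g:
\frac{d}{dt}\, \nabla h(z(t)) = -\nabla g(z(t)).

% Heuristic change of variables, assuming \nabla f is invertible with
% inverse \nabla f^*: set y(t) = \nabla f(x(t)), so x(t) = \nabla f^*(y(t)) and
\frac{d}{dt}\, \nabla f^*(y(t)) = -\nabla\phi^*(y(t)),
% which has the mirror-flow form above with h = f^* and g = \phi^*.
```

Which argument order of D_f the paper uses, and under what regularity the change of variables is justified, is not stated in this summary.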

If this is right

Convex costs produce sublinear convergence of the flow in the geometry defined by the reference function.
A generalized gradient-dominance condition yields exponential convergence.
The flow is dual to mirror descent.
The continuous-time model explains the optimization behavior of the corresponding discrete nonlinearly preconditioned methods.
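
The first two consequences can be probed numerically. The sketch below integrates an assumed flow of the form x'(t) = -grad_phi_conj(grad_f(x(t))) with forward Euler and records the cost along the trajectory. The cost f(x) = 0.25 * sum(x_i^4), the reference function phi(x) = sum(cosh(x_i) - 1), and all function names are illustrative choices made here, not taken from the paper.

```python
import numpy as np


def f(x):
    """Illustrative convex cost: f(x) = 0.25 * sum(x_i^4)."""
    return 0.25 * np.sum(x ** 4)


def grad_f(x):
    """Gradient of the illustrative cost."""
    return x ** 3


def grad_phi_conj(y):
    """Gradient of phi^* for phi(x) = sum(cosh(x_i) - 1).

    Since grad phi = sinh elementwise, its inverse is arcsinh elementwise.
    """
    return np.arcsinh(y)


def simulate_flow(x0, step, n_steps):
    """Forward-Euler discretization of x' = -grad_phi_conj(grad_f(x)).

    Returns the cost values along the trajectory and the final iterate.
    """
    x = np.asarray(x0, dtype=float).copy()
    values = [f(x)]
    for _ in range(n_steps):
        x = x - step * grad_phi_conj(grad_f(x))
        values.append(f(x))
    return np.array(values), x


if __name__ == "__main__":
    values, x_final = simulate_flow(np.array([2.0, -1.5]), step=0.01, n_steps=2000)
    print("initial cost:", values[0])
    print("final cost:  ", values[-1])
    print("monotone decrease:", bool(np.all(np.diff(values) <= 0.0)))
```

The arcsinh preconditioner damps large gradients, so the explicit step stays stable far from the minimizer even though this cost is not Lipschitz-smooth. A fixed small step is used only for simplicity; it is not a rate-preserving discretization.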

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The optimal-control interpretation may suggest new choices of preconditioners obtained by solving related control problems.
Discretizations of the flow could produce novel algorithms whose rates inherit directly from the continuous-time analysis.
Analogous optimal-control views may extend to other families of preconditioned flows.

Load-bearing premise

The analysis rests on mild conditions on the cost and preconditioner that guarantee global existence of the dynamical system and permit a Lyapunov function to certify convergence.

What would settle it

A specific convex cost and preconditioner for which the flow trajectory fails to exhibit the predicted sublinear decay rate measured in the Bregman geometry induced by the reference function.

Figures

Figures reproduced from arXiv: 2511.20370 by Alexander Bodard, Jan Quan, Konstantinos Oikonomidis, Panagiotis Patrinos.

**Figure 2.** Visualization of a mirror descent update.

**Figure 3.** Visualization of a nonlinearly preconditioned gradient method.

Original abstract

We study a continuous-time dynamical system which arises as the limit of a broad class of nonlinearly preconditioned gradient methods. Under mild assumptions, we establish existence of global solutions and derive Lyapunov-based convergence guarantees. For convex costs, we prove a sublinear decay in a geometry induced by some reference function, and under a generalized gradient-dominance condition we obtain exponential convergence. We further uncover a duality connection with mirror descent, and use it to establish that the flow of interest solves an infinite-horizon optimal-control problem of which the value function is the Bregman divergence generated by the cost. These results clarify the structure and optimization behavior of nonlinearly preconditioned gradient flows and connect them to known continuous-time models in non-Euclidean optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper frames nonlinearly preconditioned gradient flows as the solution to an infinite-horizon optimal control problem whose value function is the Bregman divergence generated by the cost, a link obtained through a duality with mirror descent.

The central new piece here is the continuous-time limit of nonlinearly preconditioned methods together with its claimed duality to mirror descent and the resulting optimal-control formulation. The authors show that the flow exists globally under mild assumptions and recover the expected convergence: sublinear decay for convex costs in the geometry induced by a reference function, plus exponential rates when a generalized gradient-dominance condition holds. These rates come from standard Lyapunov arguments, which is straightforward but solid for this setting.

The mirror-descent link and the optimal-control interpretation are the parts that feel fresh relative to the cited literature on gradient flows and preconditioning. They give a clean way to view the preconditioner as shaping both the dynamics and the underlying value function. That connection is worth having on record.

The main soft spot is the verification step for the Hamilton-Jacobi-Bellman equation. The abstract invokes only mild assumptions for existence and Lyapunov convergence, yet the duality claim requires that the Bregman divergence actually satisfies the HJB along trajectories. If the full derivations supply the needed regularity or growth conditions on the preconditioner and cost, the argument goes through; otherwise the identification is conditional. A referee would want to see the precise hypotheses that close this gap.

The paper is aimed at researchers who already work with continuous-time models in non-Euclidean optimization. Someone looking for a unifying dynamical-systems view of preconditioned methods will find the organization useful and the optimal-control angle suggestive. It is not a foundational shift, but the connections are clear enough and the technical work is reproducible in principle. I would send it to peer review rather than desk-reject it.

Referee Report

1 major / 2 minor

Summary. The manuscript studies nonlinearly preconditioned gradient flows arising as continuous-time limits of a broad class of preconditioned gradient methods. Under mild assumptions it proves global existence of solutions together with Lyapunov-based convergence guarantees. For convex costs it establishes sublinear decay rates in the geometry induced by a reference function; under a generalized gradient-dominance condition it obtains exponential convergence. A duality with mirror descent is used to show that the flow solves an infinite-horizon optimal-control problem whose value function is precisely the Bregman divergence generated by the cost.

Significance. If the derivations hold, the work supplies a unified continuous-time perspective on nonlinear preconditioning and forges an explicit link to infinite-horizon optimal control via the Bregman value function. The combination of Lyapunov analysis, rate results, and the optimal-control equivalence could inform both theoretical understanding and practical design of non-Euclidean optimization algorithms.

major comments (1)

[§4] §4 (Duality and Optimal-Control Equivalence), statement of the main duality theorem: the claim that the Bregman divergence generated by the cost is the value function of the infinite-horizon OCP requires verification that it satisfies the associated HJB equation along flow trajectories. The paper invokes only the 'mild assumptions' of §2 for global existence and Lyapunov convergence; these do not appear to include the C^1 regularity, growth conditions, or viscosity-solution framework needed for a rigorous HJB verification lemma when the preconditioner is an arbitrary nonlinear map. Without these hypotheses the duality identification does not go through in full generality.

minor comments (2)

[§2] The precise statement of the 'mild assumptions' (global existence, Lyapunov function, etc.) should be collected in a single numbered assumption block early in the paper rather than scattered across lemmas.
[Notation] Notation for the nonlinear preconditioner and the reference function generating the Bregman divergence is introduced in §1 but used with slight variations in §3 and §4; a consolidated table of symbols would improve readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive comments on the manuscript. We address the single major comment below.

Referee: [§4] §4 (Duality and Optimal-Control Equivalence), statement of the main duality theorem: the claim that the Bregman divergence generated by the cost is the value function of the infinite-horizon OCP requires verification that it satisfies the associated HJB equation along flow trajectories. The paper invokes only the 'mild assumptions' of §2 for global existence and Lyapunov convergence; these do not appear to include the C^1 regularity, growth conditions, or viscosity-solution framework needed for a rigorous HJB verification lemma when the preconditioner is an arbitrary nonlinear map. Without these hypotheses the duality identification does not go through in full generality.

Authors: We appreciate the referee's observation. The duality result is first obtained via the explicit connection to mirror descent flows, which already identifies the Bregman divergence as the natural value function. To make the optimal-control equivalence fully rigorous, we agree that a direct verification step is needed. In the revised manuscript we will insert a short proposition that substitutes the nonlinearly preconditioned dynamics into the time derivative of the Bregman divergence and shows that the resulting expression coincides with the Hamiltonian evaluated along the trajectory. This calculation uses only the chain rule, the definition of the preconditioner, and the convexity assumptions already stated in §2; no additional C^1 regularity or viscosity-solution machinery is required because the verification is performed pointwise along the explicitly constructed solutions whose existence is guaranteed by the mild hypotheses. We will also add a brief remark clarifying that the same verification holds for the generalized gradient-dominance case used for exponential rates.

Revision: yes
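
The kind of verification argument under discussion can be sketched generically. The dynamics \dot{x} = u, the running cost \ell, and the candidate value function V below are placeholders introduced here, not the paper's control problem; the sketch only shows where the chain rule and the HJB equation enter.

```latex
% Placeholder setting: dynamics \dot{x}(t) = u(t), running cost \ell(x,u) \ge 0,
% candidate value function V, assumed differentiable.
% Stationary HJB equation:
0 = \min_{u} \big\{ \ell(x, u) + \langle \nabla V(x),\, u \rangle \big\}.

% Along any admissible trajectory (x(t), u(t)), the chain rule and the HJB equation give
\frac{d}{dt} V(x(t)) = \langle \nabla V(x(t)),\, u(t) \rangle \;\ge\; -\ell\big(x(t), u(t)\big),

% and integrating over [0, T] yields
V(x(0)) \;\le\; V(x(T)) + \int_0^T \ell\big(x(t), u(t)\big)\, dt,
% with equality when u(t) attains the minimum in the HJB equation for all t.
```

The chain-rule step is where differentiability of V along the trajectory is used, which is the regularity question raised in the referee's comment.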

Circularity Check

0 steps flagged

No significant circularity detected; derivation remains self-contained

The abstract and described results establish global existence, Lyapunov convergence, sublinear decay in a Bregman geometry, exponential rates under generalized gradient dominance, and a duality link to mirror descent that identifies the flow as solving an infinite-horizon OCP with Bregman value function. These steps rely on standard convex-analysis arguments and mild regularity assumptions rather than any reduction of the claimed value function or rates to fitted parameters, self-definitions, or unverified self-citations by the paper's own equations. No load-bearing claim collapses to an input by construction, and the duality verification is presented as following from independent HJB analysis under the stated hypotheses.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard existence theory for ODEs and Lyapunov analysis for convex functions; no free parameters or invented entities are visible in the abstract.

axioms (2)

domain assumption: Mild assumptions guarantee global existence of solutions to the dynamical system. Invoked for the existence result stated in the abstract.
domain assumption: A generalized gradient-dominance condition yields exponential convergence. Used to obtain the exponential rate.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "the flow of interest solves an infinite-horizon optimal-control problem of which the value function is the Bregman divergence generated by the cost"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_injective · unclear
Paper passage: "duality connection with mirror descent"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
