pith. machine review for the scientific record.

arxiv: 2511.20370 · v2 · submitted 2025-11-25 · 🧮 math.OC · math.DS

Nonlinearly preconditioned gradient flows

Pith reviewed 2026-05-17 04:39 UTC · model grok-4.3

classification 🧮 math.OC math.DS
keywords nonlinearly preconditioned gradient methods · continuous-time gradient flows · Bregman divergence · mirror descent duality · optimal control formulation · gradient dominance · Lyapunov convergence · non-Euclidean optimization

The pith

A nonlinearly preconditioned gradient flow solves an infinite-horizon optimal control problem with the Bregman divergence as its value function.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines a continuous-time dynamical system that arises as the limit of nonlinearly preconditioned gradient methods. It establishes global existence of solutions and Lyapunov-based convergence under mild assumptions. For convex costs the flow shows sublinear decay in the geometry induced by a reference function, while a generalized gradient-dominance condition produces exponential convergence. By revealing a duality with mirror descent the authors prove that the flow solves an infinite-horizon optimal control problem whose value function is the Bregman divergence generated by the cost. The results give a unified view of these methods and link them to existing continuous-time models in non-Euclidean optimization.

Core claim

The nonlinearly preconditioned gradient flow is the continuous-time limit of a broad class of nonlinearly preconditioned gradient methods. Under mild assumptions global solutions exist and Lyapunov analysis yields convergence guarantees. For convex costs the flow exhibits sublinear decay in the geometry induced by a reference function. A generalized gradient-dominance condition implies exponential convergence. A duality connection with mirror descent shows that the flow solves an infinite-horizon optimal-control problem whose value function is the Bregman divergence generated by the cost. This clarifies the structure and optimization behavior of the flows and connects them to known continuous-time models in non-Euclidean optimization.
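
For orientation, the discrete update and its continuous-time limit can be written out as below. The specific form is an assumption borrowed from the nonlinearly preconditioned (dual-space preconditioned) gradient literature rather than something this summary states: \phi denotes the reference function, \phi^* its convex conjugate, f the cost, and \gamma > 0 a step size.

```latex
% Assumed form of the method and of its limiting flow (not quoted from the paper).
\begin{align*}
  x^{k+1} &= x^{k} - \gamma\, \nabla\phi^{*}\!\bigl(\nabla f(x^{k})\bigr),
  && \text{(discrete method)} \\
  \dot{x}(t) &= -\nabla\phi^{*}\!\bigl(\nabla f(x(t))\bigr), \quad x(0) = x_0.
  && \text{(limit as } \gamma \to 0\text{)}
\end{align*}
```

Under this reading, the choice \phi = \tfrac{1}{2}\|\cdot\|^2 gives \nabla\phi^* = \mathrm{id}, so the system reduces to the ordinary gradient flow \dot{x} = -\nabla f(x).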

What carries the argument

The duality connection with mirror descent that establishes the nonlinearly preconditioned gradient flow as the solution to an infinite-horizon optimal control problem whose value function is the Bregman divergence generated by the cost.
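
One way the duality can be seen is through a change of variables. This is a sketch under stated assumptions, not the paper's proof: the flow is taken to be \dot{x} = -\nabla\phi^*(\nabla f(x)), and f is assumed to be of Legendre type, so that \nabla f is invertible with inverse \nabla f^*.

```latex
% Sketch: dual variable y = \nabla f(x), hence x = \nabla f^{*}(y).
\begin{align*}
  \dot{x}(t) = -\nabla\phi^{*}\bigl(\nabla f(x(t))\bigr)
  \quad\Longleftrightarrow\quad
  \frac{d}{dt}\,\nabla f^{*}\bigl(y(t)\bigr) = -\nabla\phi^{*}\bigl(y(t)\bigr).
\end{align*}
% Compare with the mirror descent flow for a cost g and mirror map h:
\begin{align*}
  \frac{d}{dt}\,\nabla h\bigl(z(t)\bigr) = -\nabla g\bigl(z(t)\bigr).
\end{align*}
```

Matching terms, the dual trajectory y(t) follows a mirror flow with cost g = \phi^* and mirror map h = f^*. The roles of cost and reference function are swapped, which is consistent with a value function generated by the cost rather than by the reference function.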

If this is right

  • Convex costs produce sublinear convergence of the flow in the geometry defined by the reference function.
  • A generalized gradient-dominance condition yields exponential convergence.
  • The flow is dual to mirror descent.
  • The continuous-time model explains the optimization behavior of the corresponding discrete nonlinearly preconditioned methods.
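
The Lyapunov mechanism behind the first two bullets can be sketched as follows. The assumptions are mine and may differ from the paper's hypotheses: the flow is \dot{x} = -\nabla\phi^*(\nabla f(x)), and \phi^* is convex and differentiable with \phi^*(0) = 0 = \min\phi^*. The resulting bound illustrates a sublinear rate measured through the reference function; it is not necessarily the theorem stated in the paper.

```latex
\begin{align*}
  \frac{d}{dt} f\bigl(x(t)\bigr)
    &= \bigl\langle \nabla f(x),\, \dot{x} \bigr\rangle
     = -\bigl\langle \nabla f(x),\, \nabla\phi^{*}(\nabla f(x)) \bigr\rangle \\
    &\le -\phi^{*}\bigl(\nabla f(x)\bigr) \;\le\; 0,
\end{align*}
% The inequality uses convexity of \phi^{*}:
%   0 = \phi^{*}(0) \ge \phi^{*}(y) + \langle \nabla\phi^{*}(y),\, 0 - y \rangle.
% Integrating over [0, T] gives
\begin{align*}
  \min_{0 \le t \le T} \phi^{*}\bigl(\nabla f(x(t))\bigr)
    \;\le\; \frac{1}{T}\int_{0}^{T} \phi^{*}\bigl(\nabla f(x(t))\bigr)\,dt
    \;\le\; \frac{f(x_0) - \inf f}{T}.
\end{align*}
```

If, in addition, a gradient-dominance-type inequality of the form \phi^*(\nabla f(x)) \ge \mu\,(f(x) - \inf f) holds for some \mu > 0 (again an assumed form), the first display gives f(x(t)) - \inf f \le e^{-\mu t}\,(f(x_0) - \inf f) by Grönwall's inequality.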

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The optimal-control interpretation may suggest new choices of preconditioners obtained by solving related control problems.
  • Discretizations of the flow could produce novel algorithms whose rates inherit directly from the continuous-time analysis.
  • Analogous optimal-control views may extend to other families of preconditioned flows.

Load-bearing premise

The analysis rests on mild conditions on the cost and preconditioner that guarantee global existence of the dynamical system and permit a Lyapunov function to certify convergence.

What would settle it

A specific convex cost and preconditioner for which the flow trajectory fails to exhibit the predicted sublinear decay rate measured in the Bregman geometry induced by the reference function.
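
A numerical harness for this kind of test might look like the sketch below. Everything in it is an illustrative assumption rather than material from the paper: the flow is taken to be x'(t) = -grad phi*(grad f(x(t))), the cost is f(x) = (1/4) sum x_i^4, the reference function is phi(x) = sum (cosh(x_i) - 1) (so grad phi* = arcsinh), and the flow is approximated by forward Euler with a small step. The quantity checked is min over t <= T of phi*(grad f(x(t))) <= (f(x0) - min f) / T, which follows from convexity of phi* with phi*(0) = 0 and need not be the exact rate proved in the paper.

```python
import numpy as np

# Hypothetical test problem (not from the paper).
def f(x):
    return 0.25 * np.sum(x**4)          # convex cost, minimum value 0 at x = 0

def grad_f(x):
    return x**3

def grad_phi_conj(y):
    return np.arcsinh(y)                # inverse of sinh = grad of cosh - 1

def phi_conj(y):
    # Conjugate of phi(x) = sum(cosh(x_i) - 1); satisfies phi_conj(0) = 0.
    return np.sum(y * np.arcsinh(y) - np.sqrt(1.0 + y**2) + 1.0)

def simulate(x0, h=1e-3, steps=20000):
    """Forward-Euler approximation of x' = -grad_phi_conj(grad_f(x))."""
    x = np.asarray(x0, dtype=float).copy()
    f_vals = np.empty(steps + 1)
    gap_vals = np.empty(steps + 1)
    for k in range(steps + 1):
        g = grad_f(x)
        f_vals[k] = f(x)                # cost at iterate k
        gap_vals[k] = phi_conj(g)       # phi*(grad f) at iterate k
        x = x - h * grad_phi_conj(g)    # Euler step along the assumed flow
    return f_vals, gap_vals

h, steps = 1e-3, 20000
f_vals, gap_vals = simulate([2.0, -1.5], h=h, steps=steps)
T = h * steps
best_gap = gap_vals.min()               # running minimum of phi*(grad f)
bound = (f_vals[0] - 0.0) / T           # (f(x0) - min f) / T with min f = 0
integral = h * gap_vals[:-1].sum()      # Riemann sum of phi*(grad f) over [0, T]

print(f"min phi*(grad f) = {best_gap:.3e}, bound = {bound:.3e}")
print(f"integral = {integral:.3e}, f drop = {f_vals[0] - f_vals[-1]:.3e}")
```

For this particular pair the bound is expected to hold. The same harness could be pointed at other costs and reference functions by swapping the four helper functions, with the caveat that forward Euler introduces discretization error, so an apparent violation should be rechecked with a smaller step.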

Figures

Figures reproduced from arXiv: 2511.20370 by Alexander Bodard, Jan Quan, Konstantinos Oikonomidis, Panagiotis Patrinos.

Figure 1: Preconditioners corresponding to different reference functions.
Figure 2: Visualization of a mirror descent update.
Figure 3: Visualization of a nonlinearly preconditioned gradient method.
Original abstract

We study a continuous-time dynamical system which arises as the limit of a broad class of nonlinearly preconditioned gradient methods. Under mild assumptions, we establish existence of global solutions and derive Lyapunov-based convergence guarantees. For convex costs, we prove a sublinear decay in a geometry induced by some reference function, and under a generalized gradient-dominance condition we obtain exponential convergence. We further uncover a duality connection with mirror descent, and use it to establish that the flow of interest solves an infinite-horizon optimal-control problem of which the value function is the Bregman divergence generated by the cost. These results clarify the structure and optimization behavior of nonlinearly preconditioned gradient flows and connect them to known continuous-time models in non-Euclidean optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript studies nonlinearly preconditioned gradient flows arising as continuous-time limits of a broad class of preconditioned gradient methods. Under mild assumptions it proves global existence of solutions together with Lyapunov-based convergence guarantees. For convex costs it establishes sublinear decay rates in the geometry induced by a reference function; under a generalized gradient-dominance condition it obtains exponential convergence. A duality with mirror descent is used to show that the flow solves an infinite-horizon optimal-control problem whose value function is precisely the Bregman divergence generated by the cost.

Significance. If the derivations hold, the work supplies a unified continuous-time perspective on nonlinear preconditioning and forges an explicit link to infinite-horizon optimal control via the Bregman value function. The combination of Lyapunov analysis, rate results, and the optimal-control equivalence could inform both theoretical understanding and practical design of non-Euclidean optimization algorithms.

major comments (1)
  1. [§4] §4 (Duality and Optimal-Control Equivalence), statement of the main duality theorem: the claim that the Bregman divergence generated by the cost is the value function of the infinite-horizon OCP requires verification that it satisfies the associated HJB equation along flow trajectories. The paper invokes only the 'mild assumptions' of §2 for global existence and Lyapunov convergence; these do not appear to include the C^1 regularity, growth conditions, or viscosity-solution framework needed for a rigorous HJB verification lemma when the preconditioner is an arbitrary nonlinear map. Without these hypotheses the duality identification does not go through in full generality.
minor comments (2)
  1. [§2] The precise statement of the 'mild assumptions' (global existence, Lyapunov function, etc.) should be collected in a single numbered assumption block early in the paper rather than scattered across lemmas.
  2. [Notation] Notation for the nonlinear preconditioner and the reference function generating the Bregman divergence is introduced in §1 but used with slight variations in §3 and §4; a consolidated table of symbols would improve readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive comments on the manuscript. We address the single major comment below.

Point-by-point responses
  1. Referee: [§4] §4 (Duality and Optimal-Control Equivalence), statement of the main duality theorem: the claim that the Bregman divergence generated by the cost is the value function of the infinite-horizon OCP requires verification that it satisfies the associated HJB equation along flow trajectories. The paper invokes only the 'mild assumptions' of §2 for global existence and Lyapunov convergence; these do not appear to include the C^1 regularity, growth conditions, or viscosity-solution framework needed for a rigorous HJB verification lemma when the preconditioner is an arbitrary nonlinear map. Without these hypotheses the duality identification does not go through in full generality.

    Authors: We appreciate the referee's observation. The duality result is first obtained via the explicit connection to mirror descent flows, which already identifies the Bregman divergence as the natural value function. To make the optimal-control equivalence fully rigorous, we agree that a direct verification step is needed. In the revised manuscript we will insert a short proposition that substitutes the nonlinearly preconditioned dynamics into the time derivative of the Bregman divergence and shows that the resulting expression coincides with the Hamiltonian evaluated along the trajectory. This calculation uses only the chain rule, the definition of the preconditioner, and the convexity assumptions already stated in §2; no additional C^1 regularity or viscosity-solution machinery is required because the verification is performed pointwise along the explicitly constructed solutions whose existence is guaranteed by the mild hypotheses. We will also add a brief remark clarifying that the same verification holds for the generalized gradient-dominance case used for exponential rates.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity detected; derivation remains self-contained

Full rationale

The abstract and described results establish global existence, Lyapunov convergence, sublinear decay in a Bregman geometry, exponential rates under generalized gradient dominance, and a duality link to mirror descent that identifies the flow as solving an infinite-horizon OCP with Bregman value function. These steps rely on standard convex-analysis arguments and mild regularity assumptions rather than any reduction of the claimed value function or rates to fitted parameters, self-definitions, or unverified self-citations by the paper's own equations. No load-bearing claim collapses to an input by construction, and the duality verification is presented as following from independent HJB analysis under the stated hypotheses.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard existence theory for ODEs and Lyapunov analysis for convex functions; no free parameters or invented entities are visible in the abstract.

axioms (2)
  • Domain assumption: mild assumptions guarantee global existence of solutions to the dynamical system.
    Invoked for the existence result stated in the abstract.
  • Domain assumption: a generalized gradient-dominance condition yields exponential convergence.
    Used to obtain the exponential rate.

pith-pipeline@v0.9.0 · 5423 in / 1221 out tokens · 19961 ms · 2026-05-17T04:39:43.395405+00:00


