Recognition: unknown
Newton methods beyond Hessian Lipschitz continuity: A nonlinear preconditioning approach
Pith reviewed 2026-05-14 20:27 UTC · model grok-4.3
The pith
Nonlinearly preconditioning the optimality mapping lets Newton-type methods achieve local superlinear and quadratic convergence under Lipschitz continuity of the preconditioned Hessian rather than of the Hessian itself.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Introducing nonlinear preconditioning into Newton-type methods shifts the analysis from requiring a Lipschitz-continuous Hessian to requiring only a Lipschitz-continuous preconditioned Hessian. Under this weaker assumption the methods retain local superlinear and quadratic convergence guarantees, admit a globalization strategy even though the preconditioned Newton direction need not be a descent direction, and come with iteration-complexity bounds.
What carries the argument
Nonlinear preconditioning of the optimality mapping: Newton's root-finding scheme is applied to the transformed mapping, whose Jacobian (the preconditioned Hessian) satisfies a Lipschitz condition.
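A schematic rendering of the construction, assuming the preconditioner acts through the gradient map of a convex reference function ϕ (so that F_P = ∇ϕ* ∘ ∇f); this particular form is an assumption of the sketch, consistent with the first-order nonlinear preconditioning literature the paper extends, and not a restatement of the paper's definitions.

```latex
% Sketch (assumed form, not the paper's definitions):
% phi is a convex reference function acting as the nonlinear preconditioner,
% phi^* its convex conjugate; suitable conditions on phi are assumed.
\[
  F_P(x) \;=\; \nabla\phi^*\!\bigl(\nabla f(x)\bigr), \qquad
  \nabla f(x_\star) = 0 \;\Longleftrightarrow\; F_P(x_\star) = 0,
\]
\[
  J_P(x) \;=\; \nabla^2\phi^*\!\bigl(\nabla f(x)\bigr)\,\nabla^2 f(x)
  \quad\text{(preconditioned Hessian)}, \qquad
  x^{+} \;=\; x - J_P(x)^{-1} F_P(x).
\]
% The analysis then requires Lipschitz continuity of J_P rather than of \nabla^2 f.
```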
If this is right
- The methods converge locally superlinearly or quadratically under the preconditioned Lipschitz assumption.
- Globalization is possible even without the direction being a descent direction.
- The regularized variant attains O(ε^{-3/2}) iteration complexity (a schematic step is sketched after this list).
- The adaptive version preserves the complexity while allowing inexact subproblem solutions.
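A minimal sketch of one step of such a regularized variant, patterned on the gradient-regularized Newton updates that appear in the reference graph (Mishchenko; Doikov and Nesterov). The helper names, the chain-rule form of the preconditioned Hessian, and the regularization weight λ = √(L‖F_P‖) are assumptions of this illustration, not the paper's exact algorithm.

```python
import numpy as np

def preconditioned_regularized_newton_step(x, grad_f, hess_f,
                                            grad_phi_star, hess_phi_star, L_est):
    """One hypothetical step on the preconditioned mapping F_P(x) = grad_phi_star(grad_f(x)).

    Illustrative only: patterned on gradient-regularized Newton methods,
    not the paper's exact update rule.
    """
    g = grad_f(x)
    F = grad_phi_star(g)                        # preconditioned residual F_P(x)
    J = hess_phi_star(g) @ hess_f(x)            # preconditioned Hessian via chain rule
    lam = np.sqrt(L_est * np.linalg.norm(F))    # gradient-type regularization weight
    d = np.linalg.solve(J + lam * np.eye(x.size), -F)   # regularized Newton direction
    return x + d
```

Note that the abstract restricts the regularized variant to isotropic preconditioners; the sketch above does not enforce that restriction.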
Where Pith is reading between the lines
- This framework may enable Newton methods on problems with polynomial or higher-order growth where classical assumptions fail.
- Similar preconditioning could be tested on nonconvex optimization landscapes.
- Connections to first-order nonlinear preconditioning suggest broader applicability across optimization methods.
Load-bearing premise
There exists a nonlinear preconditioner making the preconditioned Hessian Lipschitz continuous, and globalization works even if the preconditioned Newton direction is not always a descent direction.
What would settle it
An optimization problem where no nonlinear preconditioner renders the preconditioned Hessian Lipschitz continuous, and the proposed methods fail to exhibit superlinear convergence on that instance.
Original abstract
Newton-type methods are typically analyzed under Lipschitz continuity of the Hessian, an assumption that can fail for objectives with higher-order or polynomial growth. We introduce a class of nonlinearly preconditioned Newton methods that apply Newton's root-finding scheme to a transformed optimality mapping, thereby extending recent nonlinear preconditioning ideas from first-order methods to the second-order setting. The resulting methods are naturally analyzed under Lipschitz continuity of a preconditioned Hessian, a condition that significantly relaxes the classical Hessian Lipschitz continuity assumption. Under this generalized smoothness model, we establish local superlinear and quadratic convergence guarantees, and develop a globalization strategy for the nonregularized method despite the fact that the preconditioned Newton direction need not be a descent direction. We further propose a regularized variant for isotropic preconditioners, and show that it attains an $O(\varepsilon^{-3/2})$ iteration complexity. An adaptive version removes the need to know the smoothness constant and allows inexact subproblem solutions while preserving the same complexity order.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a class of nonlinearly preconditioned Newton methods that apply Newton's root-finding scheme to a transformed optimality mapping. This extends nonlinear preconditioning ideas from first-order to second-order optimization, allowing analysis under Lipschitz continuity of the preconditioned Hessian rather than the original Hessian. The authors establish local superlinear and quadratic convergence guarantees, develop a globalization strategy for the nonregularized method (even when the preconditioned Newton direction is not a descent direction), propose a regularized variant for isotropic preconditioners attaining O(ε^{-3/2}) iteration complexity, and present an adaptive version that preserves this complexity while allowing inexact subproblem solutions without knowledge of the smoothness constant.
Significance. If the results hold, this work is significant for extending the applicability of Newton-type methods to nonconvex problems with higher-order or polynomial growth where classical Hessian Lipschitz continuity fails. The nonlinear preconditioning approach provides a principled relaxation of smoothness assumptions while retaining strong local convergence and competitive global complexity bounds. The adaptive regularized variant enhances practicality by removing the need for known constants and tolerating inexact solves. The framework builds directly on recent nonlinear preconditioning literature with a novel second-order extension.
major comments (2)
- [Section 3.2, Eq. (12)] The globalization strategy for the nonregularized method is developed despite the preconditioned Newton direction not necessarily being a descent direction; however, the argument relies on an implicit assumption that the preconditioner ensures sufficient decrease, which is not fully detailed in the main theorem statement yet is load-bearing for the global convergence claim. (A generic residual-norm globalization device is sketched after these comments for context.)
- [Theorem 5.1] The local quadratic convergence under Lipschitz continuity of the preconditioned Hessian assumes the existence of a nonlinear preconditioner rendering the transformed mapping sufficiently smooth; the manuscript provides no constructive conditions or examples verifying this for objectives with non-Lipschitz Hessians, which is central to the paper's main extension beyond classical assumptions.
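For context on the first major comment: one textbook way to globalize a Newton root-finding step that need not be a descent direction for the objective f is to backtrack on the residual norm of the preconditioned mapping. The sketch below shows only that generic device, with hypothetical callables F_P and J_P; it is not the paper's Eq. (12) strategy or its sufficient-decrease condition.

```python
import numpy as np

def globalized_newton_step(x, F_P, J_P, beta=0.5, sigma=1e-4, max_backtracks=30):
    """Backtracking on the merit function ||F_P(x)||.

    A generic root-finding globalization, shown only to illustrate that the
    step need not decrease f itself; not the paper's Eq. (12) strategy.
    """
    F = F_P(x)
    d = np.linalg.solve(J_P(x), -F)             # (nonregularized) preconditioned Newton direction
    merit0 = np.linalg.norm(F)
    t = 1.0
    for _ in range(max_backtracks):
        if np.linalg.norm(F_P(x + t * d)) <= (1.0 - sigma * t) * merit0:
            break
        t *= beta
    return x + t * d
```

Whether a decrease condition of this kind is actually available is precisely what the comment asks the authors to make explicit.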
minor comments (3)
- [Abstract] The phrase 'isotropic preconditioners' is used without a brief inline definition or forward reference, which may reduce accessibility for readers outside the immediate subfield.
- [Section 2] The notation for the transformed optimality mapping F_P is introduced late; defining it earlier would improve flow when discussing the preconditioned Hessian in subsequent sections.
- [References] The citation list omits a direct pointer to the foundational nonlinear preconditioning work for first-order methods in the introduction, which would better contextualize the extension.
Simulated Author's Rebuttal
We thank the referee for their thorough review and positive evaluation of the significance of our work. We respond to each of the major comments below.
Point-by-point responses
-
Referee: Section 3.2, Eq. (12): The globalization strategy for the nonregularized method is developed despite the preconditioned Newton direction not necessarily being a descent direction; however, the argument relies on an implicit assumption about the preconditioner ensuring sufficient decrease that is not fully detailed in the main theorem statement, making this load-bearing for the global convergence claim.
Authors: We agree that the main theorem would benefit from an explicit statement of the assumption on the preconditioner. The sufficient decrease is ensured by the properties of the nonlinear preconditioner as defined in the paper. We will revise the theorem statement in Section 3.2 to include this condition explicitly and provide additional details in the proof sketch to clarify the argument. revision: yes
-
Referee: Theorem 5.1: The local quadratic convergence under Lipschitz continuity of the preconditioned Hessian assumes existence of a nonlinear preconditioner rendering the transformed mapping sufficiently smooth; the manuscript provides no constructive conditions or examples verifying this for objectives with non-Lipschitz Hessians, which is central to the paper's main extension beyond classical assumptions.
Authors: The manuscript establishes the convergence results under the assumption that such a preconditioner exists. To enhance the paper, we will include in the revision specific examples of nonlinear preconditioners for classes of problems with non-Lipschitz Hessians, such as those with polynomial growth of order greater than 2. This will provide constructive conditions and demonstrate the practical relevance of the framework. revision: yes
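As a minimal illustration of the kind of example the rebuttal promises (a hypothetical sketch, not content taken from the paper): in one dimension a quartic objective has a non-Lipschitz Hessian, yet composing its gradient with a cube-root preconditioner yields a mapping whose derivative is trivially Lipschitz.

```latex
% A one-dimensional illustration (hypothetical, not taken from the paper).
\[
  f(x) = \tfrac{1}{4}x^{4}, \qquad f''(x) = 3x^{2}\ \text{is not Lipschitz continuous on}\ \mathbb{R},
\]
\[
  \nabla\phi^{*}(y) = \operatorname{sign}(y)\,|y|^{1/3}
  \;\Longrightarrow\;
  F_P(x) = \nabla\phi^{*}\!\bigl(f'(x)\bigr) = \nabla\phi^{*}(x^{3}) = x,
\]
% so the preconditioned Hessian F_P'(x) \equiv 1 is trivially Lipschitz continuous.
```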
Circularity Check
Minor reliance on prior nonlinear preconditioning literature; central Newton extension and preconditioned-Hessian analysis remain independently grounded
full rationale
The paper applies Newton's method to a transformed optimality mapping and derives local superlinear/quadratic convergence plus O(ε^{-3/2}) complexity under Lipschitz continuity of the preconditioned Hessian rather than the original Hessian. This constitutes a genuine relaxation of the classical assumption and does not reduce any claimed result to a fitted parameter or self-definition by construction. While the abstract explicitly positions the work as extending recent first-order nonlinear preconditioning ideas, the second-order analysis, globalization strategy (despite non-descent directions), and adaptive inexact variant are developed directly from the transformed mapping without load-bearing self-citations or ansatzes that collapse the claims. No equation or step in the provided description equates a prediction to its own input.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: existence of a nonlinear preconditioner such that the preconditioned Hessian is Lipschitz continuous.
Reference graph
Works this paper leans on
-
[1]
Escaping saddle points without Lipschitz smoothness: the power of nonlinear preconditioning
A. Bodard and P. Patrinos. “Escaping saddle points without Lipschitz smoothness: the power of nonlinear preconditioning”. In: Advances in Neural Information Processing Systems. 2025, pp. 124173–124208
2025
-
[2]
Mirror and Preconditioned Gradient Descent in Wasserstein Space
C. Bonet, T. Uscidda, A. David, P. -C. Aubin-Frankowski, and A. Korba. “Mirror and Preconditioned Gradient Descent in Wasserstein Space”. In: Advances in Neural Information Processing Systems. 2024, pp. 25311–25374
2024
-
[3]
A generalized multivariable Newton method
R. S. Burachik, B. I. Caldwell, and C. Y. Kaya. “A generalized multivariable Newton method”. In: Fixed Point Theory and Algorithms for Sciences and Engineering 2021.1 (2021), p. 15
2021
-
[4]
A generalized univariate Newton method motivated by proximal regularization
R. S. Burachik, C. Y. Kaya, and S. Sabach. “A generalized univariate Newton method motivated by proximal regularization”. In: Journal of Optimization Theory and Applications 155.3 (2012), pp. 923–940
2012
-
[5]
Lower bounds for finding stationary points I
Y. Carmon, J. C. Duchi, O. Hinder, and A. Sidford. “Lower bounds for finding stationary points I”. In: Mathematical Programming 184.1 (2020), pp. 71–120
2020
-
[6]
Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results
C. Cartis, N. I. Gould, and P. L. Toint. “Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results”. In: Mathematical Programming 127.2 (2011), pp. 245–295
2011
-
[7]
Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity
C. Cartis, N. I. Gould, and P. L. Toint. “Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity”. In: Mathematical Programming 130.2 (2011), pp. 295–319
2011
-
[8]
Evaluation Complexity of Algorithms for Nonconvex Optimization: Theory, Computation and Perspectives
C. Cartis, N. I. Gould, and P. L. Toint. Evaluation Complexity of Algorithms for Nonconvex Optimization: Theory, Computation and Perspectives. SIAM, 2022
2022
-
[9]
LIBSVM: A library for support vector machines
C.-C. Chang and C.-J. Lin. “LIBSVM: A library for support vector machines”. In: ACM transactions on intelligent systems and technology (TIST) 2.3 (2011), pp. 1–27
2011
-
[10]
Generalized-Smooth Nonconvex Optimization is As Efficient As Smooth Nonconvex Optimization
Z. Chen, Y. Zhou, Y. Liang, and Z. Lu. “Generalized-Smooth Nonconvex Optimization is As Efficient As Smooth Nonconvex Optimization”. In: International Conference on Machine Learning. 2023, pp. 5396–5427
2023
-
[11]
Minimizing Quasi-Self-Concordant Functions by Gradient Regularization of Newton Method
N. Doikov. “Minimizing Quasi-Self-Concordant Functions by Gradient Regularization of Newton Method”. In: Mathematical Programming (2025), pp. 1–39
2025
-
[12]
Super-universal regularized Newton method
N. Doikov, K. Mishchenko, and Y. Nesterov. “Super-universal regularized Newton method”. In: SIAM Journal on Optimization 34.1 (2024), pp. 27–56
2024
-
[13]
Gradient regularization of Newton method with Bregman distances
N. Doikov and Y. Nesterov. “Gradient regularization of Newton method with Bregman distances”. In: Mathematical programming 204.1 (2024), pp. 1–25
2024
-
[14]
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
J. Duchi, E. Hazan, and Y. Singer. “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization”. In: Journal of Machine Learning Research 12.61 (2011), pp. 2121–2159
2011
-
[15]
Scalable adaptive cubic regularization methods
J.-P. Dussault, T. Migot, and D. Orban. “Scalable adaptive cubic regularization methods”. In: Mathematical Programming 207.1 (2024), pp. 191–225
2024
-
[16]
Yet another fast variant of Newton’s method for nonconvex optimization
S. Gratton, S. Jerad, and P. L. Toint. “Yet another fast variant of Newton’s method for nonconvex optimization”. In: IMA Journal of Numerical Analysis 45.2 (2025), pp. 971–1008
2025
-
[17]
A Damped Newton Method Achieves Global O(1/k²) and Local Quadratic Convergence Rate
S. Hanzely, D. Kamzolov, D. Pasechnyuk, A. Gasnikov, P. Richtárik, and M. Takáč. “A Damped Newton Method Achieves Global O(1/k²) and Local Quadratic Convergence Rate”. In: Advances in Neural Information Processing Systems (2022), pp. 25320–25334
2022
-
[18]
A Newton-CG based augmented Lagrangian method for finding a second-order stationary point of nonconvex equality constrained optimization with complexity guarantees
C. He, Z. Lu, and T. K. Pong. “A Newton-CG based augmented Lagrangian method for finding a second-order stationary point of nonconvex equality constrained optimization with complexity guarantees”. In: SIAM Journal on Optimization 33.3 (2023), pp. 1734–1766
2023
-
[19]
Adam: A Method for Stochastic Optimization
D. P. Kingma. “Adam: A method for stochastic optimization”. In: arXiv preprint arXiv:1412.6980 (2014)
2014
-
[20]
Anisotropic proximal gradient
E. Laude and P. Patrinos. “Anisotropic proximal gradient”. In: Mathematical Programming 214.1 (2025), pp. 801–845
2025
-
[21]
Dualities for non-Euclidean smoothness and strong convexity under the light of generalized conjugacy
E. Laude, A. Themelis, and P. Patrinos. “Dualities for non-Euclidean smoothness and strong convexity under the light of generalized conjugacy”. In: SIAM Journal on Optimization 33.4 (2023), pp. 2721–2749
2023
-
[22]
Gradient descent with a general cost
F. Léger and P.-C. Aubin-Frankowski. “Gradient descent with a general cost”. In: arXiv preprint arXiv:2305.04917 (2023)
-
[23]
A method for the solution of certain non-linear problems in least squares
K. Levenberg. “A method for the solution of certain non-linear problems in least squares”. In: Quarterly of applied mathematics 2.2 (1944), pp. 164–168
1944
-
[24]
Regularized Newton methods for convex minimization problems with singular solutions
D.-H. Li, M. Fukushima, L. Qi, and N. Yamashita. “Regularized Newton methods for convex minimization problems with singular solutions”. In: Computational optimization and applications 28.2 (2004), pp. 131–147
2004
-
[25]
Convex and Non-convex Optimization Under Generalized Smoothness
H. Li, J. Qian, Y. Tian, A. Rakhlin, and A. Jadbabaie. “Convex and Non-convex Optimization Under Generalized Smoothness”. In: Advances in Neural Information Processing Systems (2023), pp. 40238–40271
2023
-
[26]
Relatively smooth convex optimization by first-order methods, and applications
H. Lu, R. M. Freund, and Y. Nesterov. “Relatively smooth convex optimization by first-order methods, and applications”. In: SIAM Journal on Optimization 28.1 (2018), pp. 333–354
2018
-
[27]
Dual space preconditioning for gradient descent
C. J. Maddison, D. Paulin, Y. W. Teh, and A. Doucet. “Dual space preconditioning for gradient descent”. In: SIAM Journal on Optimization 31.1 (2021), pp. 991–1016
2021
-
[28]
An algorithm for least-squares estimation of nonlinear parameters
D. W. Marquardt. “An algorithm for least-squares estimation of nonlinear parameters”. In: Journal of the Society for Industrial and Applied Mathematics 11.2 (1963), pp. 431–441
1963
-
[29]
Regularized Newton method with global convergence
K. Mishchenko. “Regularized Newton method with global convergence”. In: SIAM Journal on Optimization 33.3 (2023), pp. 1440–1462
2023
-
[30]
Lectures on convex optimization
Y. Nesterov. Lectures on convex optimization. Vol. 137. Springer, 2018
2018
-
[31]
Interior-point polynomial algorithms in convex programming
Y. Nesterov and A. Nemirovskii. Interior-point polynomial algorithms in convex programming. SIAM, 1994
1994
-
[32]
Cubic regularization of Newton method and its global performance
Y. Nesterov and B. T. Polyak. “Cubic regularization of Newton method and its global performance”. In: Mathematical programming 108.1 (2006), pp. 177–205
2006
-
[33]
Forward-backward splitting under the light of generalized convexity
K. Oikonomidis, E. Laude, and P. Patrinos. “Forward-backward splitting under the light of generalized convexity”. In: arXiv preprint arXiv:2503.18098 (2025)
-
[34]
Nonlinearly Preconditioned Gradient Methods under Generalized Smoothness
K. Oikonomidis, J. Quan, E. Laude, and P. Patrinos. “Nonlinearly Preconditioned Gradient Methods under Generalized Smoothness”. In: International Conference on Machine Learning. 2025, pp. 47132–47154
2025
-
[35]
Nonlinearly Preconditioned Gradient Methods: Momentum and Stochastic Analysis
K. Oikonomidis, J. Quan, and P. Patrinos. “Nonlinearly Preconditioned Gradient Methods: Momentum and Stochastic Analysis”. In: Advances in Neural Information Processing Systems. 2025, pp. 38957–38988
2025
-
[36]
Regularized Newton method for unconstrained convex optimization
R. A. Polyak. “Regularized Newton method for unconstrained convex optimization”. In: Mathematical programming 120.1 (2009), pp. 125–145
2009
-
[37]
Challenges in Training PINNs: A Loss Landscape Perspective
P. Rathore, W. Lei, Z. Frangella, L. Lu, and M. Udell. “Challenges in Training PINNs: A Loss Landscape Perspective”. In: International Conference on Machine Learning. 2024, pp. 42159–42191
2024
-
[38]
Higher derivatives of conjugate convex functions
R. T. Rockafellar. “Higher derivatives of conjugate convex functions”. In: Journal of Applied Analysis 1.1 (1977), pp. 41–43
1977
-
[39]
Variational analysis
R. T. Rockafellar and R. J. Wets. Variational analysis. Springer, 1998
1998
-
[40]
A Newton-CG algorithm with complexity guarantees for smooth unconstrained optimization
C. W. Royer, M. O’Neill, and S. J. Wright. “A Newton-CG algorithm with complexity guarantees for smooth unconstrained optimization”. In: Mathematical Programming 180.1 (2020), pp. 451–488
2020
-
[41]
A simple and efficient algorithm for nonlinear model predictive control
L. Stella, A. Themelis, P. Sopasakis, and P. Patrinos. “A simple and efficient algorithm for nonlinear model predictive control”. In: Conference on Decision and Control. 2017, pp. 1939–1944
2017
-
[42]
Why gradient clipping accelerates training: A theoretical justification for adaptivity
J. Zhang, T. He, S. Sra, and A. Jadbabaie. “Why gradient clipping accelerates training: A theoretical justification for adaptivity”. In: arXiv preprint arXiv:1905.11881 (2019)
-
[43]
A Regularized Newton Method for Nonconvex Optimization with Global and Local Complexity Guarantees
Y. Zhou, J. Xu, B. Li, C. Bao, C. Ding, and J. Zhu. “A Regularized Newton Method for Nonconvex Optimization with Global and Local Complexity Guarantees”. In: Advances in Neural Information Processing Systems. 2025, pp. 73447–73502
2025
-
[44]
A hybrid inexact regularized Newton and negative curvature method
H. Zhu and Y. Xiao. “A hybrid inexact regularized Newton and negative curvature method”. In: Computational Optimization and Applications 88.3 (2024), pp. 849–870
2024