pith. sign in

arxiv: 2605.23761 · v1 · pith:XFPT5LRSnew · submitted 2026-05-22 · 🧮 math.OC

Quasi-Newton and Krylov Methods for the Solution of Nonconvex Trust-Region Subproblems

Pith reviewed 2026-05-25 04:02 UTC · model grok-4.3

classification 🧮 math.OC
keywords quasi-Newton methodsKrylov subspace methodstrust-region subproblemslimited-memory BFGSconjugate gradientnonconvex optimizationDIOMtrust-region methods
0
0 comments X

The pith

LBFGS and DIOM solve nonconvex trust-region subproblems more robustly than CG with fewer Hessian-vector products.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives relationships showing when conjugate gradient and Broyden-class quasi-Newton methods produce identical iterates with quadratic termination on positive-definite systems, then extends the same perspective to limited-memory BFGS. It positions the limited-memory full orthogonalization method DIOM as a close analogue to LBFGS through its memory parameter. These two limited-memory approaches are generalized to the computation of trust-region steps on unconstrained problems whose Hessians may be indefinite. Numerical tests indicate that the memory lever improves robustness over plain CG while often matching accuracy at lower cost in Hessian-vector products.

Core claim

Generalizing LBFGS and DIOM to trust-region subproblems yields solvers that remain consistently more robust than CG on potentially nonconvex problems, deliver comparable accuracy with fewer Hessian-vector products, and serve as practical alternatives precisely when high accuracy is required or Hessian operations are expensive. The limited-memory SR1 variant loses effectiveness because discarding curvature information harms performance, whereas its full-memory form stays competitive.

What carries the argument

Limited-memory BFGS (LBFGS) and limited-memory full orthogonalization method (DIOM) as memory-leveraged solvers generalized from positive-definite linear systems to nonconvex trust-region subproblems.

If this is right

  • Memory acts as the decisive algorithmic lever that improves robustness over CG for trust-region steps.
  • LBFGS and DIOM become preferable when high accuracy is needed or when each Hessian-vector product is costly.
  • Limited-memory SR1 remains viable only in full-memory form because limited memory discards essential curvature information.
  • The methods apply directly to unconstrained optimization problems whose Hessians may lose positive definiteness.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the observed reduction in Hessian-vector products holds at larger scale, these solvers could lower the per-iteration cost of second-order methods in high-dimensional nonconvex problems.
  • The same memory-lever mechanism might transfer to other iterative linear solvers used inside optimization frameworks.
  • Explicit comparison on problems with known indefinite Hessians would isolate whether the robustness advantage grows or shrinks with increasing nonconvexity.

Load-bearing premise

Relationships and termination properties established for positive-definite linear systems extend directly to nonconvex trust-region settings without loss of robustness or need for extra safeguards.

What would settle it

A collection of nonconvex test problems on which LBFGS or DIOM either fails to converge or requires strictly more Hessian-vector products than CG to reach the same accuracy would falsify the robustness claim.

Figures

Figures reproduced from arXiv: 2605.23761 by Dominique Orban, Jean-Pierre Dussault, Johann Bourhis, Oihan Cordelier, Oussama Mouhtal.

Figure 3
Figure 3. Figure 3 [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗
Figure 3
Figure 3. Figure 3 [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗
Figure 7
Figure 7. Figure 7 [PITH_FULL_IMAGE:figures/full_fig_p021_7.png] view at source ↗
Figure 7
Figure 7. Figure 7 [PITH_FULL_IMAGE:figures/full_fig_p022_7.png] view at source ↗
Figure 7
Figure 7. Figure 7 [PITH_FULL_IMAGE:figures/full_fig_p023_7.png] view at source ↗
read the original abstract

We study the solution of symmetric positive-definite linear systems by way of families of full- and limited-memory methods. Our contributions are threefold. We first derive new relationships between the conjugate-gradient method (CG) and quasi-Newton methods of the Broyden class that refine existing results, and clarify when those methods generate the same iterates and enjoy quadratic termination. We extend this perspective to the limited-memory BFGS (LBFGS) method. Next, we examine how DIOM, a limited-memory variant of the full orthogonalization Krylov method (FOM), is akin to LBFGS in that it provides a memory lever that is critical in practical performance. Finally, we generalize LBFGS and DIOM to the computation of trust-region steps for unconstrained, potentially nonconvex, optimization. We report numerical experience on positive-definite linear systems and unconstrained optimization problems. The results show that memory is a key algorithmic lever: LBFGS and DIOM are consistently more robust than CG and often achieve comparable accuracy with fewer Hessian-vector products. They emerge as viable alternatives to CG when high accuracy is desirable or when operations with the Hessian are at a premium. The limited-memory SR1 (LSR1) method can be competitive in full-memory form, but its limited-memory variant suffers from discarded curvature information.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper derives refined relationships between CG and Broyden-class quasi-Newton methods (including quadratic termination conditions) for symmetric positive-definite linear systems, extends the perspective to LBFGS, draws an analogy with the limited-memory DIOM Krylov method, and generalizes LBFGS and DIOM to the solution of trust-region subproblems arising in unconstrained optimization that may be nonconvex. Numerical results on positive-definite systems and optimization problems indicate that the limited-memory methods are more robust than CG while often requiring fewer Hessian-vector products for comparable accuracy.

Significance. If the generalization to the nonconvex trust-region setting is valid, the work supplies practical alternatives to CG when high accuracy is required or Hessian-vector products are expensive, with memory length identified as a key performance lever. The theoretical refinements of CG/quasi-Newton relationships and the explicit comparison with DIOM constitute clear contributions to the literature on iterative solvers for optimization subproblems.

major comments (2)
  1. [Abstract, final paragraph; generalization section] Abstract (final paragraph) and the generalization section: the numerical experience is confined to positive-definite linear systems and unconstrained problems; no indication is given that the test set includes indefinite Hessians. Because the central claim concerns robustness of the generalized LBFGS/DIOM methods on potentially nonconvex trust-region subproblems, the absence of negative-curvature test cases leaves the robustness advantage unverified for the stated setting.
  2. [Generalization section] Generalization section: the positive-definite relationships and termination properties are shown to carry over to LBFGS and DIOM, but the manuscript does not supply an explicit negative-curvature safeguard (analogous to the Steihaug modification of CG) while retaining the limited-memory structure. Without this adaptation, the claim that the methods remain more robust than CG with fewer Hessian-vector products does not automatically follow from the SPD analysis.
minor comments (2)
  1. [Numerical experiments section] Numerical section: the abstract reports numerical experience but supplies no details on problem dimensions, number of instances, exclusion criteria, or performance metrics (e.g., success rates, iteration counts). These should be stated explicitly to allow reproduction and assessment of the robustness claim.
  2. [Throughout] Notation: the distinction between full-memory and limited-memory variants is clear in the text, but a short table summarizing the memory parameter m for each method would improve readability when comparing LBFGS, LSR1, and DIOM.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed report. The comments highlight important points regarding the scope of the numerical results and the explicit treatment of negative curvature. We address each major comment below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: [Abstract, final paragraph; generalization section] Abstract (final paragraph) and the generalization section: the numerical experience is confined to positive-definite linear systems and unconstrained problems; no indication is given that the test set includes indefinite Hessians. Because the central claim concerns robustness of the generalized LBFGS/DIOM methods on potentially nonconvex trust-region subproblems, the absence of negative-curvature test cases leaves the robustness advantage unverified for the stated setting.

    Authors: We agree that the reported numerical experiments are performed exclusively on positive-definite linear systems and optimization problems with positive-definite Hessians. The theoretical generalization of LBFGS and DIOM to trust-region subproblems is formulated for the potentially nonconvex case, but the empirical validation of robustness is limited to the SPD setting. We will revise the abstract and the generalization section to explicitly state the scope of the numerical results, clarify that the robustness advantage is demonstrated for positive-definite instances, and note that the methods are designed to extend to indefinite Hessians via the trust-region framework. This will prevent any overstatement of the empirical claims. revision: partial

  2. Referee: [Generalization section] Generalization section: the positive-definite relationships and termination properties are shown to carry over to LBFGS and DIOM, but the manuscript does not supply an explicit negative-curvature safeguard (analogous to the Steihaug modification of CG) while retaining the limited-memory structure. Without this adaptation, the claim that the methods remain more robust than CG with fewer Hessian-vector products does not automatically follow from the SPD analysis.

    Authors: The trust-region constraint itself serves as the primary safeguard against negative curvature by restricting the step length, and the limited-memory methods are generalized by embedding the LBFGS/DIOM iterations inside a trust-region solver that monitors the model decrease and curvature. This retains the limited-memory structure while inheriting the robustness properties of trust-region methods. We will expand the generalization section with an explicit description of this negative-curvature handling (including how the memory-limited updates interact with the trust-region boundary) to make the connection to Steihaug-type safeguards clear and to strengthen the justification for the robustness claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper first derives relationships between CG and Broyden-class methods (including quadratic termination) for symmetric positive-definite systems, extends the perspective to LBFGS and DIOM, and then generalizes those limited-memory solvers to trust-region subproblems. These are explicit algorithmic constructions resting on standard linear-algebra identities rather than any self-definition, parameter fitting renamed as prediction, or load-bearing self-citation. Numerical experience is reported separately as validation and does not serve as the derivation's foundation. No equation or claim reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on standard assumptions of symmetric Hessians and the existence of trust-region subproblems; no free parameters, invented entities, or ad-hoc axioms are visible in the abstract.

axioms (1)
  • standard math The Hessian matrix is symmetric
    Invoked throughout the derivation of relationships between CG and quasi-Newton methods.

pith-pipeline@v0.9.0 · 5781 in / 1203 out tokens · 24736 ms · 2026-05-25T04:02:46.348349+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages

  1. [1]

    W. E. Arnoldi. The principle of minimized iterations in the solution of the matrix eigenvalue problem.Q. Appl. Math., 9:17–29, 1951. cited on pp. 2, 3, and 24

  2. [2]

    C. G. Broyden. The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations.IMA J. Appl. Math., 6(1):76–90, 03 1970.cited on pp. 2, 3, 6, and 8

  3. [3]

    C. G. Broyden. The convergence of a class of double-rank minimization algorithms: 2. the new algorithm.IMA J. Appl. Math., 6(3):222–231, 1970. cited on p. 4

  4. [4]

    A. R. Conn, N. I. M. Gould, and P. L. Toint.Trust-Region Methods, volume 1 ofMPS/SIAM Series on Optimization. SIAM, Philadelphia, 2000. cited on p. 19

  5. [5]

    Dahito and D

    M.-A. Dahito and D. Orban. The conjugate residual method in linesearch and trust-region methods.SIAM J. Optim., 29(3):1988–2025, 2019. cited on p. 24

  6. [6]

    W. C. Davidon. Variable metric method for minimization.SIAM J. Optim., 1(1):1–17, 1991.cited on p. 4

  7. [7]

    R. S. Dembo and T. Steihaug. Truncated-Newton algorithms for large-scale unconstrained optimization.Math. Program., 26(2):190–212, 1983. cited on p. 1

  8. [8]

    J. E. Dennis and J. J. Mor´ e. A characterization of superlinear convergence and its application to quasi-newton methods.Math. Comp., 28(126):549–560, 1974.cited on p. 4

  9. [9]

    J. E. Dennis and J. J. Mor´ e. Quasi-Newton methods, motivation and theory.SIAM Rev., 19: 46–89, 1977. cited on pp. 4 and 5

  10. [10]

    L. Dixon. Quasi-newton algorithms generate identical points.Math. Program., 2(1):383, 1972. cited on p. 2

  11. [11]

    Ek and A

    D. Ek and A. Forsgren. Exact linesearch limited-memory quasi-Newton methods for minimizing a quadratic function.Comput. Optim. Appl., 79(3):789–816, 2021.cited on pp. 3 and 24

  12. [12]

    Fletcher

    R. Fletcher. A new approach to variable metric algorithms.The computer journal, 13(3): 317–322, 1970. cited on p. 4

  13. [13]

    Fletcher and M

    R. Fletcher and M. J. Powell. A rapidly convergent descent method for minimization.The computer journal, 6(2):163–168, 1963. cited on p. 4

  14. [14]

    Forsgren and T

    A. Forsgren and T. Odland. On exact linesearch quasi-newton methods for minimizing a quadratic function.Comput. Optim. Appl., 69(1):225–241, Jan 2018.cited on p. 3

  15. [15]

    Goldfarb

    D. Goldfarb. A family of variable-metric methods derived by variational means.Math. Comp., 24(109):23–26, 1970. cited on p. 4

  16. [16]

    G. H. Golub and C. F. Van Loan.Matrix Computations. The Johns Hopkins University Press, Baltimore, 4th edition, 2013. cited on p. 6

  17. [17]

    M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems.J. Res. Natl. Bur. Stand., 49(6):565–583, 1952. cited on pp. 1, 2, 6, and 24

  18. [18]

    D. S. Horst. The Lanczos algorithm with partial reorthogonalization.Math. Comp., 42:115–142,

  19. [19]

    C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators.J. Res. Natl. Bur. Stand., 45:225–280, 1950.cited on pp. 1, 15, and 24

  20. [20]

    D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Math. Program., 45:503–528, 1989. cited on p. 12

  21. [21]

    E. N. Lorenz. Predictability: A problem partly solved. InProc. Seminar on predictability, volume 1. Reading, 1996. cited on p. 23

  22. [22]

    Migot, D

    T. Migot, D. Monnet, D. Orban, and A. S. Siqueira. JSOSolvers.jl: Unconstrained and bound- constrained optimization solvers.J. Open Source Softw., 11(117):9467, 2026.cited on p. 21

  23. [23]

    Montoison and D

    A. Montoison and D. Orban. Krylov.jl: A Julia basket of hand-picked Krylov methods.J. Open Source Softw., 8(89):5187, 2023. cited on p. 21

  24. [24]

    Nazareth

    L. Nazareth. A relationship between the BFGS and conjugate gradient algorithms and its implications for new algorithms.SIAM J. Numer. Anal., 16(5):794–800, 1979.cited on pp. 2 and 6

  25. [25]

    J. Nocedal. Updating quasi-Newton matrices with limited storage.Math. Comp., 35:773–782,

  26. [26]

    Nocedal and S

    J. Nocedal and S. J. Wright.Numerical Optimization. Springer, New York, 2nd edition, 2006. cited on p. 5

  27. [27]

    Orban and M

    D. Orban and M. Arioli.Iterative Solution of Symmetric Quasi-Definite Linear Systems, volume 3 ofSpotlights. SIAM, Philadelphia, USA, 2017.cited on p. 6

  28. [28]

    Orban and A

    D. Orban and A. Soares Siqueira. Linearoperators. jl.Zenodo, 2019.cited on p. 21

  29. [29]

    C. C. Paige and M. A. Saunders. Solution of sparse indefinite systems of linear equations.SIAM J. Numer. Anal., 12(4):617–629, 1975. cited on pp. 15 and 24

  30. [30]

    Saad.Iterative Methods for Sparse Linear Systems

    Y. Saad.Iterative Methods for Sparse Linear Systems. SIAM, second edition, 2003.cited on p. 18

  31. [31]

    Saad and M

    Y. Saad and M. Schultz. GMRES, a generalized minimal residual algorithm for solving nonsym- Cahier du GERAD G-2026-26 26 [toc] metric linear systems.SIAM J. Sci. and Statist. Comput., 7:856–869, 1986.cited on p. 24

  32. [32]

    Saad and K

    Y. Saad and K. Wu. DQGMRES: a direct quasi-minimal residual algorithm based on incomplete orthogonalization.Numer. Linear Algebra Appl., 3(4):329–343, 1996.cited on p. 24

  33. [33]

    D. F. Shanno. Conditioning of quasi-newton methods for function minimization.Math. Comp., 24(111):647–656, 1970. cited on p. 4

  34. [34]

    D. F. Shanno. Conjugate gradient methods with inexact searches.Math. Oper. Res., 3(3): 244–256, 1978. cited on p. 3

  35. [35]

    Steihaug

    T. Steihaug. The conjugate gradient method and trust regions in large scale optimization.SIAM J. Numer. Anal., 20(3):626–637, 1983. cited on pp. 1, 18, and 19

  36. [36]

    R. J. Vanderbei. Symmetric quasi-definite matrices.SIAM J. Optim., 5(1):100–113, 1995.cited on p. 6 Contents. 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1 Exact line search methods for quadratic functions . . . . . . . . . . . 4 2.2 M...