Quasi-Newton and Krylov Methods for the Solution of Nonconvex Trust-Region Subproblems
Pith reviewed 2026-05-25 04:02 UTC · model grok-4.3
The pith
LBFGS and DIOM solve nonconvex trust-region subproblems more robustly than CG with fewer Hessian-vector products.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Generalizing LBFGS and DIOM to trust-region subproblems yields solvers that remain consistently more robust than CG on potentially nonconvex problems, deliver comparable accuracy with fewer Hessian-vector products, and serve as practical alternatives precisely when high accuracy is required or Hessian operations are expensive. The limited-memory SR1 variant loses effectiveness because discarding curvature information harms performance, whereas its full-memory form stays competitive.
What carries the argument
Limited-memory BFGS (LBFGS) and limited-memory full orthogonalization method (DIOM) as memory-leveraged solvers generalized from positive-definite linear systems to nonconvex trust-region subproblems.
If this is right
- Memory acts as the decisive algorithmic lever that improves robustness over CG for trust-region steps.
- LBFGS and DIOM become preferable when high accuracy is needed or when each Hessian-vector product is costly.
- Limited-memory SR1 remains viable only in full-memory form because limited memory discards essential curvature information.
- The methods apply directly to unconstrained optimization problems whose Hessians may lose positive definiteness.
Where Pith is reading between the lines
- If the observed reduction in Hessian-vector products holds at larger scale, these solvers could lower the per-iteration cost of second-order methods in high-dimensional nonconvex problems.
- The same memory-lever mechanism might transfer to other iterative linear solvers used inside optimization frameworks.
- Explicit comparison on problems with known indefinite Hessians would isolate whether the robustness advantage grows or shrinks with increasing nonconvexity.
Load-bearing premise
Relationships and termination properties established for positive-definite linear systems extend directly to nonconvex trust-region settings without loss of robustness or need for extra safeguards.
What would settle it
A collection of nonconvex test problems on which LBFGS or DIOM either fails to converge or requires strictly more Hessian-vector products than CG to reach the same accuracy would falsify the robustness claim.
Figures
read the original abstract
We study the solution of symmetric positive-definite linear systems by way of families of full- and limited-memory methods. Our contributions are threefold. We first derive new relationships between the conjugate-gradient method (CG) and quasi-Newton methods of the Broyden class that refine existing results, and clarify when those methods generate the same iterates and enjoy quadratic termination. We extend this perspective to the limited-memory BFGS (LBFGS) method. Next, we examine how DIOM, a limited-memory variant of the full orthogonalization Krylov method (FOM), is akin to LBFGS in that it provides a memory lever that is critical in practical performance. Finally, we generalize LBFGS and DIOM to the computation of trust-region steps for unconstrained, potentially nonconvex, optimization. We report numerical experience on positive-definite linear systems and unconstrained optimization problems. The results show that memory is a key algorithmic lever: LBFGS and DIOM are consistently more robust than CG and often achieve comparable accuracy with fewer Hessian-vector products. They emerge as viable alternatives to CG when high accuracy is desirable or when operations with the Hessian are at a premium. The limited-memory SR1 (LSR1) method can be competitive in full-memory form, but its limited-memory variant suffers from discarded curvature information.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper derives refined relationships between CG and Broyden-class quasi-Newton methods (including quadratic termination conditions) for symmetric positive-definite linear systems, extends the perspective to LBFGS, draws an analogy with the limited-memory DIOM Krylov method, and generalizes LBFGS and DIOM to the solution of trust-region subproblems arising in unconstrained optimization that may be nonconvex. Numerical results on positive-definite systems and optimization problems indicate that the limited-memory methods are more robust than CG while often requiring fewer Hessian-vector products for comparable accuracy.
Significance. If the generalization to the nonconvex trust-region setting is valid, the work supplies practical alternatives to CG when high accuracy is required or Hessian-vector products are expensive, with memory length identified as a key performance lever. The theoretical refinements of CG/quasi-Newton relationships and the explicit comparison with DIOM constitute clear contributions to the literature on iterative solvers for optimization subproblems.
major comments (2)
- [Abstract, final paragraph; generalization section] Abstract (final paragraph) and the generalization section: the numerical experience is confined to positive-definite linear systems and unconstrained problems; no indication is given that the test set includes indefinite Hessians. Because the central claim concerns robustness of the generalized LBFGS/DIOM methods on potentially nonconvex trust-region subproblems, the absence of negative-curvature test cases leaves the robustness advantage unverified for the stated setting.
- [Generalization section] Generalization section: the positive-definite relationships and termination properties are shown to carry over to LBFGS and DIOM, but the manuscript does not supply an explicit negative-curvature safeguard (analogous to the Steihaug modification of CG) while retaining the limited-memory structure. Without this adaptation, the claim that the methods remain more robust than CG with fewer Hessian-vector products does not automatically follow from the SPD analysis.
minor comments (2)
- [Numerical experiments section] Numerical section: the abstract reports numerical experience but supplies no details on problem dimensions, number of instances, exclusion criteria, or performance metrics (e.g., success rates, iteration counts). These should be stated explicitly to allow reproduction and assessment of the robustness claim.
- [Throughout] Notation: the distinction between full-memory and limited-memory variants is clear in the text, but a short table summarizing the memory parameter m for each method would improve readability when comparing LBFGS, LSR1, and DIOM.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed report. The comments highlight important points regarding the scope of the numerical results and the explicit treatment of negative curvature. We address each major comment below and indicate the revisions we will make.
read point-by-point responses
-
Referee: [Abstract, final paragraph; generalization section] Abstract (final paragraph) and the generalization section: the numerical experience is confined to positive-definite linear systems and unconstrained problems; no indication is given that the test set includes indefinite Hessians. Because the central claim concerns robustness of the generalized LBFGS/DIOM methods on potentially nonconvex trust-region subproblems, the absence of negative-curvature test cases leaves the robustness advantage unverified for the stated setting.
Authors: We agree that the reported numerical experiments are performed exclusively on positive-definite linear systems and optimization problems with positive-definite Hessians. The theoretical generalization of LBFGS and DIOM to trust-region subproblems is formulated for the potentially nonconvex case, but the empirical validation of robustness is limited to the SPD setting. We will revise the abstract and the generalization section to explicitly state the scope of the numerical results, clarify that the robustness advantage is demonstrated for positive-definite instances, and note that the methods are designed to extend to indefinite Hessians via the trust-region framework. This will prevent any overstatement of the empirical claims. revision: partial
-
Referee: [Generalization section] Generalization section: the positive-definite relationships and termination properties are shown to carry over to LBFGS and DIOM, but the manuscript does not supply an explicit negative-curvature safeguard (analogous to the Steihaug modification of CG) while retaining the limited-memory structure. Without this adaptation, the claim that the methods remain more robust than CG with fewer Hessian-vector products does not automatically follow from the SPD analysis.
Authors: The trust-region constraint itself serves as the primary safeguard against negative curvature by restricting the step length, and the limited-memory methods are generalized by embedding the LBFGS/DIOM iterations inside a trust-region solver that monitors the model decrease and curvature. This retains the limited-memory structure while inheriting the robustness properties of trust-region methods. We will expand the generalization section with an explicit description of this negative-curvature handling (including how the memory-limited updates interact with the trust-region boundary) to make the connection to Steihaug-type safeguards clear and to strengthen the justification for the robustness claims. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper first derives relationships between CG and Broyden-class methods (including quadratic termination) for symmetric positive-definite systems, extends the perspective to LBFGS and DIOM, and then generalizes those limited-memory solvers to trust-region subproblems. These are explicit algorithmic constructions resting on standard linear-algebra identities rather than any self-definition, parameter fitting renamed as prediction, or load-bearing self-citation. Numerical experience is reported separately as validation and does not serve as the derivation's foundation. No equation or claim reduces to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math The Hessian matrix is symmetric
Reference graph
Works this paper leans on
-
[1]
W. E. Arnoldi. The principle of minimized iterations in the solution of the matrix eigenvalue problem.Q. Appl. Math., 9:17–29, 1951. cited on pp. 2, 3, and 24
work page 1951
-
[2]
C. G. Broyden. The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations.IMA J. Appl. Math., 6(1):76–90, 03 1970.cited on pp. 2, 3, 6, and 8
work page 1970
-
[3]
C. G. Broyden. The convergence of a class of double-rank minimization algorithms: 2. the new algorithm.IMA J. Appl. Math., 6(3):222–231, 1970. cited on p. 4
work page 1970
-
[4]
A. R. Conn, N. I. M. Gould, and P. L. Toint.Trust-Region Methods, volume 1 ofMPS/SIAM Series on Optimization. SIAM, Philadelphia, 2000. cited on p. 19
work page 2000
-
[5]
M.-A. Dahito and D. Orban. The conjugate residual method in linesearch and trust-region methods.SIAM J. Optim., 29(3):1988–2025, 2019. cited on p. 24
work page 1988
-
[6]
W. C. Davidon. Variable metric method for minimization.SIAM J. Optim., 1(1):1–17, 1991.cited on p. 4
work page 1991
-
[7]
R. S. Dembo and T. Steihaug. Truncated-Newton algorithms for large-scale unconstrained optimization.Math. Program., 26(2):190–212, 1983. cited on p. 1
work page 1983
-
[8]
J. E. Dennis and J. J. Mor´ e. A characterization of superlinear convergence and its application to quasi-newton methods.Math. Comp., 28(126):549–560, 1974.cited on p. 4
work page 1974
-
[9]
J. E. Dennis and J. J. Mor´ e. Quasi-Newton methods, motivation and theory.SIAM Rev., 19: 46–89, 1977. cited on pp. 4 and 5
work page 1977
-
[10]
L. Dixon. Quasi-newton algorithms generate identical points.Math. Program., 2(1):383, 1972. cited on p. 2
work page 1972
- [11]
- [12]
-
[13]
R. Fletcher and M. J. Powell. A rapidly convergent descent method for minimization.The computer journal, 6(2):163–168, 1963. cited on p. 4
work page 1963
-
[14]
A. Forsgren and T. Odland. On exact linesearch quasi-newton methods for minimizing a quadratic function.Comput. Optim. Appl., 69(1):225–241, Jan 2018.cited on p. 3
work page 2018
- [15]
-
[16]
G. H. Golub and C. F. Van Loan.Matrix Computations. The Johns Hopkins University Press, Baltimore, 4th edition, 2013. cited on p. 6
work page 2013
-
[17]
M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems.J. Res. Natl. Bur. Stand., 49(6):565–583, 1952. cited on pp. 1, 2, 6, and 24
work page 1952
-
[18]
D. S. Horst. The Lanczos algorithm with partial reorthogonalization.Math. Comp., 42:115–142,
-
[19]
C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators.J. Res. Natl. Bur. Stand., 45:225–280, 1950.cited on pp. 1, 15, and 24
work page 1950
-
[20]
D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Math. Program., 45:503–528, 1989. cited on p. 12
work page 1989
-
[21]
E. N. Lorenz. Predictability: A problem partly solved. InProc. Seminar on predictability, volume 1. Reading, 1996. cited on p. 23
work page 1996
- [22]
-
[23]
A. Montoison and D. Orban. Krylov.jl: A Julia basket of hand-picked Krylov methods.J. Open Source Softw., 8(89):5187, 2023. cited on p. 21
work page 2023
- [24]
-
[25]
J. Nocedal. Updating quasi-Newton matrices with limited storage.Math. Comp., 35:773–782,
-
[26]
J. Nocedal and S. J. Wright.Numerical Optimization. Springer, New York, 2nd edition, 2006. cited on p. 5
work page 2006
-
[27]
D. Orban and M. Arioli.Iterative Solution of Symmetric Quasi-Definite Linear Systems, volume 3 ofSpotlights. SIAM, Philadelphia, USA, 2017.cited on p. 6
work page 2017
-
[28]
D. Orban and A. Soares Siqueira. Linearoperators. jl.Zenodo, 2019.cited on p. 21
work page 2019
-
[29]
C. C. Paige and M. A. Saunders. Solution of sparse indefinite systems of linear equations.SIAM J. Numer. Anal., 12(4):617–629, 1975. cited on pp. 15 and 24
work page 1975
-
[30]
Saad.Iterative Methods for Sparse Linear Systems
Y. Saad.Iterative Methods for Sparse Linear Systems. SIAM, second edition, 2003.cited on p. 18
work page 2003
-
[31]
Y. Saad and M. Schultz. GMRES, a generalized minimal residual algorithm for solving nonsym- Cahier du GERAD G-2026-26 26 [toc] metric linear systems.SIAM J. Sci. and Statist. Comput., 7:856–869, 1986.cited on p. 24
work page 2026
-
[32]
Y. Saad and K. Wu. DQGMRES: a direct quasi-minimal residual algorithm based on incomplete orthogonalization.Numer. Linear Algebra Appl., 3(4):329–343, 1996.cited on p. 24
work page 1996
-
[33]
D. F. Shanno. Conditioning of quasi-newton methods for function minimization.Math. Comp., 24(111):647–656, 1970. cited on p. 4
work page 1970
-
[34]
D. F. Shanno. Conjugate gradient methods with inexact searches.Math. Oper. Res., 3(3): 244–256, 1978. cited on p. 3
work page 1978
- [35]
-
[36]
R. J. Vanderbei. Symmetric quasi-definite matrices.SIAM J. Optim., 5(1):100–113, 1995.cited on p. 6 Contents. 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1 Exact line search methods for quadratic functions . . . . . . . . . . . 4 2.2 M...
work page 1995
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.