Technical results on the convergence of quasi-Newton methods for nonsmooth optimization
Pith reviewed 2026-05-18 01:34 UTC · model grok-4.3
The pith
Assumptions on quasi-Newton matrix eigenvalues suffice to prove convergence to critical points for piecewise differentiable functions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
If the eigenvalues of the sequence of quasi-Newton matrices satisfy certain limiting conditions motivated by experiments—specifically that some eigenvalues approach zero at appropriate rates—then the algorithm generates a sequence converging to a critical point of the nonsmooth objective, and the method is capable of exploring the piecewise structure by adapting to different smooth regions.
What carries the argument
The limiting behavior of the eigenvalues of the quasi-Newton matrices generated by the secant updates, which controls the curvature approximation in nonsmooth settings.
Load-bearing premise
The specific pattern of eigenvalue vanishing in the quasi-Newton matrix, as seen in numerical experiments, occurs in the general case for the functions under consideration.
What would settle it
Numerical runs of BFGS on a range of piecewise differentiable test functions where the eigenvalues fail to follow the assumed vanishing pattern while the method fails to reach a critical point.
read the original abstract
It is well-known by now that the BFGS method is an effective method for minimizing nonsmooth functions. However, despite its popularity, theoretical convergence results are almost non-existent. One of the difficulties when analyzing the nonsmooth case is the fact that the secant equation forces certain eigenvalues of the quasi-Newton matrix to vanish, which is a behavior that has not yet been fully analyzed. In this article, we show what kind of behavior of the eigenvalues would be sufficient to be able to prove the convergence for piecewise differentiable functions. More precisely, we derive assumptions on the behavior from numerical experiments and then prove criticality of the limit under these assumptions. Furthermore, we show how quasi-Newton methods are able to explore the piecewise structure. While we do not prove that the observed behavior of the eigenvalues actually occurs, we believe that these results still give insight, and a certain intuition, for the convergence for nonsmooth functions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that, under assumptions on the limiting eigenvalue behavior of quasi-Newton matrices (certain eigenvalues vanishing in a controlled manner due to the secant equation) that are extracted from numerical experiments, BFGS-type methods converge to critical points when applied to piecewise differentiable functions and are able to explore the underlying piecewise structure. The authors derive sufficient conditions for criticality from these observed behaviors, prove the corresponding convergence statements, and explicitly note that they do not establish that the eigenvalue assumptions hold for the actual sequence of matrices generated by the algorithm on the target function class.
Significance. If the eigenvalue assumptions hold for the functions of interest, the work supplies a useful conditional framework that explains the practical success of quasi-Newton methods on nonsmooth problems and identifies the precise spectral behavior needed for a convergence proof. This addresses a long-standing theoretical gap and provides intuition that could guide future efforts to verify or relax the assumptions, potentially yielding the first rigorous convergence guarantees for BFGS on nonsmooth piecewise differentiable objectives.
major comments (2)
- [Assumptions section (near the statement of the eigenvalue vanishing conditions)] The central convergence result (presumably Theorem 3.1 or equivalent in the proof section) is conditional on eigenvalue assumptions whose validity is supported only by numerical observation rather than derivation or sufficient conditions on the problem class. This leaves a gap between the proven statement and the practical claim that the method converges for piecewise differentiable functions.
- [Abstract and §1] The paper correctly acknowledges that the observed eigenvalue behavior is not shown to occur in general, yet the abstract and introduction frame the results as giving insight into convergence for nonsmooth functions. A more precise statement of the scope—explicitly limiting the claim to the conditional case—would strengthen the manuscript.
minor comments (2)
- [Throughout] Notation for the limiting eigenvalue sets and the piecewise active manifolds could be introduced earlier and used consistently to improve readability of the proofs.
- [Numerical experiments] The numerical experiments section would benefit from a brief table summarizing the observed eigenvalue decay rates across the test problems.
Simulated Author's Rebuttal
We thank the referee for the careful review and for acknowledging the value of our conditional framework in addressing a theoretical gap for quasi-Newton methods on nonsmooth problems. We respond to each major comment below and agree to revisions that improve precision without altering the manuscript's core contribution.
read point-by-point responses
-
Referee: [Assumptions section (near the statement of the eigenvalue vanishing conditions)] The central convergence result (presumably Theorem 3.1 or equivalent in the proof section) is conditional on eigenvalue assumptions whose validity is supported only by numerical observation rather than derivation or sufficient conditions on the problem class. This leaves a gap between the proven statement and the practical claim that the method converges for piecewise differentiable functions.
Authors: We agree that the central result is conditional on the eigenvalue assumptions, which are motivated by and extracted from numerical experiments rather than derived as necessary conditions for the problem class. The manuscript already states this limitation explicitly in the abstract ('we do not prove that the observed behavior of the eigenvalues actually occurs') and in the discussion following the assumptions. The contribution lies in characterizing the precise spectral behavior that suffices to prove criticality and piecewise exploration, thereby supplying intuition for the observed practical success of BFGS on nonsmooth objectives. We do not claim or prove unconditional convergence. To address the noted gap in presentation, we will revise the assumptions section to more prominently label these as sufficient conditions derived from numerics and to avoid any phrasing that could be read as implying the assumptions hold in general. revision: yes
-
Referee: [Abstract and §1] The paper correctly acknowledges that the observed eigenvalue behavior is not shown to occur in general, yet the abstract and introduction frame the results as giving insight into convergence for nonsmooth functions. A more precise statement of the scope—explicitly limiting the claim to the conditional case—would strengthen the manuscript.
Authors: We thank the referee for this suggestion on framing. While the current abstract and introduction already note the lack of a general proof for the eigenvalue behavior and position the work as providing insight rather than a full convergence guarantee, we agree that the language can be tightened to more explicitly restrict claims to the conditional setting. We will revise the abstract and §1 to state that the results identify sufficient conditions on eigenvalue vanishing (observed numerically) under which convergence to criticality holds for piecewise differentiable functions, and that these conditions offer intuition for why quasi-Newton methods succeed in practice, without suggesting that the assumptions are verified or that convergence is proven unconditionally. revision: yes
Circularity Check
No circularity: conditional convergence proof under numerically motivated assumptions
full rationale
The paper derives sufficient conditions on the limiting eigenvalue behavior of quasi-Newton matrices from numerical experiments, then proves that these conditions imply convergence to a critical point and exploration of piecewise structure for piecewise differentiable functions. It explicitly states that it does not prove the observed eigenvalue behavior occurs in general. This is a standard conditional implication result with no reduction of the central claim to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The derivation chain is self-contained as an implication under externally motivated assumptions.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Piecewise differentiable functions admit a well-defined notion of criticality via the Clarke subdifferential.
- domain assumption The secant equation in the quasi-Newton update forces certain eigenvalues toward zero in the nonsmooth case.
Reference graph
Works this paper leans on
-
[1]
Technical Report ANL-5990, Argonne National Lab., Lemont, Ill
Davidon, W.C.: Variable Metric Method for Minimization. Technical Report ANL-5990, Argonne National Lab., Lemont, Ill. (1959) https://doi.org/10.2172/ 4252678
work page 1959
-
[2]
Nocedal, J., Wright, S.: Numerical Optimization. Springer, New York, NY (2006). https://doi.org/10.1007/978-0-387-40065-5
-
[3]
Asl, A., Overton, M.L.: Analysis of the gradient method with an Armijo–Wolfe line search on a class of non-smooth convex functions. Optimization Methods and 17 Software35, 223–242 (2019) https://doi.org/10.1080/10556788.2019.1673388
-
[4]
Bagirov, A., Karmitsa, N., M¨ akel¨ a, M.M.: Introduction to Nonsmooth Optimiza- tion. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08114-4
-
[5]
SIAM Journal on Optimization15, 751–779 (2005) https://doi.org/10.1137/030601296
Burke, J.V., Lewis, A.S., Overton, M.L.: A Robust Gradient Sampling Algorithm for Nonsmooth, Nonconvex Optimization. SIAM Journal on Optimization15, 751–779 (2005) https://doi.org/10.1137/030601296
-
[6]
Lemar´ echal, C.: Numerical experiments in nonsmooth optimization. In: Nurmin- ski, E.A. (ed.) Progress in Nondifferentiable Optimization, Laxenburg, Austria, pp. 61–84 (1982). International Institute for Applied Systems Analysis (IIASA)
work page 1982
-
[7]
Mathematical Programming141, 135–163 (2013) https://doi.org/10.1007/ s10107-012-0514-2
Lewis, A.S., Overton, M.L.: Nonsmooth optimization via quasi-Newton meth- ods. Mathematical Programming141, 135–163 (2013) https://doi.org/10.1007/ s10107-012-0514-2
work page 2013
-
[8]
SIAM Journal on Optimization28(2), 1301–1311 (2018) https://doi
Guo, J., Lewis, A.S.: Nonsmooth Variants of Powell’s BFGS Convergence Theorem. SIAM Journal on Optimization28(2), 1301–1311 (2018) https://doi. org/10.1137/17m1121883
-
[9]
Journal of Optimization Theory and Applications165(1), 151–171 (2015) https://doi
Lewis, A.S., Zhang, S.: Nonsmoothness and a Variable Metric Method. Journal of Optimization Theory and Applications165(1), 151–171 (2015) https://doi. org/10.1007/s10957-014-0622-7
-
[10]
Xie, Y., W¨ achter, A.: On the convergence of BFGS on a class of piecewise lin- ear non-smooth functions. arXiv (2017). https://doi.org/10.48550/ARXIV.1712. 08571
-
[11]
IMA Journal of Numerical Analysis41(1), 1–27 (2020) https://doi.org/10.1093/imanum/drz052
Asl, A., Overton, M.L.: Analysis of limited-memory BFGS on a class of nons- mooth convex functions. IMA Journal of Numerical Analysis41(1), 1–27 (2020) https://doi.org/10.1093/imanum/drz052
-
[12]
Asl, A., Overton, M.L.: Behavior of Limited Memory BFGS When Applied to Nonsmooth Functions and Their Nesterov Smoothings, pp. 25–55. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72040-7 2
-
[13]
Scholtes, S.: Introduction to Piecewise Differentiable Equations. Springer, New York, NY (2012). https://doi.org/10.1007/978-1-4614-4340-7
-
[14]
Brøndsted, A.: An Introduction to Convex Polytopes. Springer, New York, NY (1983). https://doi.org/10.1007/978-1-4612-1148-8
-
[15]
Ulbrich, M.: Nonsmooth Newton-like Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces. Habilitation, Fakult¨ at f¨ ur Mathematik, Technische Universit¨ at M¨ unchen, M¨ unchen, Germany (2002)
work page 2002
-
[16]
A Geometric Heuristic for Rectilinear Crossing Minimization
Clarke, F.H.: Optimization and Nonsmooth Analysis. Society for Indus- trial and Applied Mathematics, Philadelphia (1990). https://doi.org/10.1137/1. 9781611971309
work page doi:10.1137/1 1990
-
[17]
SIAM Jour- nal on Optimization13(3), 693–701 (2002) https://doi.org/10.1137/ s1052623401383455
Dai, Y.-H.: Convergence Properties of the BFGS Algoritm. SIAM Jour- nal on Optimization13(3), 693–701 (2002) https://doi.org/10.1137/ s1052623401383455
work page 2002
-
[18]
Mathematical Programming 138, 501–530 (2013) https://doi.org/10.1007/s10107-012-0522-2
Dai, Y.-H.: A perfect example for the BFGS method. Mathematical Programming 138, 501–530 (2013) https://doi.org/10.1007/s10107-012-0522-2
-
[19]
A simple Newton method for local nonsmooth optimization
Lewis, A., Wylie, C.: A simple Newton method for local nonsmooth optimization. arXiv (2019). https://doi.org/10.48550/ARXIV.1907.11742 18
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1907.11742 2019
-
[20]
Liu, S., Sagastiz´ abal, C.: Beyond First Order: VU-Decomposition Methods, pp. 297–329. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-34910-3 9 . Numerical Nonsmooth Optimization
-
[21]
Mifflin, R., Sagastiz´ abal, C.: VU-Decomposition Derivatives for Convex Max- Functions, pp. 167–186. Springer, Berlin, Heidelberg (1999). https://doi.org/10. 1007/978-3-642-45780-7 11
work page 1999
-
[22]
SIAM Journal on Opti- mization13(3), 702–725 (2002) https://doi.org/10.1137/s1052623401387623
Lewis, A.S.: Active sets, nonsmoothness, and sensitivity. SIAM Journal on Opti- mization13(3), 702–725 (2002) https://doi.org/10.1137/s1052623401387623
- [23]
-
[24]
Axler, S.: Linear Algebra Done Right. Springer, Cham (2024). https://doi.org/ 10.1007/978-3-031-41026-0
-
[25]
Mahdavi-Amiri, N., Yousefpour, R.: An effective nonsmooth optimization algo- rithm for locally lipschitz functions. Journal of Optimization Theory and Applications155, 180–195 (2012) https://doi.org/10.1007/s10957-012-0024-7
-
[26]
Gebken, B.: A note on the convergence of deterministic gradient sampling in nonsmooth optimization. Computational Optimization and Applications88, 151– 165 (2024) https://doi.org/10.1007/s10589-024-00552-0
-
[27]
SIAM Journal on Optimization30(1), 182–209 (2020) https://doi.org/10.1137/ 19m1240794
Xie, Y., Byrd, R.H., Nocedal, J.: Analysis of the BFGS Method with Errors. SIAM Journal on Optimization30(1), 182–209 (2020) https://doi.org/10.1137/ 19m1240794
work page 2020
-
[28]
SIAM Journal on Numerical Analysis26(3), 727–739 (1989) https://doi.org/10.1137/0726042 19
Byrd, R.H., Nocedal, J.: A Tool for the Analysis of Quasi-Newton Methods with Application to Unconstrained Minimization. SIAM Journal on Numerical Analysis26(3), 727–739 (1989) https://doi.org/10.1137/0726042 19
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.