pith. sign in

arxiv: 2511.03296 · v1 · submitted 2025-11-05 · 🧮 math.OC

Technical results on the convergence of quasi-Newton methods for nonsmooth optimization

Pith reviewed 2026-05-18 01:34 UTC · model grok-4.3

classification 🧮 math.OC
keywords quasi-Newton methodsBFGSnonsmooth optimizationconvergence analysispiecewise differentiable functionseigenvalue behaviorsecant equationcritical points
0
0 comments X

The pith

Assumptions on quasi-Newton matrix eigenvalues suffice to prove convergence to critical points for piecewise differentiable functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper seeks to identify the eigenvalue behavior in quasi-Newton matrices that would allow proving convergence of methods like BFGS when minimizing nonsmooth but piecewise differentiable functions. Readers should care because these methods perform well in practice yet lack strong theoretical backing due to how the secant updates force eigenvalues to zero. The authors extract plausible assumptions from numerical tests and show that under them the iterates reach a first-order critical point. They further establish that the updates can identify and explore the piecewise smooth structure of the function. This offers valuable intuition for future convergence analyses even though the assumptions are not yet proven to always hold.

Core claim

If the eigenvalues of the sequence of quasi-Newton matrices satisfy certain limiting conditions motivated by experiments—specifically that some eigenvalues approach zero at appropriate rates—then the algorithm generates a sequence converging to a critical point of the nonsmooth objective, and the method is capable of exploring the piecewise structure by adapting to different smooth regions.

What carries the argument

The limiting behavior of the eigenvalues of the quasi-Newton matrices generated by the secant updates, which controls the curvature approximation in nonsmooth settings.

Load-bearing premise

The specific pattern of eigenvalue vanishing in the quasi-Newton matrix, as seen in numerical experiments, occurs in the general case for the functions under consideration.

What would settle it

Numerical runs of BFGS on a range of piecewise differentiable test functions where the eigenvalues fail to follow the assumed vanishing pattern while the method fails to reach a critical point.

read the original abstract

It is well-known by now that the BFGS method is an effective method for minimizing nonsmooth functions. However, despite its popularity, theoretical convergence results are almost non-existent. One of the difficulties when analyzing the nonsmooth case is the fact that the secant equation forces certain eigenvalues of the quasi-Newton matrix to vanish, which is a behavior that has not yet been fully analyzed. In this article, we show what kind of behavior of the eigenvalues would be sufficient to be able to prove the convergence for piecewise differentiable functions. More precisely, we derive assumptions on the behavior from numerical experiments and then prove criticality of the limit under these assumptions. Furthermore, we show how quasi-Newton methods are able to explore the piecewise structure. While we do not prove that the observed behavior of the eigenvalues actually occurs, we believe that these results still give insight, and a certain intuition, for the convergence for nonsmooth functions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that, under assumptions on the limiting eigenvalue behavior of quasi-Newton matrices (certain eigenvalues vanishing in a controlled manner due to the secant equation) that are extracted from numerical experiments, BFGS-type methods converge to critical points when applied to piecewise differentiable functions and are able to explore the underlying piecewise structure. The authors derive sufficient conditions for criticality from these observed behaviors, prove the corresponding convergence statements, and explicitly note that they do not establish that the eigenvalue assumptions hold for the actual sequence of matrices generated by the algorithm on the target function class.

Significance. If the eigenvalue assumptions hold for the functions of interest, the work supplies a useful conditional framework that explains the practical success of quasi-Newton methods on nonsmooth problems and identifies the precise spectral behavior needed for a convergence proof. This addresses a long-standing theoretical gap and provides intuition that could guide future efforts to verify or relax the assumptions, potentially yielding the first rigorous convergence guarantees for BFGS on nonsmooth piecewise differentiable objectives.

major comments (2)
  1. [Assumptions section (near the statement of the eigenvalue vanishing conditions)] The central convergence result (presumably Theorem 3.1 or equivalent in the proof section) is conditional on eigenvalue assumptions whose validity is supported only by numerical observation rather than derivation or sufficient conditions on the problem class. This leaves a gap between the proven statement and the practical claim that the method converges for piecewise differentiable functions.
  2. [Abstract and §1] The paper correctly acknowledges that the observed eigenvalue behavior is not shown to occur in general, yet the abstract and introduction frame the results as giving insight into convergence for nonsmooth functions. A more precise statement of the scope—explicitly limiting the claim to the conditional case—would strengthen the manuscript.
minor comments (2)
  1. [Throughout] Notation for the limiting eigenvalue sets and the piecewise active manifolds could be introduced earlier and used consistently to improve readability of the proofs.
  2. [Numerical experiments] The numerical experiments section would benefit from a brief table summarizing the observed eigenvalue decay rates across the test problems.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and for acknowledging the value of our conditional framework in addressing a theoretical gap for quasi-Newton methods on nonsmooth problems. We respond to each major comment below and agree to revisions that improve precision without altering the manuscript's core contribution.

read point-by-point responses
  1. Referee: [Assumptions section (near the statement of the eigenvalue vanishing conditions)] The central convergence result (presumably Theorem 3.1 or equivalent in the proof section) is conditional on eigenvalue assumptions whose validity is supported only by numerical observation rather than derivation or sufficient conditions on the problem class. This leaves a gap between the proven statement and the practical claim that the method converges for piecewise differentiable functions.

    Authors: We agree that the central result is conditional on the eigenvalue assumptions, which are motivated by and extracted from numerical experiments rather than derived as necessary conditions for the problem class. The manuscript already states this limitation explicitly in the abstract ('we do not prove that the observed behavior of the eigenvalues actually occurs') and in the discussion following the assumptions. The contribution lies in characterizing the precise spectral behavior that suffices to prove criticality and piecewise exploration, thereby supplying intuition for the observed practical success of BFGS on nonsmooth objectives. We do not claim or prove unconditional convergence. To address the noted gap in presentation, we will revise the assumptions section to more prominently label these as sufficient conditions derived from numerics and to avoid any phrasing that could be read as implying the assumptions hold in general. revision: yes

  2. Referee: [Abstract and §1] The paper correctly acknowledges that the observed eigenvalue behavior is not shown to occur in general, yet the abstract and introduction frame the results as giving insight into convergence for nonsmooth functions. A more precise statement of the scope—explicitly limiting the claim to the conditional case—would strengthen the manuscript.

    Authors: We thank the referee for this suggestion on framing. While the current abstract and introduction already note the lack of a general proof for the eigenvalue behavior and position the work as providing insight rather than a full convergence guarantee, we agree that the language can be tightened to more explicitly restrict claims to the conditional setting. We will revise the abstract and §1 to state that the results identify sufficient conditions on eigenvalue vanishing (observed numerically) under which convergence to criticality holds for piecewise differentiable functions, and that these conditions offer intuition for why quasi-Newton methods succeed in practice, without suggesting that the assumptions are verified or that convergence is proven unconditionally. revision: yes

Circularity Check

0 steps flagged

No circularity: conditional convergence proof under numerically motivated assumptions

full rationale

The paper derives sufficient conditions on the limiting eigenvalue behavior of quasi-Newton matrices from numerical experiments, then proves that these conditions imply convergence to a critical point and exploration of piecewise structure for piecewise differentiable functions. It explicitly states that it does not prove the observed eigenvalue behavior occurs in general. This is a standard conditional implication result with no reduction of the central claim to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The derivation chain is self-contained as an implication under externally motivated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard domain assumptions from nonsmooth analysis and quasi-Newton theory plus the empirically observed eigenvalue pattern; no new entities are postulated and no free parameters are introduced.

axioms (2)
  • domain assumption Piecewise differentiable functions admit a well-defined notion of criticality via the Clarke subdifferential.
    Invoked when the paper states that the limit point is critical.
  • domain assumption The secant equation in the quasi-Newton update forces certain eigenvalues toward zero in the nonsmooth case.
    Presented as the central analytical difficulty that the paper addresses.

pith-pipeline@v0.9.0 · 5676 in / 1393 out tokens · 45514 ms · 2026-05-18T01:34:45.853713+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 1 internal anchor

  1. [1]

    Technical Report ANL-5990, Argonne National Lab., Lemont, Ill

    Davidon, W.C.: Variable Metric Method for Minimization. Technical Report ANL-5990, Argonne National Lab., Lemont, Ill. (1959) https://doi.org/10.2172/ 4252678

  2. [2]

    Numerical Optimization

    Nocedal, J., Wright, S.: Numerical Optimization. Springer, New York, NY (2006). https://doi.org/10.1007/978-0-387-40065-5

  3. [3]

    Trust -region algorithms for training responses: machine learning methods using indefinite Hessian approximations,

    Asl, A., Overton, M.L.: Analysis of the gradient method with an Armijo–Wolfe line search on a class of non-smooth convex functions. Optimization Methods and 17 Software35, 223–242 (2019) https://doi.org/10.1080/10556788.2019.1673388

  4. [4]

    Springer, Cham (2014)

    Bagirov, A., Karmitsa, N., M¨ akel¨ a, M.M.: Introduction to Nonsmooth Optimiza- tion. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08114-4

  5. [5]

    SIAM Journal on Optimization15, 751–779 (2005) https://doi.org/10.1137/030601296

    Burke, J.V., Lewis, A.S., Overton, M.L.: A Robust Gradient Sampling Algorithm for Nonsmooth, Nonconvex Optimization. SIAM Journal on Optimization15, 751–779 (2005) https://doi.org/10.1137/030601296

  6. [6]

    In: Nurmin- ski, E.A

    Lemar´ echal, C.: Numerical experiments in nonsmooth optimization. In: Nurmin- ski, E.A. (ed.) Progress in Nondifferentiable Optimization, Laxenburg, Austria, pp. 61–84 (1982). International Institute for Applied Systems Analysis (IIASA)

  7. [7]

    Mathematical Programming141, 135–163 (2013) https://doi.org/10.1007/ s10107-012-0514-2

    Lewis, A.S., Overton, M.L.: Nonsmooth optimization via quasi-Newton meth- ods. Mathematical Programming141, 135–163 (2013) https://doi.org/10.1007/ s10107-012-0514-2

  8. [8]

    SIAM Journal on Optimization28(2), 1301–1311 (2018) https://doi

    Guo, J., Lewis, A.S.: Nonsmooth Variants of Powell’s BFGS Convergence Theorem. SIAM Journal on Optimization28(2), 1301–1311 (2018) https://doi. org/10.1137/17m1121883

  9. [9]

    Journal of Optimization Theory and Applications165(1), 151–171 (2015) https://doi

    Lewis, A.S., Zhang, S.: Nonsmoothness and a Variable Metric Method. Journal of Optimization Theory and Applications165(1), 151–171 (2015) https://doi. org/10.1007/s10957-014-0622-7

  10. [10]

    arXiv (2017)

    Xie, Y., W¨ achter, A.: On the convergence of BFGS on a class of piecewise lin- ear non-smooth functions. arXiv (2017). https://doi.org/10.48550/ARXIV.1712. 08571

  11. [11]

    IMA Journal of Numerical Analysis41(1), 1–27 (2020) https://doi.org/10.1093/imanum/drz052

    Asl, A., Overton, M.L.: Analysis of limited-memory BFGS on a class of nons- mooth convex functions. IMA Journal of Numerical Analysis41(1), 1–27 (2020) https://doi.org/10.1093/imanum/drz052

  12. [12]

    Asl, A., Overton, M.L.: Behavior of Limited Memory BFGS When Applied to Nonsmooth Functions and Their Nesterov Smoothings, pp. 25–55. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72040-7 2

  13. [13]

    Springer, New York, NY (2012)

    Scholtes, S.: Introduction to Piecewise Differentiable Equations. Springer, New York, NY (2012). https://doi.org/10.1007/978-1-4614-4340-7

  14. [14]

    Springer, New York, NY (1983)

    Brøndsted, A.: An Introduction to Convex Polytopes. Springer, New York, NY (1983). https://doi.org/10.1007/978-1-4612-1148-8

  15. [15]

    Habilitation, Fakult¨ at f¨ ur Mathematik, Technische Universit¨ at M¨ unchen, M¨ unchen, Germany (2002)

    Ulbrich, M.: Nonsmooth Newton-like Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces. Habilitation, Fakult¨ at f¨ ur Mathematik, Technische Universit¨ at M¨ unchen, M¨ unchen, Germany (2002)

  16. [16]

    A Geometric Heuristic for Rectilinear Crossing Minimization

    Clarke, F.H.: Optimization and Nonsmooth Analysis. Society for Indus- trial and Applied Mathematics, Philadelphia (1990). https://doi.org/10.1137/1. 9781611971309

  17. [17]

    SIAM Jour- nal on Optimization13(3), 693–701 (2002) https://doi.org/10.1137/ s1052623401383455

    Dai, Y.-H.: Convergence Properties of the BFGS Algoritm. SIAM Jour- nal on Optimization13(3), 693–701 (2002) https://doi.org/10.1137/ s1052623401383455

  18. [18]

    Mathematical Programming 138, 501–530 (2013) https://doi.org/10.1007/s10107-012-0522-2

    Dai, Y.-H.: A perfect example for the BFGS method. Mathematical Programming 138, 501–530 (2013) https://doi.org/10.1007/s10107-012-0522-2

  19. [19]

    A simple Newton method for local nonsmooth optimization

    Lewis, A., Wylie, C.: A simple Newton method for local nonsmooth optimization. arXiv (2019). https://doi.org/10.48550/ARXIV.1907.11742 18

  20. [20]

    Liu, S., Sagastiz´ abal, C.: Beyond First Order: VU-Decomposition Methods, pp. 297–329. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-34910-3 9 . Numerical Nonsmooth Optimization

  21. [21]

    Mifflin, R., Sagastiz´ abal, C.: VU-Decomposition Derivatives for Convex Max- Functions, pp. 167–186. Springer, Berlin, Heidelberg (1999). https://doi.org/10. 1007/978-3-642-45780-7 11

  22. [22]

    SIAM Journal on Opti- mization13(3), 702–725 (2002) https://doi.org/10.1137/s1052623401387623

    Lewis, A.S.: Active sets, nonsmoothness, and sensitivity. SIAM Journal on Opti- mization13(3), 702–725 (2002) https://doi.org/10.1137/s1052623401387623

  23. [23]

    Pearson, Harlow (2013)

    Artin, M.: Algebra, 2nd edn. Pearson, Harlow (2013)

  24. [24]

    Springer, Cham (2024)

    Axler, S.: Linear Algebra Done Right. Springer, Cham (2024). https://doi.org/ 10.1007/978-3-031-41026-0

  25. [25]

    Journal of Optimization Theory and Applications155, 180–195 (2012) https://doi.org/10.1007/s10957-012-0024-7

    Mahdavi-Amiri, N., Yousefpour, R.: An effective nonsmooth optimization algo- rithm for locally lipschitz functions. Journal of Optimization Theory and Applications155, 180–195 (2012) https://doi.org/10.1007/s10957-012-0024-7

  26. [26]

    Computational Optimization and Applications88, 151– 165 (2024) https://doi.org/10.1007/s10589-024-00552-0

    Gebken, B.: A note on the convergence of deterministic gradient sampling in nonsmooth optimization. Computational Optimization and Applications88, 151– 165 (2024) https://doi.org/10.1007/s10589-024-00552-0

  27. [27]

    SIAM Journal on Optimization30(1), 182–209 (2020) https://doi.org/10.1137/ 19m1240794

    Xie, Y., Byrd, R.H., Nocedal, J.: Analysis of the BFGS Method with Errors. SIAM Journal on Optimization30(1), 182–209 (2020) https://doi.org/10.1137/ 19m1240794

  28. [28]

    SIAM Journal on Numerical Analysis26(3), 727–739 (1989) https://doi.org/10.1137/0726042 19

    Byrd, R.H., Nocedal, J.: A Tool for the Analysis of Quasi-Newton Methods with Application to Unconstrained Minimization. SIAM Journal on Numerical Analysis26(3), 727–739 (1989) https://doi.org/10.1137/0726042 19