pith. sign in

arxiv: 1907.11742 · v1 · pith:NOWSORFJnew · submitted 2019-07-26 · 🧮 math.OC · cs.NA· math.NA

A simple Newton method for local nonsmooth optimization

Pith reviewed 2026-05-24 15:16 UTC · model grok-4.3

classification 🧮 math.OC cs.NAmath.NA
keywords nonsmooth optimizationbundle methodsNewton methodquadratic convergenceactive setblack-box optimizationconvex optimizationnonconvex optimization
0
0 comments X

The pith

A bundle Newton method achieves local quadratic convergence for nonsmooth optimization given the optimal active set cardinality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a new bundle Newton method for black-box nonsmooth optimization that incorporates second-order information alongside the usual linear approximation oracle. For the class of functions that are maxima of several smooth functions, it establishes local quadratic convergence when provided only with the cardinality of the optimal active set. This targets the challenge of slow local convergence in traditional subgradient and bundle methods. The method is tested on general functions showing promise for both convex and nonconvex cases, and hints at first-order extensions.

Core claim

The bundle Newton method, which augments the linear approximation oracle with second-order objective information, attains local quadratic convergence on the representative problem class of maxima of smooth functions when the cardinality of the optimal active set is supplied as input.

What carries the argument

The bundle Newton method that integrates second-order information with the linear approximation oracle, using the cardinality of the optimal active set to achieve quadratic convergence.

If this is right

  • The method converges quadratically locally for maxima of smooth functions with known active set size.
  • It performs well on more general nonsmooth functions, both convex and nonconvex.
  • It opens the door to first-order analogues of the approach.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the active set cardinality can be estimated reliably, the method could extend to broader problem classes without explicit input.
  • The semi-structured approach might apply to other optimization settings where partial structure is known.
  • Similar ideas could improve convergence in nonconvex nonsmooth problems beyond the tested cases.

Load-bearing premise

The objective function is a maximum of several smooth functions that the oracle cannot access individually.

What would settle it

A counterexample where the method does not exhibit quadratic convergence on a max of smooth functions despite knowing the active set cardinality.

Figures

Figures reproduced from arXiv: 1907.11742 by Adrian Lewis, Calvin Wylie.

Figure 1
Figure 1. Figure 1: Best function value found for the bundle method and bundle Newton [PITH_FULL_IMAGE:figures/full_fig_p026_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Optimality measures against iteration count for bundle Newton method [PITH_FULL_IMAGE:figures/full_fig_p027_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Best function value found for BFGS and the bundle Newton method [PITH_FULL_IMAGE:figures/full_fig_p027_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Optimality measures against iteration count for bundle Newton method [PITH_FULL_IMAGE:figures/full_fig_p028_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Function value convergence and optimality measures for the maximum [PITH_FULL_IMAGE:figures/full_fig_p030_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Clustering of the six largest eigenvalues of [PITH_FULL_IMAGE:figures/full_fig_p031_6.png] view at source ↗
read the original abstract

Superlinear convergence has been an elusive goal for black-box nonsmooth optimization. Even in the convex case, the subgradient method is very slow, and while some cutting plane algorithms, including traditional bundle methods, are popular in practice, local convergence is still sluggish. Faster variants depend either on problem structure or on analyses that elide sequences of "null" steps. Motivated by a semi-structured approach to optimization and the sequential quadratic programming philosophy, we describe a new bundle Newton method that incorporates second-order objective information with the usual linear approximation oracle. One representative problem class consists of maxima of several smooth functions, individually inaccessible to the oracle. Given as additional input just the cardinality of the optimal active set, we prove local quadratic convergence. A simple implementation shows promise on more general functions, both convex and nonconvex, and suggests first-order analogues.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces a bundle Newton method for nonsmooth optimization that augments the standard linear approximation oracle with second-order information. For the representative class of objective functions that are pointwise maxima of several smooth functions (where the individual smooth functions are inaccessible to the oracle), and given the cardinality of the optimal active set as explicit additional input, the authors prove local quadratic convergence. Numerical results on a simple implementation are reported for both convex and nonconvex test problems, with a suggestion that first-order analogues may be viable.

Significance. If the local quadratic convergence claim holds under the stated assumptions, the result would be significant for black-box nonsmooth optimization, a setting in which superlinear rates have remained elusive even for convex problems. The approach is motivated by semi-structured problem classes and SQP ideas, and the explicit cardinality input allows a precise scoping that avoids eliding null steps. The empirical section indicates practical promise beyond the analyzed class.

minor comments (3)
  1. The abstract states that the method 'incorporates second-order objective information with the usual linear approximation oracle,' but the precise form of the second-order oracle (e.g., whether it returns a Hessian of an active function or an approximation) should be stated explicitly in §2 or §3 to clarify the information model.
  2. In the description of the representative problem class, the phrase 'individually inaccessible to the oracle' is used; a short clarifying sentence on what information about the component functions is nevertheless available (if any) would aid readability.
  3. The numerical experiments section would benefit from a table or plot that isolates the effect of supplying the correct active-set cardinality versus an incorrect value, to illustrate robustness.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of the manuscript, the recognition of its potential significance for black-box nonsmooth optimization, and the recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's central result is a proof of local quadratic convergence for the proposed bundle Newton method on the representative class of max-of-smooth-functions objectives, conditioned on receiving the cardinality of the optimal active set as explicit additional input. No step reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation; the proof is presented as depending on the algorithm and the supplied cardinality datum. The derivation is therefore self-contained against the stated assumptions and does not match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no concrete free parameters, axioms, or invented entities can be identified from the text.

pith-pipeline@v0.9.0 · 5667 in / 963 out tokens · 27973 ms · 2026-05-24T15:16:17.759649+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Technical results on the convergence of quasi-Newton methods for nonsmooth optimization

    math.OC 2025-11 unverdicted novelty 6.0

    Sufficient conditions on eigenvalue vanishing in quasi-Newton updates, observed numerically, are shown to imply convergence to criticality for piecewise differentiable nonsmooth functions, along with the method's abil...

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · cited by 1 Pith paper

  1. [1]

    Atkinson and P.M

    D.S. Atkinson and P.M. Vaidya. A cutting plane algorithm for convex pro- gramming that uses analytic centers. Math. Programming, 69:1–43, 1995

  2. [2]

    S. Bubeck. Convex optimization: algorithms and complexity. Foundations and Trends in Machine Learning, 8:231–357, 2015

  3. [3]

    Burke, A.S

    J.V. Burke, A.S. Lewis, and M.L. Overton. A robust gradient sampling algo- rithm for nonsmooth, nonconvex optimization. SIAM J. Optim. , 15:751–779, 2005

  4. [4]

    F.H. Clarke. Optimization and Nonsmooth Analysis . Wiley Interscience, New York, 1983. 31

  5. [5]

    M. Coste. An Introduction to Semialgebraic Geometry. RAAG Notes, 78 pages, Institut de Recherche Math´ ematiques de Rennes, October 2002

  6. [6]

    Du and A

    Y. Du and A. Ruszczy´ nski. Rate of convergence of the bundle method. J. Optim. Theory Appl. , 173:908–922, 2017

  7. [7]

    Goffin and J.-P

    J.-L. Goffin and J.-P. Vial. On the computation of weighted analytic centers and dual ellipsoids with the projective algorithm. Math. Programming, 60:81–92, 1993

  8. [8]

    Golub and C.F

    G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins Univer- sity Press, Baltimore, MD, fourth edition, 2013

  9. [9]

    K. C. Kiwiel. Efficiency of proximal bundle methods. J. Optim. Theory Appl. , 104:589–603, 2000

  10. [10]

    K.C. Kiwiel. Methods of Descent for Nondifferentiable Optimization , volume 1133 of Lecture Notes in Mathematics . Springer-Verlag, Berlin, 1985

  11. [11]

    Sidford, and Sam Chiu-wai Wong

    Yin Tat Lee, A. Sidford, and Sam Chiu-wai Wong. A faster cutting plane method and its implications for combinatorial and convex optimization. In FOCS 2015, pages 1049–1065. IEEE Computer Soc., Los Alamitos, CA, 2015

  12. [12]

    Lemar´ echal

    C. Lemar´ echal. An extension of Davidon methods to non differentiable prob- lems. Math. Programming Stud., 3:95–109, 1975

  13. [13]

    Lemar´ echal, A

    C. Lemar´ echal, A. Nemirovskii, and Y. Nesterov. New variants of bundle meth- ods. Math. Programming, 69:111–147, 1995

  14. [14]

    Lemar´ echal, F

    C. Lemar´ echal, F. Oustry, and C. Sagastiz´ abal. The U-lagrangian of a convex function. Transactions of the American Mathematical Society , 352:711–729, 2000

  15. [15]

    Lemar´ echal and C

    C. Lemar´ echal and C. Sagastiz´ abal. Practical aspects of the Moreau-Yosida regularization: theoretical preliminaries. SIAM J. Optim. , 7:367—385, 1997

  16. [16]

    A.S. Lewis. Active sets, nonsmoothness, and sensitivity. SIAM J. Optim. , 13:702–725, 2002

  17. [17]

    Lewis and M.L

    A.S. Lewis and M.L. Overton. Nonsmooth optimization via quasi-Newton meth- ods. Math. Program., 141:135–163, 2013

  18. [18]

    Lukˇ san and J

    L. Lukˇ san and J. Vlˇ cek. A bundle-Newton method for nonsmooth uncon- strained minimization. Math. Programming, 83:373–391, 1998

  19. [19]

    R. Mifflin. An algorithm for constrained optimization with semismooth func- tions. Math. Oper. Res., 2:191–207, 1977. 32

  20. [20]

    Mifflin and C

    R. Mifflin and C. Sagastiz´ abal. Proximal points are on the fast track. Journal of Convex Analysis , 9:563—579, 2002

  21. [21]

    Mifflin and C

    R. Mifflin and C. Sagastiz´ abal. AVU -algorithm for convex minimization. Math. Program., 104:583–608, 2005

  22. [22]

    Miller and J

    S.A. Miller and J. Malick. Newton methods for nonsmooth convex minimiza- tion: connections among U-Lagrangian, Riemannian Newton and SQP meth- ods. Math. Program., 104:609–633, 2005

  23. [23]

    A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization . John Wiley & Sons, Inc., New York, 1983. Translated from the Russian and with a preface by E. R. Dawson

  24. [24]

    Nesterov

    Y. Nesterov. Complexity estimates of some cutting plane methods based on the analytic barrier. Math. Programming, 69:149–176, 1995

  25. [25]

    Nesterov

    Y. Nesterov. Introductory Lectures on Convex Optimization. Kluwer Academic, Dordrecht, The Netherlands, 2004

  26. [26]

    Nesterov

    Y. Nesterov. Smooth minimization of non-smooth functions. Math. Program., 103:127–152, 2005

  27. [27]

    Nesterov and A

    Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Algorithms in Con- vex Programming. SIAM, Philadelphia, PA, 1994

  28. [28]

    Nocedal and S.J

    J. Nocedal and S.J. Wright. Numerical Optimization . Springer, New York, second edition, 2006

  29. [29]

    F. Oustry. The U-Lagrangian of the maximum eigenvalue function. SIAM J. Optim., 9:526–549, 1999

  30. [30]

    M.L. Overton. On minimizing the maximum eigenvalue of a symmetric ma- trix. In Linear algebra in signals, systems, and control , pages 150–169. SIAM, Philadelphia, PA, 1988

  31. [31]

    Parikh and S

    N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization, 1:123–231, 2013

  32. [32]

    Robinson

    S.M. Robinson. Perturbed Kuhn-Tucker points and rates of convergence for a class of nonlinear-programming algorithms. Math. Programming, 7:1–16, 1974

  33. [33]

    Ryu and S

    E.K. Ryu and S. Boyd. A primer on monotone operator methods. Appl. Com- put. Math., 15:3–43, 2016. 33

  34. [34]

    Sagastiz´ abal

    C. Sagastiz´ abal. AVU -point of view of nonsmooth optimization. In Proceedings of the International Congress of Mathematicians, Rio de Janeiro , volume 3, pages 3785–3806, 2018

  35. [35]

    Shapiro and M.K.H

    A. Shapiro and M.K.H. Fan. On eigenvalue optimization. SIAM J. Optim. , 5:552–569, 1995

  36. [36]

    P. M. Vaidya. A new algorithm for minimizing convex functions over convex sets. Math. Programming, 73:291–341, 1996. 34