A simple Newton method for local nonsmooth optimization

Adrian Lewis; Calvin Wylie

arxiv: 1907.11742 · v1 · pith:NOWSORFJnew · submitted 2019-07-26 · 🧮 math.OC · cs.NA· math.NA

A simple Newton method for local nonsmooth optimization

Adrian Lewis , Calvin Wylie This is my paper

Pith reviewed 2026-05-24 15:16 UTC · model grok-4.3

classification 🧮 math.OC cs.NAmath.NA

keywords nonsmooth optimizationbundle methodsNewton methodquadratic convergenceactive setblack-box optimizationconvex optimizationnonconvex optimization

0 comments

The pith

A bundle Newton method achieves local quadratic convergence for nonsmooth optimization given the optimal active set cardinality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a new bundle Newton method for black-box nonsmooth optimization that incorporates second-order information alongside the usual linear approximation oracle. For the class of functions that are maxima of several smooth functions, it establishes local quadratic convergence when provided only with the cardinality of the optimal active set. This targets the challenge of slow local convergence in traditional subgradient and bundle methods. The method is tested on general functions showing promise for both convex and nonconvex cases, and hints at first-order extensions.

Core claim

The bundle Newton method, which augments the linear approximation oracle with second-order objective information, attains local quadratic convergence on the representative problem class of maxima of smooth functions when the cardinality of the optimal active set is supplied as input.

What carries the argument

The bundle Newton method that integrates second-order information with the linear approximation oracle, using the cardinality of the optimal active set to achieve quadratic convergence.

If this is right

The method converges quadratically locally for maxima of smooth functions with known active set size.
It performs well on more general nonsmooth functions, both convex and nonconvex.
It opens the door to first-order analogues of the approach.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the active set cardinality can be estimated reliably, the method could extend to broader problem classes without explicit input.
The semi-structured approach might apply to other optimization settings where partial structure is known.
Similar ideas could improve convergence in nonconvex nonsmooth problems beyond the tested cases.

Load-bearing premise

The objective function is a maximum of several smooth functions that the oracle cannot access individually.

What would settle it

A counterexample where the method does not exhibit quadratic convergence on a max of smooth functions despite knowing the active set cardinality.

Figures

Figures reproduced from arXiv: 1907.11742 by Adrian Lewis, Calvin Wylie.

**Figure 2.** Figure 2: Optimality measures against iteration count for bundle Newton method [PITH_FULL_IMAGE:figures/full_fig_p027_2.png] view at source ↗

**Figure 3.** Figure 3: Best function value found for BFGS and the bundle Newton method [PITH_FULL_IMAGE:figures/full_fig_p027_3.png] view at source ↗

**Figure 4.** Figure 4: Optimality measures against iteration count for bundle Newton method [PITH_FULL_IMAGE:figures/full_fig_p028_4.png] view at source ↗

**Figure 5.** Figure 5: Function value convergence and optimality measures for the maximum [PITH_FULL_IMAGE:figures/full_fig_p030_5.png] view at source ↗

**Figure 6.** Figure 6: Clustering of the six largest eigenvalues of [PITH_FULL_IMAGE:figures/full_fig_p031_6.png] view at source ↗

read the original abstract

Superlinear convergence has been an elusive goal for black-box nonsmooth optimization. Even in the convex case, the subgradient method is very slow, and while some cutting plane algorithms, including traditional bundle methods, are popular in practice, local convergence is still sluggish. Faster variants depend either on problem structure or on analyses that elide sequences of "null" steps. Motivated by a semi-structured approach to optimization and the sequential quadratic programming philosophy, we describe a new bundle Newton method that incorporates second-order objective information with the usual linear approximation oracle. One representative problem class consists of maxima of several smooth functions, individually inaccessible to the oracle. Given as additional input just the cardinality of the optimal active set, we prove local quadratic convergence. A simple implementation shows promise on more general functions, both convex and nonconvex, and suggests first-order analogues.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a bundle-Newton hybrid with a local quadratic convergence proof for max-of-smooth functions when the optimal active-set cardinality is supplied as extra input.

read the letter

The main thing to know is that this paper presents a bundle method using second-order information that proves local quadratic convergence on the max-of-smooth class when the cardinality of the optimal active set is given upfront. That is the scoped claim, and it is new relative to the slower local rates in standard bundle methods. The authors motivate it from SQP ideas and the semi-structured setting, and they include a simple implementation that shows promise on broader convex and nonconvex examples, plus a nod toward first-order analogues. That combination of ideas and the explicit guarantee under the stated condition is the concrete advance. The abstract is straightforward about the oracle model and the representative class where individual smooth pieces are inaccessible. The stress-test note correctly flags that the argument avoids obvious circularity and stays within its stated scope. The soft spots are proportionate: the cardinality input is genuine extra information, not recovered from the oracle, so the method is not fully black-box; the result is local only, with no global claims; and the handling of null steps and the precise derivation would need checking in the full proof. Those are real limitations but they are stated up front rather than hidden. This paper is for researchers working on nonsmooth optimization rates, bundle methods, or ways to bring second-order information into nonsmooth settings. A reader already thinking about active-set structure or SQP-style approaches would get direct value. It deserves a serious referee because the claim is specific, the authors are credible in the area, and the result is falsifiable in principle. I would send it to review rather than desk reject.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces a bundle Newton method for nonsmooth optimization that augments the standard linear approximation oracle with second-order information. For the representative class of objective functions that are pointwise maxima of several smooth functions (where the individual smooth functions are inaccessible to the oracle), and given the cardinality of the optimal active set as explicit additional input, the authors prove local quadratic convergence. Numerical results on a simple implementation are reported for both convex and nonconvex test problems, with a suggestion that first-order analogues may be viable.

Significance. If the local quadratic convergence claim holds under the stated assumptions, the result would be significant for black-box nonsmooth optimization, a setting in which superlinear rates have remained elusive even for convex problems. The approach is motivated by semi-structured problem classes and SQP ideas, and the explicit cardinality input allows a precise scoping that avoids eliding null steps. The empirical section indicates practical promise beyond the analyzed class.

minor comments (3)

The abstract states that the method 'incorporates second-order objective information with the usual linear approximation oracle,' but the precise form of the second-order oracle (e.g., whether it returns a Hessian of an active function or an approximation) should be stated explicitly in §2 or §3 to clarify the information model.
In the description of the representative problem class, the phrase 'individually inaccessible to the oracle' is used; a short clarifying sentence on what information about the component functions is nevertheless available (if any) would aid readability.
The numerical experiments section would benefit from a table or plot that isolates the effect of supplying the correct active-set cardinality versus an incorrect value, to illustrate robustness.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of the manuscript, the recognition of its potential significance for black-box nonsmooth optimization, and the recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's central result is a proof of local quadratic convergence for the proposed bundle Newton method on the representative class of max-of-smooth-functions objectives, conditioned on receiving the cardinality of the optimal active set as explicit additional input. No step reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation; the proof is presented as depending on the algorithm and the supplied cardinality datum. The derivation is therefore self-contained against the stated assumptions and does not match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no concrete free parameters, axioms, or invented entities can be identified from the text.

pith-pipeline@v0.9.0 · 5667 in / 963 out tokens · 27973 ms · 2026-05-24T15:16:17.759649+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Technical results on the convergence of quasi-Newton methods for nonsmooth optimization
math.OC 2025-11 unverdicted novelty 6.0

Sufficient conditions on eigenvalue vanishing in quasi-Newton updates, observed numerically, are shown to imply convergence to criticality for piecewise differentiable nonsmooth functions, along with the method's abil...

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · cited by 1 Pith paper

[1]

Atkinson and P.M

D.S. Atkinson and P.M. Vaidya. A cutting plane algorithm for convex pro- gramming that uses analytic centers. Math. Programming, 69:1–43, 1995

work page 1995
[2]

S. Bubeck. Convex optimization: algorithms and complexity. Foundations and Trends in Machine Learning, 8:231–357, 2015

work page 2015
[3]

Burke, A.S

J.V. Burke, A.S. Lewis, and M.L. Overton. A robust gradient sampling algo- rithm for nonsmooth, nonconvex optimization. SIAM J. Optim. , 15:751–779, 2005

work page 2005
[4]

F.H. Clarke. Optimization and Nonsmooth Analysis . Wiley Interscience, New York, 1983. 31

work page 1983
[5]

M. Coste. An Introduction to Semialgebraic Geometry. RAAG Notes, 78 pages, Institut de Recherche Math´ ematiques de Rennes, October 2002

work page 2002
[6]

Du and A

Y. Du and A. Ruszczy´ nski. Rate of convergence of the bundle method. J. Optim. Theory Appl. , 173:908–922, 2017

work page 2017
[7]

Goﬃn and J.-P

J.-L. Goﬃn and J.-P. Vial. On the computation of weighted analytic centers and dual ellipsoids with the projective algorithm. Math. Programming, 60:81–92, 1993

work page 1993
[8]

Golub and C.F

G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins Univer- sity Press, Baltimore, MD, fourth edition, 2013

work page 2013
[9]

K. C. Kiwiel. Eﬃciency of proximal bundle methods. J. Optim. Theory Appl. , 104:589–603, 2000

work page 2000
[10]

K.C. Kiwiel. Methods of Descent for Nondiﬀerentiable Optimization , volume 1133 of Lecture Notes in Mathematics . Springer-Verlag, Berlin, 1985

work page 1985
[11]

Sidford, and Sam Chiu-wai Wong

Yin Tat Lee, A. Sidford, and Sam Chiu-wai Wong. A faster cutting plane method and its implications for combinatorial and convex optimization. In FOCS 2015, pages 1049–1065. IEEE Computer Soc., Los Alamitos, CA, 2015

work page 2015
[12]

Lemar´ echal

C. Lemar´ echal. An extension of Davidon methods to non diﬀerentiable prob- lems. Math. Programming Stud., 3:95–109, 1975

work page 1975
[13]

Lemar´ echal, A

C. Lemar´ echal, A. Nemirovskii, and Y. Nesterov. New variants of bundle meth- ods. Math. Programming, 69:111–147, 1995

work page 1995
[14]

Lemar´ echal, F

C. Lemar´ echal, F. Oustry, and C. Sagastiz´ abal. The U-lagrangian of a convex function. Transactions of the American Mathematical Society , 352:711–729, 2000

work page 2000
[15]

Lemar´ echal and C

C. Lemar´ echal and C. Sagastiz´ abal. Practical aspects of the Moreau-Yosida regularization: theoretical preliminaries. SIAM J. Optim. , 7:367—385, 1997

work page 1997
[16]

A.S. Lewis. Active sets, nonsmoothness, and sensitivity. SIAM J. Optim. , 13:702–725, 2002

work page 2002
[17]

Lewis and M.L

A.S. Lewis and M.L. Overton. Nonsmooth optimization via quasi-Newton meth- ods. Math. Program., 141:135–163, 2013

work page 2013
[18]

Lukˇ san and J

L. Lukˇ san and J. Vlˇ cek. A bundle-Newton method for nonsmooth uncon- strained minimization. Math. Programming, 83:373–391, 1998

work page 1998
[19]

R. Miﬄin. An algorithm for constrained optimization with semismooth func- tions. Math. Oper. Res., 2:191–207, 1977. 32

work page 1977
[20]

Miﬄin and C

R. Miﬄin and C. Sagastiz´ abal. Proximal points are on the fast track. Journal of Convex Analysis , 9:563—579, 2002

work page 2002
[21]

Miﬄin and C

R. Miﬄin and C. Sagastiz´ abal. AVU -algorithm for convex minimization. Math. Program., 104:583–608, 2005

work page 2005
[22]

Miller and J

S.A. Miller and J. Malick. Newton methods for nonsmooth convex minimiza- tion: connections among U-Lagrangian, Riemannian Newton and SQP meth- ods. Math. Program., 104:609–633, 2005

work page 2005
[23]

A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Eﬃciency in Optimization . John Wiley & Sons, Inc., New York, 1983. Translated from the Russian and with a preface by E. R. Dawson

work page 1983
[24]

Nesterov

Y. Nesterov. Complexity estimates of some cutting plane methods based on the analytic barrier. Math. Programming, 69:149–176, 1995

work page 1995
[25]

Nesterov

Y. Nesterov. Introductory Lectures on Convex Optimization. Kluwer Academic, Dordrecht, The Netherlands, 2004

work page 2004
[26]

Nesterov

Y. Nesterov. Smooth minimization of non-smooth functions. Math. Program., 103:127–152, 2005

work page 2005
[27]

Nesterov and A

Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Algorithms in Con- vex Programming. SIAM, Philadelphia, PA, 1994

work page 1994
[28]

Nocedal and S.J

J. Nocedal and S.J. Wright. Numerical Optimization . Springer, New York, second edition, 2006

work page 2006
[29]

F. Oustry. The U-Lagrangian of the maximum eigenvalue function. SIAM J. Optim., 9:526–549, 1999

work page 1999
[30]

M.L. Overton. On minimizing the maximum eigenvalue of a symmetric ma- trix. In Linear algebra in signals, systems, and control , pages 150–169. SIAM, Philadelphia, PA, 1988

work page 1988
[31]

Parikh and S

N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization, 1:123–231, 2013

work page 2013
[32]

Robinson

S.M. Robinson. Perturbed Kuhn-Tucker points and rates of convergence for a class of nonlinear-programming algorithms. Math. Programming, 7:1–16, 1974

work page 1974
[33]

Ryu and S

E.K. Ryu and S. Boyd. A primer on monotone operator methods. Appl. Com- put. Math., 15:3–43, 2016. 33

work page 2016
[34]

Sagastiz´ abal

C. Sagastiz´ abal. AVU -point of view of nonsmooth optimization. In Proceedings of the International Congress of Mathematicians, Rio de Janeiro , volume 3, pages 3785–3806, 2018

work page 2018
[35]

Shapiro and M.K.H

A. Shapiro and M.K.H. Fan. On eigenvalue optimization. SIAM J. Optim. , 5:552–569, 1995

work page 1995
[36]

P. M. Vaidya. A new algorithm for minimizing convex functions over convex sets. Math. Programming, 73:291–341, 1996. 34

work page 1996

[1] [1]

Atkinson and P.M

D.S. Atkinson and P.M. Vaidya. A cutting plane algorithm for convex pro- gramming that uses analytic centers. Math. Programming, 69:1–43, 1995

work page 1995

[2] [2]

S. Bubeck. Convex optimization: algorithms and complexity. Foundations and Trends in Machine Learning, 8:231–357, 2015

work page 2015

[3] [3]

Burke, A.S

J.V. Burke, A.S. Lewis, and M.L. Overton. A robust gradient sampling algo- rithm for nonsmooth, nonconvex optimization. SIAM J. Optim. , 15:751–779, 2005

work page 2005

[4] [4]

F.H. Clarke. Optimization and Nonsmooth Analysis . Wiley Interscience, New York, 1983. 31

work page 1983

[5] [5]

M. Coste. An Introduction to Semialgebraic Geometry. RAAG Notes, 78 pages, Institut de Recherche Math´ ematiques de Rennes, October 2002

work page 2002

[6] [6]

Du and A

Y. Du and A. Ruszczy´ nski. Rate of convergence of the bundle method. J. Optim. Theory Appl. , 173:908–922, 2017

work page 2017

[7] [7]

Goﬃn and J.-P

J.-L. Goﬃn and J.-P. Vial. On the computation of weighted analytic centers and dual ellipsoids with the projective algorithm. Math. Programming, 60:81–92, 1993

work page 1993

[8] [8]

Golub and C.F

G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins Univer- sity Press, Baltimore, MD, fourth edition, 2013

work page 2013

[9] [9]

K. C. Kiwiel. Eﬃciency of proximal bundle methods. J. Optim. Theory Appl. , 104:589–603, 2000

work page 2000

[10] [10]

K.C. Kiwiel. Methods of Descent for Nondiﬀerentiable Optimization , volume 1133 of Lecture Notes in Mathematics . Springer-Verlag, Berlin, 1985

work page 1985

[11] [11]

Sidford, and Sam Chiu-wai Wong

Yin Tat Lee, A. Sidford, and Sam Chiu-wai Wong. A faster cutting plane method and its implications for combinatorial and convex optimization. In FOCS 2015, pages 1049–1065. IEEE Computer Soc., Los Alamitos, CA, 2015

work page 2015

[12] [12]

Lemar´ echal

C. Lemar´ echal. An extension of Davidon methods to non diﬀerentiable prob- lems. Math. Programming Stud., 3:95–109, 1975

work page 1975

[13] [13]

Lemar´ echal, A

C. Lemar´ echal, A. Nemirovskii, and Y. Nesterov. New variants of bundle meth- ods. Math. Programming, 69:111–147, 1995

work page 1995

[14] [14]

Lemar´ echal, F

C. Lemar´ echal, F. Oustry, and C. Sagastiz´ abal. The U-lagrangian of a convex function. Transactions of the American Mathematical Society , 352:711–729, 2000

work page 2000

[15] [15]

Lemar´ echal and C

C. Lemar´ echal and C. Sagastiz´ abal. Practical aspects of the Moreau-Yosida regularization: theoretical preliminaries. SIAM J. Optim. , 7:367—385, 1997

work page 1997

[16] [16]

A.S. Lewis. Active sets, nonsmoothness, and sensitivity. SIAM J. Optim. , 13:702–725, 2002

work page 2002

[17] [17]

Lewis and M.L

A.S. Lewis and M.L. Overton. Nonsmooth optimization via quasi-Newton meth- ods. Math. Program., 141:135–163, 2013

work page 2013

[18] [18]

Lukˇ san and J

L. Lukˇ san and J. Vlˇ cek. A bundle-Newton method for nonsmooth uncon- strained minimization. Math. Programming, 83:373–391, 1998

work page 1998

[19] [19]

R. Miﬄin. An algorithm for constrained optimization with semismooth func- tions. Math. Oper. Res., 2:191–207, 1977. 32

work page 1977

[20] [20]

Miﬄin and C

R. Miﬄin and C. Sagastiz´ abal. Proximal points are on the fast track. Journal of Convex Analysis , 9:563—579, 2002

work page 2002

[21] [21]

Miﬄin and C

R. Miﬄin and C. Sagastiz´ abal. AVU -algorithm for convex minimization. Math. Program., 104:583–608, 2005

work page 2005

[22] [22]

Miller and J

S.A. Miller and J. Malick. Newton methods for nonsmooth convex minimiza- tion: connections among U-Lagrangian, Riemannian Newton and SQP meth- ods. Math. Program., 104:609–633, 2005

work page 2005

[23] [23]

A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Eﬃciency in Optimization . John Wiley & Sons, Inc., New York, 1983. Translated from the Russian and with a preface by E. R. Dawson

work page 1983

[24] [24]

Nesterov

Y. Nesterov. Complexity estimates of some cutting plane methods based on the analytic barrier. Math. Programming, 69:149–176, 1995

work page 1995

[25] [25]

Nesterov

Y. Nesterov. Introductory Lectures on Convex Optimization. Kluwer Academic, Dordrecht, The Netherlands, 2004

work page 2004

[26] [26]

Nesterov

Y. Nesterov. Smooth minimization of non-smooth functions. Math. Program., 103:127–152, 2005

work page 2005

[27] [27]

Nesterov and A

Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Algorithms in Con- vex Programming. SIAM, Philadelphia, PA, 1994

work page 1994

[28] [28]

Nocedal and S.J

J. Nocedal and S.J. Wright. Numerical Optimization . Springer, New York, second edition, 2006

work page 2006

[29] [29]

F. Oustry. The U-Lagrangian of the maximum eigenvalue function. SIAM J. Optim., 9:526–549, 1999

work page 1999

[30] [30]

M.L. Overton. On minimizing the maximum eigenvalue of a symmetric ma- trix. In Linear algebra in signals, systems, and control , pages 150–169. SIAM, Philadelphia, PA, 1988

work page 1988

[31] [31]

Parikh and S

N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization, 1:123–231, 2013

work page 2013

[32] [32]

Robinson

S.M. Robinson. Perturbed Kuhn-Tucker points and rates of convergence for a class of nonlinear-programming algorithms. Math. Programming, 7:1–16, 1974

work page 1974

[33] [33]

Ryu and S

E.K. Ryu and S. Boyd. A primer on monotone operator methods. Appl. Com- put. Math., 15:3–43, 2016. 33

work page 2016

[34] [34]

Sagastiz´ abal

C. Sagastiz´ abal. AVU -point of view of nonsmooth optimization. In Proceedings of the International Congress of Mathematicians, Rio de Janeiro , volume 3, pages 3785–3806, 2018

work page 2018

[35] [35]

Shapiro and M.K.H

A. Shapiro and M.K.H. Fan. On eigenvalue optimization. SIAM J. Optim. , 5:552–569, 1995

work page 1995

[36] [36]

P. M. Vaidya. A new algorithm for minimizing convex functions over convex sets. Math. Programming, 73:291–341, 1996. 34

work page 1996