pith. sign in

arxiv: 2604.08067 · v1 · submitted 2026-04-09 · 🧮 math.OC

Inexact Limited Memory Bundle Method

Pith reviewed 2026-05-10 17:34 UTC · model grok-4.3

classification 🧮 math.OC
keywords nonsmooth optimizationbundle methodslimited memory methodsinexact methodsnonconvex optimizationglobal convergencenoisy data
0
0 comments X

The pith

An inexact limited memory bundle method converges globally to an approximate stationary point for large-scale nonsmooth nonconvex problems even when function values and subgradients contain noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a new optimization algorithm that works with approximate information rather than exact function values and subgradients. It is built for large-scale problems that are nonsmooth and nonconvex, where noise can arise from measurement errors, privacy mechanisms, or stochastic estimates. The authors prove that the method still reaches a point satisfying an approximate stationarity condition under mild assumptions on the noise. Experiments with varying noise levels indicate that the approach remains effective and competitive with exact-data methods on large instances.

Core claim

The proposed inexact limited memory bundle method maintains global convergence to an approximate stationary point while tolerating bounded noise in both function evaluations and subgradient information, and it demonstrates practical performance on large-scale nonsmooth nonconvex test problems.

What carries the argument

The inexact limited memory bundle method, which maintains a limited history of inexact subgradients to form a bundle model and uses updates that approximate second-order information without storing the full history.

If this is right

  • The method applies directly to settings where subgradients are available only through noisy oracles, such as differential privacy or stochastic approximations.
  • Global convergence holds to an approximate stationary point rather than an exact one, with the tolerance controlled by the noise bounds.
  • Limited memory updates keep memory usage low, enabling the method to scale to problems with thousands of variables.
  • Numerical tests confirm competitiveness with exact bundle methods when noise is absent and continued reliability when noise is present.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same inexact framework could be adapted to other bundle or proximal algorithms for nonsmooth problems without major changes to the convergence analysis.
  • In applications like machine learning, the noise tolerance might allow cheaper gradient estimators or privacy mechanisms while retaining convergence guarantees.
  • Performance on very high-dimensional problems may depend on how the limited memory length interacts with the specific noise structure.

Load-bearing premise

The noise affecting function values and subgradients must remain bounded or satisfy regularity conditions that preserve stability of the inexact bundle framework.

What would settle it

A concrete counter-example or numerical run in which the noise level violates the boundedness conditions yet the method still fails to reach any approximate stationary point on a problem known to possess one.

Figures

Figures reproduced from arXiv: 2604.08067 by Jenni Lampainen, Kaisa Joki, Marko M. M\"akel\"a, Napsu Karmitsa.

Figure 1
Figure 1. Figure 1: Initialization Direction finding Desired accuracy? Direction finding using the limited memory BFGS update Direction finding using the limited memory SR1 update Does the new auxiliary point improve the solution enough? Null step and aggregation Serious step STOP Compute an auxiliary point No Yes No Yes [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Nonsmooth nonconvex problems with exact information: n = 2, 5, 10, 20, 50, 100, 200, 500, 1000, and 2000. Hollow markers and dashed lines indicate values that lie outside the figure limits. minimum therefore reflect the quality of the stationary points found rather than the success or failure of the algorithms in a strict sense. From [PITH_FULL_IMAGE:figures/full_fig_p020_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Nonsmooth nonconvex problems with constant noises N 1 and N 3, the noise bound q¯ = 0.01, and n = 2, 5, 10, 20, 50, 100, 200, 500, 1000, and 2000. (a) Accuracy (b) Function evaluations [PITH_FULL_IMAGE:figures/full_fig_p023_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Nonsmooth nonconvex problems with vanishing noises N 2 and N 4, the noise bound q¯ = 0.01, and n = 2, 5, 10, 20, 50, 100, 200, 500, 1000, and 2000. 23 [PITH_FULL_IMAGE:figures/full_fig_p023_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Comparison of different noise bounds q¯ = 0, 0.0001, 0.001, and 0.01 with noise type N 1 (constant noise in both function values and subgradients), and n = 2, 5, 10, 20, 50, 100, 200, 500, 1000, and 2000 [PITH_FULL_IMAGE:figures/full_fig_p024_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Comparison of different noise bounds q¯ = 0, 0.0001, 0.001, and 0.01 with noise type N 2 (vanishing noise in both function values and subgradients), and n = 2, 5, 10, 20, 50, 100, 200, 500, 1000, and 2000. 24 [PITH_FULL_IMAGE:figures/full_fig_p024_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison of different noise bounds q¯ = 0, 0.0001, 0.001, and 0.01 with noise type N 3 (constant noise in subgradients), and n = 2, 5, 10, 20, 50, 100, 200, 500, 1000, and 2000 [PITH_FULL_IMAGE:figures/full_fig_p025_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Comparison of different noise bounds q¯ = 0, 0.0001, 0.001, and 0.01 with noise type N 4 (vanishing noise in subgradients), and n = 2, 5, 10, 20, 50, 100, 200, 500, 1000, and 2000. 25 [PITH_FULL_IMAGE:figures/full_fig_p025_8.png] view at source ↗
read the original abstract

Large-scale nonsmooth optimization problems arise in many real-world applications, but obtaining exact function and subgradient values for these problems may be computationally expensive or even infeasible. In many practical settings, only inexact information is available due to measurement or modeling errors, privacy-preserving computations, or stochastic approximations, making inexact optimization methods particularly relevant. In this paper, we propose a novel inexact limited memory bundle method for large-scale nonsmooth nonconvex optimization. The method tolerates noise in both function values and subgradients. We prove the global convergence of the proposed method to an approximate stationary point. Numerical experiments with different levels of noise in function and/or subgradient values show that the method performs well with both exact and noisy data. In particular, the results demonstrate competitiveness in large-scale nonsmooth optimization and highlight the suitability of the method for applications where noise is unavoidable, such as differential privacy in machine learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes a novel inexact limited memory bundle method for large-scale nonsmooth nonconvex optimization. The method is designed to tolerate noise in both function values and subgradients. The authors claim to prove global convergence to an approximate stationary point and report numerical experiments demonstrating competitiveness with both exact and noisy data, including suitability for applications such as differential privacy in machine learning.

Significance. If the convergence result holds under clearly stated and verifiable noise assumptions, the method would address a practically relevant gap in scalable nonsmooth optimization where exact oracles are unavailable. The limited-memory feature supports applicability to large-scale instances, and the numerical results suggest robustness across noise levels. However, the significance is currently difficult to assess without explicit statements of the noise model, the precise definition of approximate stationarity, and the key steps in the convergence argument.

major comments (1)
  1. The abstract asserts a global convergence proof to an approximate stationary point under noise in function and subgradient values, yet provides no details on the noise regularity conditions, the definition of the stationarity measure, or the descent and bundle-update conditions used in the argument. Without these, it is not possible to verify whether the proof is load-bearing or relies on unstated assumptions.
minor comments (1)
  1. The abstract refers to 'different levels of noise' in the experiments but does not specify the noise model (e.g., bounded additive noise, relative noise, or stochastic approximation) or the test problems used, which limits reproducibility assessment.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful review and constructive feedback on our manuscript. We address the major comment below and will revise the abstract to improve accessibility while preserving its conciseness.

read point-by-point responses
  1. Referee: The abstract asserts a global convergence proof to an approximate stationary point under noise in function and subgradient values, yet provides no details on the noise regularity conditions, the definition of the stationarity measure, or the descent and bundle-update conditions used in the argument. Without these, it is not possible to verify whether the proof is load-bearing or relies on unstated assumptions.

    Authors: We appreciate the referee pointing out the abstract's lack of detail on these elements. The noise regularity conditions (bounded noise in function values and subgradients) are formalized in Assumption 3.1. The notion of approximate stationarity is defined in Definition 2.3 as an approximate Goldstein ε-stationary point. The descent condition is enforced via the inexact line search in Step 4 of Algorithm 1, while bundle-update rules (including aggregation and deletion) appear in Steps 5–6, with the full convergence argument given in Theorem 4.1 and its proof in Section 4. All conditions used in the proof are explicitly stated in the manuscript; no unstated assumptions are employed. To address the referee's concern, we will revise the abstract to include a brief sentence referencing the noise model and the stationarity measure. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The abstract and description present a novel inexact limited memory bundle method with a claimed independent global convergence proof to an approximate stationary point under noise in function and subgradient values. No equations, definitions, or load-bearing steps are available that reduce the result to fitted parameters, self-referential definitions, or self-citation chains by construction. The proof is described as a separate argument resting on regularity conditions for the inexact bundle framework, with no evidence of renaming known results or smuggling ansatzes via citation in the provided content. The derivation therefore appears self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; typical bundle-method assumptions such as local Lipschitz continuity or bounded noise are not stated here.

pith-pipeline@v0.9.0 · 5460 in / 1134 out tokens · 95172 ms · 2026-05-10T17:34:01.734642+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 1 internal anchor

  1. [1]

    Abadi, A

    M. Abadi, A. Chu, I. Goodfellow, H.B. McMahan, I. Mironov, K. Talwar, and L. Zhang, Deep learning with differential privacy, inProc. ACM SIGSAC Conf. Comput. Commun. Secur.2016, pp. 308–318

  2. [2]

    Astorino and A

    A. Astorino and A. Fuduli,Nonsmooth optimization techniques for semi-supervised clas- sification, IEEE Trans. Pattern Anal. Mach. Intell. 29 (2007), pp. 2135–2142

  3. [3]

    Astorino, A

    A. Astorino, A. Fuduli, and E. Gorgone,Nonsmoothness in classification problems, Optim. Methods Softw. 23 (2008), pp. 675–688

  4. [4]

    Bagirov, M

    A.M. Bagirov, M. Gaudioso, N. Karmitsa, M.M. Mäkelä, and S. Taheri,Numerical Non- smooth Optimization: State of the Art Algorithms, Springer Cham, 2020

  5. [5]

    Bagirov, N

    A.M. Bagirov, N. Karmitsa, and M.M. Mäkelä,Introduction to Nonsmooth Optimization: Theory, Practice and Software, Springer, 2014

  6. [6]

    Bagirov, N

    A.M. Bagirov, N. Karmitsa, and S. Taheri,Partitional Clustering via Nonsmooth Opti- mization: Clustering via Optimization, 2nd ed., Springer Cham, 2025

  7. [7]

    Bagirov, S

    A.M. Bagirov, S. Taheri, N. Karmitsa, N. Sultanova, and S. Asadi,Robust piecewise linear L1-regression via nonsmooth DC optimization, Optim. Methods Softw. 37 (2022), pp. 1289–1309

  8. [8]

    Bagirov, S

    A.M. Bagirov, S. Taheri, and J. Ugon,Nonsmooth DC programming approach to the minimum sum-of-squares clustering problems, Pattern Recognit. 53 (2016), pp. 12–24

  9. [9]

    Bagirov and J

    A.M. Bagirov and J. Ugon,Nonsmooth DC programming approach to clusterwise linear regression: Optimality conditions and algorithms, Optim. Methods Softw. 33 (2018), pp. 194–219

  10. [10]

    Bihain,Optimization of upper semidifferentiable functions, J

    A. Bihain,Optimization of upper semidifferentiable functions, J. Optim. Theory Appl. 4 (1984), pp. 545–568

  11. [11]

    Boufi, A

    K. Boufi, A. Fadil, and A. Roubi,New formulation with proximal point algorithm and proximal bundle methods for equilibrium problems, Optimization (2026), pp. 1–31

  12. [12]

    R.H. Byrd, J. Nocedal, and R.B. Schnabel,Representations of quasi-Newton matrices and their use in limited memory methods, Math. Program. 63 (1994), pp. 129–156

  13. [13]

    Clarke,Optimization and Nonsmooth Analysis, Wiley-Interscience, New York, 1983

    F.H. Clarke,Optimization and Nonsmooth Analysis, Wiley-Interscience, New York, 1983

  14. [14]

    de Oliveira and M

    W. de Oliveira and M. Solodov,Bundle methods for inexact data, inNumerical nonsmooth optimization: State of the art algorithms, Springer, 2020, pp. 417–459

  15. [15]

    Demyanov, G.E

    V.F. Demyanov, G.E. Stavroulakis, L.N. Polyakova, and P.D. Panagiotopoulos,Qua- sidifferentiability and Nonsmooth Modelling in Mechanics, Engineering and Economics, Springer Science & Business Media, 2013

  16. [16]

    Dwork, F

    C. Dwork, F. McSherry, K. Nissim, and A. Smith,Calibrating noise to sensitivity in private data analysis, inProc. Theory Cryptogr. Conf.Springer, 2006, pp. 265–284

  17. [17]

    Dwork and A

    C. Dwork and A. Roth,The algorithmic foundations of differential privacy, Found. Trends Theor. Comput. Sci. 9 (2014), pp. 211–407

  18. [18]

    Haarala, K

    M. Haarala, K. Miettinen, and M.M. Mäkelä,New limited memory bundle method for large-scale nonsmooth optimization, Optim. Methods Softw. 19 (2004), pp. 673–692

  19. [19]

    Haarala, K

    N. Haarala, K. Miettinen, and M.M. Mäkelä,Globally convergent limited memory bundle method for large-scale nonsmooth optimization, Math. Program. 109 (2007), pp. 181–205

  20. [20]

    Halkola, K

    A. Halkola, K. Joki, T. Mirtti, M.M. Mäkelä, T. Aittokallio, and T.D. Laajala,OSCAR: Optimal subset cardinality regression using the L0-pseudonorm with applications to prog- nostic modelling of prostate cancer, PLOS Comput. Biol. 19 (2023), p. e1010333

  21. [21]

    Hare and C

    W. Hare and C. Sagastizábal,A redistributed proximal bundle method for nonconvex 27 optimization, SIAM J. Optim. 20 (2010), pp. 2442–2473

  22. [22]

    W. Hare, C. Sagastizábal, and S. M.A proximal bundle method for nonsmooth nonconvex functions with inexact information, Comput. Optim. Appl. 63 (2016), pp. 1–28

  23. [23]

    Hertlein and M

    L. Hertlein and M. Ulbrich,An inexact bundle algorithm for nonconvex nonsmooth min- imization in Hilbert space, SIAM J. Control Optim. 57 (2019), pp. 3137–3165

  24. [24]

    Hiriart-Urruty and C

    J.B. Hiriart-Urruty and C. Lemaréchal,Convex Analysis and Minimization Algorithms I, Springer-Verlag, Berlin, 1993

  25. [25]

    Hoseini Monjezi and S

    N. Hoseini Monjezi and S. Nobakhtian,New proximal bundle algorithm based on the gradient sampling method for nonsmooth nonconvex optimization with exact and inexact information, Numer. Algorithms 94 (2023), pp. 765–787

  26. [26]

    Kärkkäinen, K

    T. Kärkkäinen, K. Majava, and M.M. Mäkelä,Comparison of formulations and solution methods for image restoration problems, Inverse Probl. 17 (2001), pp. 1977–1995

  27. [27]

    A Comprehensive Guide to Differential Privacy: From Theory to User Expectations

    N. Karmitsa, A. Airola, T. Pahikkala, and T. Pitkämäki,A comprehensive guide to differential privacy: From theory to user expectations, preprint (2025). Available at https://arxiv.org/abs/2509.03294

  28. [28]

    Karmitsa and A.M

    N. Karmitsa and A.M. Bagirov,Limited memory discrete gradient bundle method for nonsmooth derivative free optimization, Optimization 61 (2012), pp. 1491–1509

  29. [29]

    Karmitsa, A.M

    N. Karmitsa, A.M. Bagirov, and M.M. Mäkelä,Comparing different nonsmooth optimiza- tion methods and software, Optim. Methods Softw. 27 (2012), pp. 131–153

  30. [30]

    Karmitsa, A.M

    N. Karmitsa, A.M. Bagirov, and S. Taheri,Clustering in large data sets with the limited memory bundle method, Pattern Recognit. 83 (2018), pp. 245–259

  31. [31]

    Karmitsa, V.P

    N. Karmitsa, V.P. Eronen, M.M. Mäkelä, T. Pahikkala, and A. Airola,Stochastic limited memory bundle algorithm for clustering in big data, Pattern Recognit. 165 (2025), p. 111654

  32. [32]

    Karmitsa, K

    N. Karmitsa, K. Joki, A. Airola, and T. Pahikkala,Limited memory bundle DC algorithm for sparse pairwise kernel learning, J. Glob. Optim. 92 (2025), pp. 55–85

  33. [33]

    Karmitsa and M.M

    N. Karmitsa and M.M. Mäkelä,Adaptive limited memory bundle method for bound con- strained large-scale nonsmooth optimization, Optimization 59 (2010), pp. 945–962

  34. [34]

    Karmitsa, S

    N. Karmitsa, S. Taheri, K. Joki, P. Paasivirta, A.M. Bagirov, and M.M. Mäkelä,Nons- mooth optimization-based hyperparameter-free neural networks for large-scale regression, Algorithms 16 (2023)

  35. [35]

    Khanh, B.S

    P.D. Khanh, B.S. Mordukhovich, and D.B. Tran,A new inexact gradient descent method with applications to nonsmooth convex optimization, Optim. Methods Softw. (2024), pp. 1–29

  36. [36]

    Kiwiel,Methods of Descent for Nondifferentiable Optimization, Lect

    K.C. Kiwiel,Methods of Descent for Nondifferentiable Optimization, Lect. Notes Math., Springer-Verlag, Berlin, 1985

  37. [37]

    Data Anal

    J.Lampainen,K.Joki,N.Karmitsa,andM.M.Mäkelä,An efficient incremental algorithm for clustering large datasets, Adv. Data Anal. Classif. (2026), pp. 1–62

  38. [38]

    Lukšan,Dual method for solving a special problem of quadratic programming as a sub- problem at linearly constrained nonlinear minmax approximation, Kybernetika 20 (1984), pp

    L. Lukšan,Dual method for solving a special problem of quadratic programming as a sub- problem at linearly constrained nonlinear minmax approximation, Kybernetika 20 (1984), pp. 445–457

  39. [39]

    Lukšan and J

    L. Lukšan and J. Vlček,Globally convergent variable metric method for convex nonsmooth unconstrained minimization, J. Optim. Theory Appl. 102 (1999), pp. 593–613

  40. [40]

    J. Lv, L.P. Pang, and F.Y. Meng,A proximal bundle method for constrained nonsmooth nonconvex optimization with inexact information, J. Glob. Optim. 70 (2018), pp. 517–549

  41. [41]

    Mäkelä,Issues of implementing a Fortran subroutine package NSOLIB for nons- mooth optimization, Tech

    M.M. Mäkelä,Issues of implementing a Fortran subroutine package NSOLIB for nons- mooth optimization, Tech. Rep. 5/1993, Dept. Math., Lab. Sci. Comput., Univ. Jyväskylä (1993)

  42. [42]

    Mäkelä,Multiobjective proximal bundle method for nonconvex nonsmooth optimiza- tion: Fortran subroutine MPBNGC 2.0, Rep

    M.M. Mäkelä,Multiobjective proximal bundle method for nonconvex nonsmooth optimiza- tion: Fortran subroutine MPBNGC 2.0, Rep. Dept. Math. Inf. Technol., Ser. B. Sci. Com- put., B. 13/2003, Univ. Jyväskylä, Jyväskylä (2003)

  43. [43]

    Mäkelä and P

    M.M. Mäkelä and P. Neittaanmäki,Nonsmooth Optimization: Analysis and Algorithms with Applications to Optimal Control, World Scientific Publishing Co., Singapore, 1992

  44. [44]

    Mistakidis and G.E

    E.S. Mistakidis and G.E. Stavroulakis,Nonconvex Optimization in Mechanics. Smooth 28 and Nonsmooth Algorithms, Heuristics and Engineering Applications by the F.E.M., Kluwert Academic Publishers, Dordrecht, 1998

  45. [45]

    Moreau, P.D

    J.J. Moreau, P.D. Panagiotopoulos, and G. Strang,Topics in Nonsmooth Mechanics, Birkhäuser Verlag, Basel, 1988

  46. [46]

    Movahedian and S

    N. Movahedian and S. Nobakhtian,Necessary and sufficient conditions for nonsmooth mathematical programs with equilibrium constraints, Nonlinear Anal.: Theory Methods Appl. 72 (2010), pp. 2694–2705

  47. [47]

    Nikolova, M.K

    M. Nikolova, M.K. Ng, and C.P. Tam,Fast nonconvex nonsmooth minimization methods for image restoration and reconstruction, IEEE Trans. Image Process. 19 (2010), pp. 3073–3088

  48. [48]

    Noll,Bundle method for non-convex minimization with inexact subgradients and func- tion values, inComput

    D. Noll,Bundle method for non-convex minimization with inexact subgradients and func- tion values, inComput. Anal. Math., Springer, 2013, pp. 555–592

  49. [49]

    Theory, Applications and Numerical Results, Kluwert Academic Publisher, Dordrecht, 1998

    J.Outrata,M.Kočvara,andJ.Zowe,Nonsmooth Approach to Optimization Problems with Equilibrium Constraints. Theory, Applications and Numerical Results, Kluwert Academic Publisher, Dordrecht, 1998

  50. [50]

    Paasivirta, R

    P. Paasivirta, R. Numminen, A. Airola, N. Karmitsa, and T. Pahikkala,Predicting pair- wise interaction affinities withℓ0-penalized least squares – a nonsmooth bi-objective opti- mization based approach, Optim. Methods Softw. (2024), pp. 1–28

  51. [51]

    Ramirez and M

    C. Ramirez and M. Argaez,An minimization algorithm for non-smooth regularization in image processing, Signal Image Video Process. 9 (2015), pp. 373–386

  52. [52]

    Singh and S.K

    V. Singh and S.K. Mishra,Duality results for quasidifferentiable mathematical programs with equilibrium constraints, Comput. Manag. Sci. 23 (2026), p. 13

  53. [53]

    Vlček and L

    J. Vlček and L. Lukšan,Globally convergent variable metric method for nonconvex nondif- ferentiable unconstrained minimization, J. Optim. Theory Appl. 111 (2001), pp. 407–430

  54. [54]

    Wang and Y

    W. Wang and Y. Chen,An accelerated smoothing gradient method for nonconvex nons- mooth minimization in image processing, J. Sci. Comput. 90 (2022), p. 31

  55. [55]

    Yang and C.C

    J.M. Yang and C.C. Chen,GEMDOCK: A generic evolutionary method for molecular docking, Proteins: Struct. Funct. Bioinf. 55 (2004), pp. 288–304

  56. [56]

    M.C. Yue, Z. Zhou, and A.M.C. So,A family of inexact SQA methods for non-smooth convex minimization with provable convergence guarantees based on the Luo-Tseng error bound property, Math. Program. 174 (2019), pp. 327–358. 29 Appendix A. Test problems In our numerical experiments we usedfivenonsmooth nonconvex academic minimiza- tion problems from [18] and...