pith. sign in

arxiv: 1907.01321 · v1 · pith:GLOJVOUBnew · submitted 2019-07-02 · 🧮 math.OC

A Gradient Sampling method based on Ideal direction for solving nonsmooth nonconvex optimization problems: convergence analysis and numerical experiments

Pith reviewed 2026-05-25 11:09 UTC · model grok-4.3

classification 🧮 math.OC
keywords gradient samplingideal directionnonsmooth nonconvex optimizationArmijo line searchglobal convergenceserious iterationsquadratic programming
0
0 comments X

The pith

The Ideal direction replaces the quadratic subproblem in Gradient Sampling, satisfies Armijo, and preserves global convergence for nonsmooth nonconvex problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper modifies the Gradient Sampling method by introducing an Ideal direction that avoids solving a quadratic optimization problem at each iteration. The Ideal direction is proven to meet the Armijo step-size condition and to produce a substantial decrease in the objective function. The modified algorithm maintains the global convergence property of the standard method. An upper bound on the number of serious iterations is provided, enabling a new approach to convergence analysis. Numerical experiments confirm efficiency across small, medium, and large scale test problems.

Core claim

By defining the Ideal direction directly from the sampled gradients without any subproblem, the approach satisfies the Armijo condition, yields a useful reduction, and the resulting iteration sequence converges to a stationary point under the same assumptions as the original Gradient Sampling method; moreover an explicit upper bound holds for the number of serious steps.

What carries the argument

The Ideal direction, a descent direction computed directly from sampled gradients without solving quadratic or linear subproblems.

If this is right

  • The computational burden per iteration is reduced by eliminating the QP solve.
  • Global convergence to a stationary point is retained.
  • An upper bound on serious iterations is available under moderate assumptions.
  • The method performs well on problems of varying scales in experiments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may extend to other first-order methods that rely on similar subproblems for direction finding.
  • Bounds on iteration counts could inform practical stopping criteria or adaptive sampling rates.
  • Large-scale applications may benefit from the reduced per-iteration cost.

Load-bearing premise

The objective function satisfies local Lipschitz continuity and the regularity conditions required by the original Gradient Sampling convergence theory.

What would settle it

A nonsmooth nonconvex function on which the Ideal direction violates the Armijo condition for every step size or on which the modified algorithm fails to reach a stationary point.

Figures

Figures reproduced from arXiv: 1907.01321 by M. Maleknia, M. Shamsi.

Figure 1
Figure 1. Figure 1: Performance profiles based on CPU time for medium scal [PITH_FULL_IMAGE:figures/full_fig_p025_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Performance profiles based on QP solver time for mediu [PITH_FULL_IMAGE:figures/full_fig_p025_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Performance profiles based on CPU time for large scale [PITH_FULL_IMAGE:figures/full_fig_p026_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Performance profiles based on QP solver time for large [PITH_FULL_IMAGE:figures/full_fig_p027_4.png] view at source ↗
read the original abstract

In this paper, a modification to the Gradient Sampling (GS) method for minimizing nonsmooth nonconvex functions is presented. One drawback in GS method is the need of solving a Quadratic optimization Problem (QP) at each iteration, which is time-consuming especially for large scale objectives. To resolve this difficulty, we propose a new descent direction, namely Ideal direction, for which there is no need to consider any quadratic or linear optimization subproblem. It is shown that, this direction satisfies Armijo step size condition and can be used to make a substantial reduction in the objective function. Furthermore, we prove that using Ideal directions preserves the global convergence of the GS method. Moreover, under some moderate assumptions, we present an upper bound for the number of serious iterations. Using this upper bound, we develop a different strategy to study the convergence of the method. We also demonstrate the efficiency of the proposed method using small, medium and large scale problems in our numerical experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes replacing the quadratic program in the Gradient Sampling (GS) algorithm for nonsmooth nonconvex minimization with a closed-form 'Ideal direction' that requires no optimization subproblem. It claims this direction satisfies the Armijo line-search condition, produces a substantial objective decrease, inherits the global convergence of the original GS method, and admits an explicit upper bound on the number of serious iterations under moderate assumptions; numerical experiments on small-, medium-, and large-scale problems are included to illustrate practical efficiency.

Significance. If the convergence inheritance is rigorously established, the work would provide a computationally lighter GS variant that retains theoretical guarantees, which could be useful for large-scale nonsmooth nonconvex problems where QP solves dominate runtime. The iteration bound and alternative convergence argument are additional positive features. The significance is tempered by the fact that the key inner-product decrease property required for the original GS analysis is not obviously inherited from the given closed-form construction.

major comments (2)
  1. [convergence analysis section (around the proof that Ideal directions preserve GS convergence)] The definition of the Ideal direction (closed-form expression avoiding the QP) is not shown to guarantee the uniform sufficient-decrease relation <v_k, d_k> ≤ −c ||v_k||^2 (or equivalent) for an arbitrary convex combination v_k of sampled gradients. This relation is load-bearing for carrying over the serious-step analysis and global convergence from the original GS theory; without it, the inheritance claim does not follow.
  2. [section presenting the iteration bound and alternative convergence argument] The upper bound on the number of serious iterations is stated under 'moderate assumptions,' but the manuscript does not explicitly list those assumptions or verify that they remain compatible with the Ideal-direction construction; this makes the bound's applicability and the 'different strategy' for convergence study difficult to assess.
minor comments (2)
  1. [abstract and introduction] The abstract and introduction should more clearly distinguish which parts of the GS convergence theory are reused verbatim versus which are reproved for the Ideal direction.
  2. [preliminaries and algorithm description] Notation for the sampled gradient set and the convex combination v_k should be introduced once and used consistently when stating the inner-product property.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address the two major comments point by point below and will revise the manuscript to improve clarity on both issues.

read point-by-point responses
  1. Referee: [convergence analysis section (around the proof that Ideal directions preserve GS convergence)] The definition of the Ideal direction (closed-form expression avoiding the QP) is not shown to guarantee the uniform sufficient-decrease relation <v_k, d_k> ≤ −c ||v_k||^2 (or equivalent) for an arbitrary convex combination v_k of sampled gradients. This relation is load-bearing for carrying over the serious-step analysis and global convergence from the original GS theory; without it, the inheritance claim does not follow.

    Authors: We appreciate the referee highlighting this point. The global-convergence proof in the manuscript proceeds by verifying that the Ideal direction satisfies the key properties required by the original GS analysis, including a uniform sufficient-decrease relation derived directly from its closed-form expression. To make this inheritance fully explicit and easier to follow, we will add a short lemma in the revised convergence section that states and proves <v_k, d_k> ≤ −(1/2) ||v_k||^2 (with the constant obtained from the definition of the Ideal direction) for any convex combination v_k of the sampled gradients. This addition will not alter the existing arguments but will clarify how the serious-step analysis carries over unchanged. revision: yes

  2. Referee: [section presenting the iteration bound and alternative convergence argument] The upper bound on the number of serious iterations is stated under 'moderate assumptions,' but the manuscript does not explicitly list those assumptions or verify that they remain compatible with the Ideal-direction construction; this makes the bound's applicability and the 'different strategy' for convergence study difficult to assess.

    Authors: We agree that the assumptions should be stated more transparently. In the revised manuscript we will insert an explicit enumerated list of the moderate assumptions at the start of the iteration-bound section and add a brief paragraph confirming that none of them is affected by replacing the QP solution with the closed-form Ideal direction (the sampling procedure, the function class, and the line-search parameters remain identical). This will also make the alternative convergence strategy based on the iteration bound easier to evaluate. revision: yes

Circularity Check

0 steps flagged

No circularity; Ideal direction defined independently and its properties proven from standard GS assumptions

full rationale

The manuscript introduces the Ideal direction via an explicit closed-form expression that bypasses the QP subproblem, then separately verifies that this direction satisfies the Armijo condition and inherits the global convergence of the original Gradient Sampling method under the same local Lipschitz and regularity assumptions. No step reduces a claimed result to a fitted parameter, self-citation chain, or definitional equivalence; the convergence inheritance is established by direct verification of the required decrease properties rather than by renaming or smuggling prior results. External citations to the GS literature supply the baseline theory but do not bear the load of the new direction's properties.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the Ideal direction is a computational construct rather than a new postulated physical entity.

pith-pipeline@v0.9.0 · 5703 in / 1029 out tokens · 20460 ms · 2026-05-25T11:09:39.479402+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages

  1. [1]

    Singapore: World Scientific Publishing Co

    M¨ akel¨ a, M.M., Neittaanm¨ aki, P.: Nonsmooth optimization: Analysis and al- gorithms with applications to optimal control. Singapore: World Scientific Publishing Co. (1992)

  2. [2]

    application to the processing of outliers

    Nikolova, M.: Minimizers of cost-functions involving no nsmooth data-fidelity terms. application to the processing of outliers. SIAM Jour nal on Numerical Analysis 40(3), 965–994 (2002)

  3. [3]

    Astorino, A., Gaudioso, M.: Polyhedral separability thr ough successive LP. J. Optim. Theory Appl 112(2), 265–293 (2002)

  4. [4]

    Springer (2009) 28 M

    Lavor, C., Liberti, L., Maculan, N.: Molecular distance g eometry problem (2nd ed.). Springer (2009) 28 M. Maleknia, M. Shamsi

  5. [5]

    Elsevier Inc (2005)

    Witten, I.H., Frank, E.: Data mining: Practical machine l earning tools and techniques (2nd ed.). Elsevier Inc (2005)

  6. [6]

    SIAM (1990)

    Clarke, F.H.: Optimization and nonsmooth analysis. SIAM (1990)

  7. [7]

    Springer- Verlag (1985)

    Kiwiel, K.C.: Methods of descent for nondifferentiable op timization. Springer- Verlag (1985)

  8. [8]

    Springer (1985)

    Shor, N.Z.: Minimization methods for non-differentiable functions. Springer (1985)

  9. [9]

    A Series of Comprehensive Studies in Mathematics

    Hiriart-Urruty, J.B., chal, C.L.: Convex Analysis and Mi nimization Algorithms I. A Series of Comprehensive Studies in Mathematics. Spring er-Verlag (1993)

  10. [10]

    A Series of Comprehensive Studies in Mathematics

    Hiriart-Urruty, J.B., chal, C.L.: Convex Analysis and M inimization Algorithms II. A Series of Comprehensive Studies in Mathematics. Sprin ger-Verlag (1993)

  11. [11]

    Uryasev, S.P.: New variable metric algorithms for nondi fferentiable optimiza- tion problems. J. Optim. Theory Appl 71, 359–388 (1991)

  12. [12]

    Vlˇ cek, J., Lukˇ san, L.: Globally convergent variable m etric method for non- convex nondifferentiable unconstrained minimization. J. O ptim. Theory Appl 111(2), 407–430 (2001)

  13. [13]

    Bagirov, A.M., Jin, L., Karmitsa, N., Al Nuaimat, A., Sul tanova, N.: A sub- gradient method for nonconvex nonsmooth optimization. J. O ptim. Theory Appl 157, 416–435 (2013)

  14. [14]

    Mahdavi-Amiri, N., Yousefpour, R.: An effective nonsmoo th optimization algo- rithm for locally Lipschitz functions. J. Optim. Theory App l 155(1), 180–195 (2012)

  15. [15]

    Nesterov, Y.: Primal-dual subgradient methods for conv ex problems. Math. Program 120(1), 221–259 (2009)

  16. [16]

    Bagirov, A.M., Ganjehlou, A.N.: A quasisecant method fo r minimizing nons- mooth functions. Optim. Methods Softw 25(1), 3–18 (2010)

  17. [17]

    Haarala, M., Miettinen, K., M¨ akel¨ a, M.M.: New limited memory bundle method for large-scale nonsmooth optimization. Optim. Met hods Softw 19(6), 673–692 (2004)

  18. [18]

    Lukˇ san, L., Vlˇ cek, J.: A bundle-Newton method for nonsmooth unconstrained minimization. Math. Program 83(1), 373–391 (1998)

  19. [19]

    Burke, J.V., Lewis, A.S., Overton, M.L.: A robust gradie nt sampling algo- rithm for nonsmooth, nonconvex optimization. SIAM J. Optim 15(3), 751–779 (2005)

  20. [20]

    arXiv : 18 04.11003 (2018)

    Burke, J.V., Curtis, F.E., Lewis, A.S., Overton, M.L., S im˜ oes, L.E.A.: Gradi- ent sampling methods for nonsmooth optimization. arXiv : 18 04.11003 (2018)

  21. [21]

    Kiwiel, K.C.: Convergence of the gradient sampling algo rithm for nonsmooth nonconvex optimization. SIAM J. Optim 18(2), 379–388 (2007)

  22. [22]

    Kiwiel, K.C.: A nonderivative version of the gradient sa mpling algorithm for nonsmooth nonconvex optimization. SIAM J. Optim 20(4), 1983–1994 (2010)

  23. [23]

    Curtis, F.E., Que, X.: An adaptive gradient sampling alg orithm for nonconvex nonsmooth optimization. Optim. Methods Softw 28(6), 1302–1324 (2013)

  24. [24]

    Springer (2006)

    Nocedal, J., Wright, S.J.: Numerical Optimization, sec ond edition. Springer (2006)

  25. [25]

    Curtis, F.E., Overton, M.L.: A sequential quadratic pro gramming algorithm for nonconvex, nonsmooth constrained optimization. SIAM J . Optim 22(2), 474–500 (2012)

  26. [26]

    Helou, E.S., Santos, A.S., Sim˜ oes, L.E.A.: On the local convergence analysis of the gradient sampling method for finite max-functions. J. Optim. Theory Title Suppressed Due to Excessive Length 29 Appl 175(1), 137–157 (2017)

  27. [27]

    Springer (2014)

    Bagirov, A., Karmitsa, N., M¨ akel¨ a, M.M.: Introductio n to nonsmooth opti- mization. Springer (2014)

  28. [28]

    CRC Press (1992)

    Evans, L.C., Gariepy, R.F.: Measure theory and fine prope rties of functions, Revised Edition. CRC Press (1992)

  29. [29]

    Springer (2004)

    Rockafellar, R.T., Wets, R.J.B.: Variational analysis . Springer (2004)

  30. [30]

    Princeton Univers ity Press (1970)

    Rockafellar, R.T.: Convex Analysis. Princeton Univers ity Press (1970)

  31. [31]

    John Wiley & Sons, Inc

    Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear P rogramming. John Wiley & Sons, Inc. (2006)

  32. [32]

    Springer (20 05)

    Ehrgott, M.: Multicriteria Optimization. Springer (20 05)

  33. [33]

    Lukˇ san, L., Vlˇ cek, J.: Test problems for nonsmooth unconstrained and linearly constrained optimization. Tech. rep., Institute of Comput er Science, Academy of Sciences of the Czech Republic (2000)

  34. [34]

    Master’s thesis, New York University (2010)

    Skaaja, M.: Limited memory BFGS for nonsmooth optimizat ion. Master’s thesis, New York University (2010)

  35. [35]

    interactive system for universal functional optimization

    Lukˇ san, L., Tcma, M., Siska, M., Vlˇ cek, J., Ramesova, N.: Ufo 2002. interactive system for universal functional optimization. Tech. rep., Institute of Computer Science, Academy of Sciences of the Czech Republic (2002)

  36. [36]

    Grothey, A.: Decomposition methods for nonlinear nonco nvex optimization problems. Ph.D. thesis, University of Edinburgh (2001)

  37. [37]

    Dolan, E., Mor´ e, J.: Benchmarking optimization softwa re with performance profiles. Math. Program 91(2), 201–213 (2002)

  38. [38]

    Lewis, A.S., Overton, M.L.: Nonsmooth optimization via BFGS URL https://cs.nyu.edu/overton/papers/nonsmoothalg.html