pith. sign in

arxiv: 2511.22011 · v2 · submitted 2025-11-27 · 🧮 math.OC

A nonmonotone extrapolated proximal gradient-subgradient algorithm beyond global Lipschitz gradient continuity

Pith reviewed 2026-05-17 05:34 UTC · model grok-4.3

classification 🧮 math.OC
keywords proximal gradient methodnonmonotone line searchextrapolationKurdyka-Łojasiewicz propertyglobal convergencecomposite optimizationnon-Lipschitz gradient
0
0 comments X

The pith

The authors establish that their nonmonotone extrapolated proximal gradient-subgradient algorithm achieves global convergence under the Kurdyka-Łojasiewicz property for problems where the gradient is not globally Lipschitz continuous, even,

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses composite optimization problems where the smooth component's gradient is not globally Lipschitz continuous, which prevents direct application of standard proximal gradient methods. It proposes a problem-parameter-free algorithm that uses a nonmonotone line search to manage this lack of continuity and includes an extrapolation step for potential acceleration. A refined convergence analysis is provided under the Kurdyka-Łojasiewicz property, avoiding any assumptions that the iterates remain bounded. This advances the theoretical foundations for such methods in settings common to modern applications, as demonstrated by numerical experiments showing the benefits of combining extrapolation with nonmonotone search.

Core claim

We propose a novel problem-parameter-free algorithm that incorporates a carefully designed nonmonotone line search to handle the non-global Lipschitz gradient continuity, together with an extrapolation step to achieve potential acceleration. We establish a refined convergence analysis for the proposed algorithm under the Kurdyka-Łojasiewicz property, without requiring any boundedness assumptions on the iterates.

What carries the argument

Nonmonotone line search paired with an extrapolation step inside a proximal gradient-subgradient iteration

If this is right

  • The algorithm applies directly to composite problems without requiring knowledge or estimation of a global Lipschitz constant.
  • Global convergence to critical points holds whenever the objective meets the KL property at the sequence's limit points.
  • The extrapolation step can deliver faster practical progress while preserving the same convergence guarantees.
  • Numerical results confirm that the combination of extrapolation and nonmonotone search outperforms the non-extrapolated version on the tested instances.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same nonmonotone-plus-extrapolation design may transfer to other first-order splitting methods that currently rely on global Lipschitz assumptions.
  • Removing the bounded-iterate hypothesis could simplify convergence arguments for related subgradient methods in nonsmooth optimization.
  • Practical implementations might benefit from adaptive rules for the line-search parameters that still satisfy the descent conditions.

Load-bearing premise

The objective satisfies the Kurdyka-Łojasiewicz property at limit points and the nonmonotone line-search parameters meet the technical conditions needed for the descent inequality.

What would settle it

A composite optimization problem that satisfies the KL property at its limit points but produces a divergent sequence when the algorithm is run with line-search parameters that obey the stated conditions.

Figures

Figures reproduced from arXiv: 2511.22011 by Jingjing Hu, Lei Yang, Tianxiang Liu.

Figure 1
Figure 1. Figure 1: The average E(t) of 10 independent trials of different methods for solving (5.1). In summary, the simultaneous incorporation of the ZH-type nonmonotone line search and extrapolation in the proposed nexPGA framework is both necessary and impactful. These enhancements can improve the performance of the proximal gradient method and its variants, as evidenced by numerical results. Moreover, nexPGA operates wit… view at source ↗
read the original abstract

With the advancement of modern applications, an increasing number of composite optimization problems arise whose smooth component does not possess a globally Lipschitz continuous gradient. This setting prevents the direct use of the proximal gradient (PG) method and its variants, and has motivated a growing body of research on new PG-type methods and their convergence theory, in particular, global convergence analysis without imposing any explicit or implicit boundedness assumptions on the iterates. Until recently, the first complete analysis of this kind has been established for the PG method and its specific nonmonotone variants, which has since stimulated further exploration along this research direction. In this paper, we consider a general composite optimization model beyond the global Lipschitz gradient continuity setting. We propose a novel problem-parameter-free algorithm that incorporates a carefully designed nonmonotone line search to handle the non-global Lipschitz gradient continuity, together with an extrapolation step to achieve potential acceleration. Despite the added technical challenges introduced by combining extrapolation with nonmonotone line search, we establish a refined convergence analysis for the proposed algorithm under the Kurdyka-{\L} ojasiewicz property, without requiring any boundedness assumptions on the iterates. This work thus further advances the theoretical understanding of PG-type methods in the non-global Lipschitz gradient continuity setting. Finally, we conduct numerical experiments to illustrate the effectiveness of our algorithm and highlight the advantages of integrating extrapolation with a nonmonotone line search.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes a problem-parameter-free nonmonotone extrapolated proximal gradient-subgradient algorithm for composite nonsmooth optimization where the smooth term lacks global Lipschitz continuity. It combines a carefully designed nonmonotone line search with an extrapolation step and establishes a refined convergence analysis to critical points under the Kurdyka-Łojasiewicz property at limit points, without any boundedness assumptions on the iterates; numerical experiments illustrate practical performance.

Significance. If the central claims hold, the work meaningfully extends the recent line of parameter-free PG-type methods beyond global Lipschitz continuity by successfully integrating extrapolation with nonmonotone line search while preserving a clean KL-based convergence theory free of boundedness hypotheses. The explicit handling of the technical interaction between extrapolation and the nonmonotone acceptance condition, together with the absence of free parameters, constitutes a clear incremental advance.

minor comments (3)
  1. [§3.2] §3.2, line-search acceptance condition: the statement that the line search terminates in finitely many steps relies on local differentiability and the specific choice of the nonmonotone parameter sequence; a short remark clarifying why the extrapolation term does not interfere with this finite-termination argument would improve readability.
  2. [Numerical experiments] Table 1 and Figure 2: the reported iteration counts and function values would benefit from explicit mention of the line-search parameters (e.g., the values of η and the nonmonotone window size) used in each experiment to facilitate exact reproduction.
  3. [Abstract] The abstract contains a minor LaTeX rendering artifact in 'Kurdyka-Łojasiewicz'.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our manuscript, as well as the recommendation for minor revision. The referee correctly identifies the key contributions: a parameter-free algorithm combining nonmonotone line search and extrapolation for composite optimization beyond global Lipschitz continuity, with KL-based convergence to critical points without boundedness assumptions on the iterates. We appreciate the recognition of the technical handling of the interaction between extrapolation and the nonmonotone acceptance condition.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces an extrapolated proximal gradient-subgradient method with a nonmonotone line search designed to operate without global Lipschitz continuity of the gradient. Convergence to critical points is shown under the Kurdyka-Łojasiewicz property assumed only at the limit points of the generated sequence, together with standard technical conditions on the line-search parameters that produce a descent inequality. The proof chain proceeds from the line-search acceptance criterion (via differentiability) to a sufficient-descent relation and then applies the KL inequality to obtain convergence without any a-priori boundedness assumption on the iterates. None of these steps reduces by construction to a fitted parameter, a self-defined quantity, or a load-bearing self-citation; the KL property and line-search termination are external to the algorithm's internal tuning and are not renamed or smuggled in from prior author work. The derivation is therefore self-contained against standard external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the Kurdyka-Łojasiewicz property as a domain assumption and on technical conditions for the nonmonotone line search; no free parameters or invented entities are introduced in the abstract description.

axioms (1)
  • domain assumption The objective function satisfies the Kurdyka-Łojasiewicz property
    Invoked to obtain the convergence result without boundedness assumptions, as stated in the abstract.

pith-pipeline@v0.9.0 · 5547 in / 1367 out tokens · 35980 ms · 2026-05-17T05:34:54.447334+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages

  1. [1]

    Attouch and J

    H. Attouch and J. Bolte. On the convergence of the proximal algorithm for nonsmooth functions involving analytic features.Mathematical Programming, 116(1):5–16, 2009

  2. [2]

    Attouch, J

    H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka- Lojasiewicz inequality.Mathematics of Operations Research, 35(2):438–457, 2010

  3. [3]

    Attouch, J

    H. Attouch, J. Bolte, and B.F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods.Mathematical Programming, 137(1):91–129, 2013

  4. [4]

    Bauschke, J

    H.H. Bauschke, J. Bolte, and M. Teboulle. A descent lemma beyond Lipschitz gradient continuity: First-order methods revisited and applications.Mathematics of Operations Research, 42(2):330–348, 2017

  5. [5]

    Beck and M

    A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems.SIAM Journal on Imaging Sciences, 2(1):183–202, 2009

  6. [6]

    Birgin and J.M

    E.G. Birgin and J.M. Mart´ ınez.Practical Augmented Lagrangian Methods for Constrained Optimization. SIAM, 2014

  7. [7]

    Bolte, A

    J. Bolte, A. Daniilidis, and A. Lewis. The Lojasiewicz inequality for nonsmooth suban- alytic functions with applications to subgradient dynamical systems.SIAM Journal on Optimization, 17(4):1205–1223, 2007

  8. [8]

    Bolte, S

    J. Bolte, S. Sabach, and M. Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems.Mathematical Programming, 146(1):459–494, 2014. 35

  9. [9]

    Bolte, S

    J. Bolte, S. Sabach, M. Teboulle, and Y. Vaisbourd. First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems.SIAM Journal on Optimization, 28(3):2131–2151, 2018

  10. [10]

    X. Chen, Z. Lu, and T.K. Pong. Penalty methods for a class of non-Lipschitz optimization problems.SIAM Journal on Optimization, 26(3):1465–1492, 2016

  11. [11]

    Combettes and J.-C

    P.L. Combettes and J.-C. Pesquet.Proximal Splitting Methods in Signal Processing, pages 185–212. Springer, New York, 2011

  12. [12]

    De Marchi

    A. De Marchi. Proximal gradient methods beyond monotony.Journal of Nonsmooth Analysis and Optimization, 4:10290, 2023

  13. [13]

    Dragomir, A

    R.-A. Dragomir, A. d’Aspremont, and J. Bolte. Quartic first-order methods for low-rank minimization.Journal of Optimization Theory and Applications, 189(2):341–363, 2021

  14. [14]

    Fan and R

    J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties.Journal of the American Statistical Association, 96(456):1348–1360, 2001

  15. [15]

    P. Gong, C. Zhang, Z. Lu, J.Z. Huang, and J. Ye. A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. InInternational Conference on Machine Learning, pages 37–45, 2013

  16. [16]

    Gotoh, A

    J.-Y. Gotoh, A. Takeda, and K. Tono. DC formulations and algorithms for sparse optimiza- tion problems.Mathematical Programming, 169(1):141–176, 2018

  17. [17]

    Grippo, F

    L. Grippo, F. Lampariello, and S. Lucidi. A nonmonotone line search technique for Newton’s method.SIAM Journal on Numerical Analysis, 23(4):707–716, 1986

  18. [18]

    X. Jia, C. Kanzow, and P. Mehlitz. Convergence analysis of the proximal gradient method in the presence of the Kurdyka– Lojasiewicz property without global Lipschitz assumptions. SIAM Journal on Optimization, 33(4):3038–3056, 2023

  19. [19]

    Jia and K

    X. Jia and K. Wang. Convergence analysis of nonmonotone proximal gradient meth- ods under local lipschitz continuity and Kurdyka– Lojasiewicz property.arXiv preprint arXiv:2411.19256v3, 2024

  20. [20]

    Kanzow and L

    C. Kanzow and L. Lehmann. Convergence of nonmonotone proximal gradient methods under the Kurdyka– Lojasiewicz property without a global Lipschitz assumption.Journal of Optimization Theory and Applications, 207(4), 2025

  21. [21]

    Kanzow and P

    C. Kanzow and P. Mehlitz. Convergence properties of monotone and nonmonotone proximal gradient methods revisited.Journal of Optimization Theory and Applications, 195(2):624– 646, 2022

  22. [22]

    Li and T.K

    G. Li and T.K. Pong. Calculus of the exponent of Kurdyka– Lojasiewicz inequality and its applications to linear convergence of first–order methods.Foundations of Computational Mathematics, 18(5):1199–1232, 2018

  23. [23]

    Li and Z

    H. Li and Z. Lin. Accelerated proximal gradient methods for nonconvex programming. In Advances in Neural Information Processing Systems, pages 379–387, 2015. 36

  24. [24]

    Lions and B

    P.L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979

  25. [25]

    Liu, T.K

    T. Liu, T.K. Pong, and A. Takeda. A successive difference-of-convex approximation method for a class of nonconvex nonsmooth optimization problems.Mathematical Programming, 176(1):339–367, 2017

  26. [26]

    Y. Lou, P. Yin, Q. He, and J. Xin. Computing sparse representation in a highly coherent dictionary based on difference of L1 and L2.Journal of Scientific Computing, 64(1):178–196, 2015

  27. [27]

    Mordukhovich, N.M

    B.S. Mordukhovich, N.M. Nam, and N.D. Yen. Fr´ echet subdifferential calculus and optimality conditions in nondifferentiable programming.Optimization, 55(5-6):685–708, 2006

  28. [28]

    Qian and S

    Y. Qian and S. Pan. Convergence of a class of nonmonotone descent methods for Kurdyka– Lojasiewicz optimization problems.SIAM Journal on Optimization, 33(2):638–651, 2023

  29. [29]

    Y. Qian, T. Tao, S. Pan, and H. Qi. Convergence of ZH-type nonmonotone descent method for Kurdyka– Lojasiewicz optimization problems.SIAM Journal on Optimization, 35(2):1089–1109, 2025

  30. [30]

    Y. Qian, T. Tao, S. Pan, and H. Qi. GLL-type nonmonotone descent methods revisited under Kurdyka– Lojasiewicz property.arXiv preprint arXiv:2504.11385, 2025

  31. [31]

    Rockafellar.Convex Analysis

    R.T. Rockafellar.Convex Analysis. Princeton University Press, Princeton, 1970

  32. [32]

    Rockafellar and R.J.-B

    R.T. Rockafellar and R.J.-B. Wets.Variational Analysis. Springer, 1998

  33. [33]

    Takahashi, M

    S. Takahashi, M. Fukuda, and M. Tanaka. New Bregman proximal type algorithms for solving DC optimization problems.Computational Optimization and Applications, 83:893–931, 2022

  34. [34]

    Themelis, L

    A. Themelis, L. Stella, and P. Patrinos. Forward-backward envelope for the sum of two nonconvex functions: Further properties and nonmonotone linesearch algorithms.SIAM Journal on Optimization, 28(3):2274–2303, 2018

  35. [35]

    P. Wang, H. Liu, and A.M.-C. So. Linear convergence of a proximal alternating minimization method with extrapolation for ℓ1-norm principal component analysis.SIAM Journal on Optimization, 33(2):684–712, 2023

  36. [36]

    B. Wen, X. Chen, and T.K. Pong. Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems.SIAM Journal on Optimization, 27(1):124–145, 2017

  37. [37]

    B. Wen, X. Chen, and T.K. Pong. A proximal difference-of-convex algorithm with extrapo- lation.Computational Optimization and Applications, 69(2):297–324, 2018

  38. [38]

    Wright, R.D

    S.J. Wright, R.D. Nowak, and M.A.T. Figueiredo. Sparse reconstruction by separable approximation.IEEE Transactions on Signal Processing, 57(7):2479–2493, 2009

  39. [39]

    L. Yang. Proximal gradient method with extrapolation and line search for a class of non-convex and non-smooth problems.Journal of Optimization Theory and Applications, 200(1):68–103, 2024. 37

  40. [40]

    L. Yang, J. Hu, and K.-C. Toh. An inexact Bregman proximal difference-of-convex algorithm with two types of relative stopping criteria.Journal of Scientific Computing, 103:91, 2025

  41. [41]

    Yang, T.K

    L. Yang, T.K. Pong, and X. Chen. A nonmonotone alternating updating method for a class of matrix factorization problems.SIAM Journal on Optimization, 28(4):3402–3430, 2018

  42. [42]

    P. Yin, Y. Lou, Q. He, and J. Xin. Minimization of ℓ1−2 for compressed sensing.SIAM Journal on Scientific Computing, 37(1):A536–A563, 2015

  43. [43]

    C.-H. Zhang. Nearly unbiased variable selection under minimax concave penalty.The Annals of Statistics, 38(2):894–942, 2010

  44. [44]

    Zhang and W.W

    H. Zhang and W.W. Hager. A nonmonotone line search technique and its application to unconstrained optimization.SIAM Journal on Optimization, 14(4):1043–1056, 2004. 38