A nonmonotone extrapolated proximal gradient-subgradient algorithm beyond global Lipschitz gradient continuity
Pith reviewed 2026-05-17 05:34 UTC · model grok-4.3
The pith
The authors establish that their nonmonotone extrapolated proximal gradient-subgradient algorithm achieves global convergence under the Kurdyka-Łojasiewicz property for problems where the gradient is not globally Lipschitz continuous, even,
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a novel problem-parameter-free algorithm that incorporates a carefully designed nonmonotone line search to handle the non-global Lipschitz gradient continuity, together with an extrapolation step to achieve potential acceleration. We establish a refined convergence analysis for the proposed algorithm under the Kurdyka-Łojasiewicz property, without requiring any boundedness assumptions on the iterates.
What carries the argument
Nonmonotone line search paired with an extrapolation step inside a proximal gradient-subgradient iteration
If this is right
- The algorithm applies directly to composite problems without requiring knowledge or estimation of a global Lipschitz constant.
- Global convergence to critical points holds whenever the objective meets the KL property at the sequence's limit points.
- The extrapolation step can deliver faster practical progress while preserving the same convergence guarantees.
- Numerical results confirm that the combination of extrapolation and nonmonotone search outperforms the non-extrapolated version on the tested instances.
Where Pith is reading between the lines
- The same nonmonotone-plus-extrapolation design may transfer to other first-order splitting methods that currently rely on global Lipschitz assumptions.
- Removing the bounded-iterate hypothesis could simplify convergence arguments for related subgradient methods in nonsmooth optimization.
- Practical implementations might benefit from adaptive rules for the line-search parameters that still satisfy the descent conditions.
Load-bearing premise
The objective satisfies the Kurdyka-Łojasiewicz property at limit points and the nonmonotone line-search parameters meet the technical conditions needed for the descent inequality.
What would settle it
A composite optimization problem that satisfies the KL property at its limit points but produces a divergent sequence when the algorithm is run with line-search parameters that obey the stated conditions.
Figures
read the original abstract
With the advancement of modern applications, an increasing number of composite optimization problems arise whose smooth component does not possess a globally Lipschitz continuous gradient. This setting prevents the direct use of the proximal gradient (PG) method and its variants, and has motivated a growing body of research on new PG-type methods and their convergence theory, in particular, global convergence analysis without imposing any explicit or implicit boundedness assumptions on the iterates. Until recently, the first complete analysis of this kind has been established for the PG method and its specific nonmonotone variants, which has since stimulated further exploration along this research direction. In this paper, we consider a general composite optimization model beyond the global Lipschitz gradient continuity setting. We propose a novel problem-parameter-free algorithm that incorporates a carefully designed nonmonotone line search to handle the non-global Lipschitz gradient continuity, together with an extrapolation step to achieve potential acceleration. Despite the added technical challenges introduced by combining extrapolation with nonmonotone line search, we establish a refined convergence analysis for the proposed algorithm under the Kurdyka-{\L} ojasiewicz property, without requiring any boundedness assumptions on the iterates. This work thus further advances the theoretical understanding of PG-type methods in the non-global Lipschitz gradient continuity setting. Finally, we conduct numerical experiments to illustrate the effectiveness of our algorithm and highlight the advantages of integrating extrapolation with a nonmonotone line search.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a problem-parameter-free nonmonotone extrapolated proximal gradient-subgradient algorithm for composite nonsmooth optimization where the smooth term lacks global Lipschitz continuity. It combines a carefully designed nonmonotone line search with an extrapolation step and establishes a refined convergence analysis to critical points under the Kurdyka-Łojasiewicz property at limit points, without any boundedness assumptions on the iterates; numerical experiments illustrate practical performance.
Significance. If the central claims hold, the work meaningfully extends the recent line of parameter-free PG-type methods beyond global Lipschitz continuity by successfully integrating extrapolation with nonmonotone line search while preserving a clean KL-based convergence theory free of boundedness hypotheses. The explicit handling of the technical interaction between extrapolation and the nonmonotone acceptance condition, together with the absence of free parameters, constitutes a clear incremental advance.
minor comments (3)
- [§3.2] §3.2, line-search acceptance condition: the statement that the line search terminates in finitely many steps relies on local differentiability and the specific choice of the nonmonotone parameter sequence; a short remark clarifying why the extrapolation term does not interfere with this finite-termination argument would improve readability.
- [Numerical experiments] Table 1 and Figure 2: the reported iteration counts and function values would benefit from explicit mention of the line-search parameters (e.g., the values of η and the nonmonotone window size) used in each experiment to facilitate exact reproduction.
- [Abstract] The abstract contains a minor LaTeX rendering artifact in 'Kurdyka-Łojasiewicz'.
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of our manuscript, as well as the recommendation for minor revision. The referee correctly identifies the key contributions: a parameter-free algorithm combining nonmonotone line search and extrapolation for composite optimization beyond global Lipschitz continuity, with KL-based convergence to critical points without boundedness assumptions on the iterates. We appreciate the recognition of the technical handling of the interaction between extrapolation and the nonmonotone acceptance condition.
Circularity Check
No significant circularity detected
full rationale
The paper introduces an extrapolated proximal gradient-subgradient method with a nonmonotone line search designed to operate without global Lipschitz continuity of the gradient. Convergence to critical points is shown under the Kurdyka-Łojasiewicz property assumed only at the limit points of the generated sequence, together with standard technical conditions on the line-search parameters that produce a descent inequality. The proof chain proceeds from the line-search acceptance criterion (via differentiability) to a sufficient-descent relation and then applies the KL inequality to obtain convergence without any a-priori boundedness assumption on the iterates. None of these steps reduces by construction to a fitted parameter, a self-defined quantity, or a load-bearing self-citation; the KL property and line-search termination are external to the algorithm's internal tuning and are not renamed or smuggled in from prior author work. The derivation is therefore self-contained against standard external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The objective function satisfies the Kurdyka-Łojasiewicz property
Reference graph
Works this paper leans on
-
[1]
H. Attouch and J. Bolte. On the convergence of the proximal algorithm for nonsmooth functions involving analytic features.Mathematical Programming, 116(1):5–16, 2009
work page 2009
-
[2]
H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka- Lojasiewicz inequality.Mathematics of Operations Research, 35(2):438–457, 2010
work page 2010
-
[3]
H. Attouch, J. Bolte, and B.F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods.Mathematical Programming, 137(1):91–129, 2013
work page 2013
-
[4]
H.H. Bauschke, J. Bolte, and M. Teboulle. A descent lemma beyond Lipschitz gradient continuity: First-order methods revisited and applications.Mathematics of Operations Research, 42(2):330–348, 2017
work page 2017
-
[5]
A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems.SIAM Journal on Imaging Sciences, 2(1):183–202, 2009
work page 2009
-
[6]
E.G. Birgin and J.M. Mart´ ınez.Practical Augmented Lagrangian Methods for Constrained Optimization. SIAM, 2014
work page 2014
- [7]
- [8]
- [9]
-
[10]
X. Chen, Z. Lu, and T.K. Pong. Penalty methods for a class of non-Lipschitz optimization problems.SIAM Journal on Optimization, 26(3):1465–1492, 2016
work page 2016
-
[11]
P.L. Combettes and J.-C. Pesquet.Proximal Splitting Methods in Signal Processing, pages 185–212. Springer, New York, 2011
work page 2011
- [12]
-
[13]
R.-A. Dragomir, A. d’Aspremont, and J. Bolte. Quartic first-order methods for low-rank minimization.Journal of Optimization Theory and Applications, 189(2):341–363, 2021
work page 2021
- [14]
-
[15]
P. Gong, C. Zhang, Z. Lu, J.Z. Huang, and J. Ye. A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. InInternational Conference on Machine Learning, pages 37–45, 2013
work page 2013
- [16]
- [17]
-
[18]
X. Jia, C. Kanzow, and P. Mehlitz. Convergence analysis of the proximal gradient method in the presence of the Kurdyka– Lojasiewicz property without global Lipschitz assumptions. SIAM Journal on Optimization, 33(4):3038–3056, 2023
work page 2023
- [19]
-
[20]
C. Kanzow and L. Lehmann. Convergence of nonmonotone proximal gradient methods under the Kurdyka– Lojasiewicz property without a global Lipschitz assumption.Journal of Optimization Theory and Applications, 207(4), 2025
work page 2025
-
[21]
C. Kanzow and P. Mehlitz. Convergence properties of monotone and nonmonotone proximal gradient methods revisited.Journal of Optimization Theory and Applications, 195(2):624– 646, 2022
work page 2022
-
[22]
G. Li and T.K. Pong. Calculus of the exponent of Kurdyka– Lojasiewicz inequality and its applications to linear convergence of first–order methods.Foundations of Computational Mathematics, 18(5):1199–1232, 2018
work page 2018
- [23]
-
[24]
P.L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979
work page 1979
- [25]
-
[26]
Y. Lou, P. Yin, Q. He, and J. Xin. Computing sparse representation in a highly coherent dictionary based on difference of L1 and L2.Journal of Scientific Computing, 64(1):178–196, 2015
work page 2015
-
[27]
B.S. Mordukhovich, N.M. Nam, and N.D. Yen. Fr´ echet subdifferential calculus and optimality conditions in nondifferentiable programming.Optimization, 55(5-6):685–708, 2006
work page 2006
-
[28]
Y. Qian and S. Pan. Convergence of a class of nonmonotone descent methods for Kurdyka– Lojasiewicz optimization problems.SIAM Journal on Optimization, 33(2):638–651, 2023
work page 2023
-
[29]
Y. Qian, T. Tao, S. Pan, and H. Qi. Convergence of ZH-type nonmonotone descent method for Kurdyka– Lojasiewicz optimization problems.SIAM Journal on Optimization, 35(2):1089–1109, 2025
work page 2025
- [30]
-
[31]
R.T. Rockafellar.Convex Analysis. Princeton University Press, Princeton, 1970
work page 1970
-
[32]
R.T. Rockafellar and R.J.-B. Wets.Variational Analysis. Springer, 1998
work page 1998
-
[33]
S. Takahashi, M. Fukuda, and M. Tanaka. New Bregman proximal type algorithms for solving DC optimization problems.Computational Optimization and Applications, 83:893–931, 2022
work page 2022
-
[34]
A. Themelis, L. Stella, and P. Patrinos. Forward-backward envelope for the sum of two nonconvex functions: Further properties and nonmonotone linesearch algorithms.SIAM Journal on Optimization, 28(3):2274–2303, 2018
work page 2018
-
[35]
P. Wang, H. Liu, and A.M.-C. So. Linear convergence of a proximal alternating minimization method with extrapolation for ℓ1-norm principal component analysis.SIAM Journal on Optimization, 33(2):684–712, 2023
work page 2023
-
[36]
B. Wen, X. Chen, and T.K. Pong. Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems.SIAM Journal on Optimization, 27(1):124–145, 2017
work page 2017
-
[37]
B. Wen, X. Chen, and T.K. Pong. A proximal difference-of-convex algorithm with extrapo- lation.Computational Optimization and Applications, 69(2):297–324, 2018
work page 2018
-
[38]
S.J. Wright, R.D. Nowak, and M.A.T. Figueiredo. Sparse reconstruction by separable approximation.IEEE Transactions on Signal Processing, 57(7):2479–2493, 2009
work page 2009
-
[39]
L. Yang. Proximal gradient method with extrapolation and line search for a class of non-convex and non-smooth problems.Journal of Optimization Theory and Applications, 200(1):68–103, 2024. 37
work page 2024
-
[40]
L. Yang, J. Hu, and K.-C. Toh. An inexact Bregman proximal difference-of-convex algorithm with two types of relative stopping criteria.Journal of Scientific Computing, 103:91, 2025
work page 2025
- [41]
-
[42]
P. Yin, Y. Lou, Q. He, and J. Xin. Minimization of ℓ1−2 for compressed sensing.SIAM Journal on Scientific Computing, 37(1):A536–A563, 2015
work page 2015
-
[43]
C.-H. Zhang. Nearly unbiased variable selection under minimax concave penalty.The Annals of Statistics, 38(2):894–942, 2010
work page 2010
-
[44]
H. Zhang and W.W. Hager. A nonmonotone line search technique and its application to unconstrained optimization.SIAM Journal on Optimization, 14(4):1043–1056, 2004. 38
work page 2004
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.