Proximal gradient-type method with generalized distance and convergence analysis without global descent lemma

Masaru Ito; Shotaro Yagishita

arxiv: 2505.00381 · v2 · submitted 2025-05-01 · 🧮 math.OC

Proximal gradient-type method with generalized distance and convergence analysis without global descent lemma

Shotaro Yagishita , Masaru Ito This is my paper

Pith reviewed 2026-05-22 17:59 UTC · model grok-4.3

classification 🧮 math.OC

keywords proximal gradient methodsnonconvex composite optimizationconvergence analysisgeneralized distanceinterior gradient methodsconic optimizationproximal mapping

0 comments

The pith

Proximal gradient methods for nonconvex composite problems converge using general proximal terms without a global descent property.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows how to analyze convergence of proximal gradient-type methods for nonconvex problems that minimize a smooth function plus a nonsmooth function. The usual requirement is dropped that the smooth term must satisfy a global descent inequality with the proximal term. Instead, the proofs rely on local arguments once the subproblems can be solved accurately enough. This change matters because it lets the proximal term be chosen mainly to make the subproblem tractable given the nonsmooth function, rather than forcing a compromise between descent and solvability. The same framework supplies new convergence guarantees for interior gradient methods on conic optimization problems.

Core claim

The central claim is that proximal gradient-type methods with arbitrary generalized distances admit convergence guarantees for nonconvex composite optimization without assuming any global descent property between the smooth term and the proximal mapping. The analysis uses local descent relations together with standard properness and lower-semicontinuity conditions that ensure the iterates are well-defined. As a direct consequence, the same arguments establish convergence for the interior gradient method applied to conic programs.

What carries the argument

Generalized proximal term (or generalized distance) placed inside the subproblem, which replaces the usual quadratic proximal term and allows the subproblem to be solved efficiently while convergence is recovered through local rather than global arguments.

Load-bearing premise

The chosen generalized distance must make the subproblems solvable to sufficient accuracy, and the overall objective must be proper and lower semicontinuous so the sequence of iterates remains well-defined.

What would settle it

Find a concrete nonconvex composite problem in which the smooth term has no global descent inequality with the chosen proximal term, apply the proximal gradient iteration, and check whether the generated sequence still converges to a stationary point under the paper's local accuracy conditions.

read the original abstract

We consider solving nonconvex composite optimization problems in which the sum of a smooth function and a nonsmooth function is minimized. Many of convergence analyses of proximal gradient-type methods rely on global descent property between the smooth term and its proximal term. On the other hand, the ability to efficiently solve the subproblem depends on the compatibility between the nonsmooth term and the proximal term. Selecting an appropriate proximal term by considering both factors simultaneously is generally difficult. We overcome this issue by providing convergence analyses for proximal gradient-type methods with general proximal terms, without requiring global descent property of the smooth term. As a byproduct, new convergence results of the interior gradient methods for conic optimization are also provided.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper relaxes the global descent lemma for proximal gradient methods on nonconvex problems by switching to local arguments and a generalized distance.

read the letter

The key takeaway is that the authors prove convergence for proximal gradient-type methods on nonconvex composite problems without using the global descent lemma on the smooth function. This relaxes a standard assumption and should allow more flexible proximal terms. They work with a generalized distance and rely on local arguments to establish the results. The paper also supplies new convergence statements for interior gradient methods applied to conic optimization as a byproduct. This seems like a direct response to the practical difficulty of choosing a proximal term that is both easy to minimize and satisfies the usual descent condition. The work does a good job of keeping the analysis general while dropping the global requirement. The assumptions that remain—properness, lower semicontinuity, and accurate enough subproblem solves—are standard and reasonable. Nothing suggests the argument is circular or depends on fitting the data to the conclusion. A minor concern is whether the local conditions translate into step sizes that are simple to compute or check in applications. The abstract indicates the proofs use local properties, but the precise requirements on the generalized distance and the accuracy of subproblem solutions would need close inspection in the full text. If those turn out to be mild, the contribution strengthens. This paper is for researchers focused on first-order methods and convergence theory in nonconvex optimization. Readers who follow work on proximal algorithms or conic problems could pick up useful ideas here. It deserves a serious referee because the relaxation is technically interesting and the claims are stated clearly enough to be checked. I would recommend sending it out for peer review rather than desk rejecting it.

Referee Report

2 major / 2 minor

Summary. The manuscript develops proximal gradient-type methods for nonconvex composite optimization using a generalized distance in place of the standard Euclidean proximal term. It derives convergence guarantees that avoid any global descent lemma on the smooth component, relying instead on local arguments together with standard properness and lower-semicontinuity assumptions. As a byproduct, new convergence statements are obtained for interior gradient methods applied to conic optimization problems.

Significance. If the local-analysis arguments are fully rigorous, the work increases the design freedom for proximal terms that are computationally tractable for the nonsmooth summand, even when those terms fail to satisfy a global descent inequality with the smooth part. The byproduct results for interior-point-style methods on conic programs would constitute a modest but useful extension of existing theory.

major comments (2)

[§3.2] §3.2, Eq. (3.8): the local descent inequality invoked to control the smooth term is stated without an explicit uniformity condition on the step-size sequence; it is unclear whether the same step-size rule remains valid for all iterates when the smooth function is nonconvex and the generalized distance is only assumed to satisfy the listed local properties.
[Theorem 4.3] Theorem 4.3: the convergence rate for the interior gradient method on the conic problem is obtained by specializing the general result, but the specialization appears to re-introduce a global Lipschitz assumption on the gradient that was avoided in the main development; this needs explicit justification or removal.

minor comments (2)

[§2] The notation distinguishing the generalized distance D from the proximal mapping T is introduced late; an early table or remark in §2 would improve readability.
[Introduction] Several citations to standard references on proximal methods (e.g., the original Beck–Teboulle paper) are missing from the introduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below, providing clarifications and indicating where revisions will be made.

read point-by-point responses

Referee: [§3.2] §3.2, Eq. (3.8): the local descent inequality invoked to control the smooth term is stated without an explicit uniformity condition on the step-size sequence; it is unclear whether the same step-size rule remains valid for all iterates when the smooth function is nonconvex and the generalized distance is only assumed to satisfy the listed local properties.

Authors: We agree that an explicit uniformity statement would improve rigor. The local properties of the generalized distance, together with properness and lower-semicontinuity of the objective, ensure that all iterates remain in a compact neighborhood of any limit point where a uniform positive lower bound on admissible step sizes exists. We will revise §3.2 to state this uniformity condition explicitly and to note that the step-size rule can therefore be applied uniformly along the sequence. revision: yes
Referee: [Theorem 4.3] Theorem 4.3: the convergence rate for the interior gradient method on the conic problem is obtained by specializing the general result, but the specialization appears to re-introduce a global Lipschitz assumption on the gradient that was avoided in the main development; this needs explicit justification or removal.

Authors: The main development avoids a global descent lemma between the smooth function and the proximal mapping; it does not prohibit a global Lipschitz assumption on the gradient when the latter is natural for the problem class. For the interior gradient method on conic programs the smooth component is a self-concordant barrier or linear objective whose gradient is globally Lipschitz on the interior of the cone, a standard and well-justified hypothesis in that literature. The specialization therefore does not contradict the general result. We will add a short clarifying remark after Theorem 4.3 distinguishing the avoided descent lemma from the Lipschitz condition used only for the rate in this application. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents a direct convergence analysis for proximal gradient-type methods with generalized distances, replacing the global descent lemma with local arguments under standard lower-semicontinuity, properness, and subproblem accuracy conditions. No load-bearing step reduces the main result to a fitted quantity, self-referential definition, or self-citation chain; the claims are derived from the iteration structure and well-definedness assumptions without renaming known results or smuggling ansatzes. The byproduct interior gradient results for conic problems follow the same general framework. The derivation is self-contained against external mathematical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The analysis rests on standard assumptions from nonsmooth optimization (proper lower-semicontinuous functions, existence of minimizers of the proximal subproblems) together with the definition of a generalized distance that replaces the usual squared Euclidean term. No free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption The objective is a sum of a differentiable function and a proper lower-semicontinuous function whose proximal mapping with respect to the generalized distance is well-defined.
Invoked when the global descent lemma is removed and local descent or sufficient decrease arguments are substituted.

pith-pipeline@v0.9.0 · 5641 in / 1349 out tokens · 45319 ms · 2026-05-22T17:59:09.964814+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean Jcost definition and cosh-log equivalence matches

?

matches
MATCHES: this paper passage directly uses, restates, or depends on the cited Recognition theorem or module.

D_exp(z, w) := ∑ ψ(z_j − w_j) where ψ(ξ) := (e^ξ + e^{-ξ})/2 − 1 = cosh(ξ) − 1
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean higher-derivative calibration and local J-cost forcing echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

convergence analyses … without requiring global descent property of the smooth term … local arguments

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages

[1]

Difference-of-convex learning: directional stationarity, opti- mality, and sparsity.SIAM Journal on Optimization, 27(3):1637–1665, 2017

Miju Ahn, Jong-Shi Pang, and Jack Xin. Difference-of-convex learning: directional stationarity, opti- mality, and sparsity.SIAM Journal on Optimization, 27(3):1637–1665, 2017

work page 2017
[2]

The trimmed lasso: Sparse recovery guarantees and practical optimization by the generalized soft-min penalty.SIAM Journal on Mathematics of Data Science, 3(3): 900–929, 2021

Tal Amir, Ronen Basri, and Boaz Nadler. The trimmed lasso: Sparse recovery guarantees and practical optimization by the generalized soft-min penalty.SIAM Journal on Mathematics of Data Science, 3(3): 900–929, 2021

work page 2021
[3]

Regularized Lotka–Volterra dynamical system as continuous proximal- like method in optimization.Journal of optimization theory and applications, 121:541–570, 2004

Hedy Attouch and Marc Teboulle. Regularized Lotka–Volterra dynamical system as continuous proximal- like method in optimization.Journal of optimization theory and applications, 121:541–570, 2004

work page 2004
[4]

H´ edy Attouch, J´ erˆ ome Bolte, Patrick Redont, and Antoine Soubeyran. Proximal alternating minimiza- tion and projection methods for nonconvex problems: An approach based on the kurdyka- lojasiewicz inequality.Mathematics of operations research, 35(2):438–457, 2010

work page 2010
[5]

Hedy Attouch, J´ erˆ ome Bolte, and Benar Fux Svaiter. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized gauss–seidel methods.Mathematical programming, 137(1):91–129, 2013

work page 2013
[6]

Interior gradient and epsilon-subgradient descent methods for constrained convex minimization.Mathematics of Operations Research, 29(1):1–26, 2004

Alfred Auslender and Marc Teboulle. Interior gradient and epsilon-subgradient descent methods for constrained convex minimization.Mathematics of Operations Research, 29(1):1–26, 2004

work page 2004
[7]

Interior gradient and proximal methods for convex and conic optimization.SIAM Journal on Optimization, 16(3):697–725, 2006

Alfred Auslender and Marc Teboulle. Interior gradient and proximal methods for convex and conic optimization.SIAM Journal on Optimization, 16(3):697–725, 2006

work page 2006
[8]

Interior proximal and multiplier methods based on second order homogeneous kernels.Mathematics of Operations Research, 24(3):645–668, 1999

Alfred Auslender, Marc Teboulle, and Sami Ben-Tiba. Interior proximal and multiplier methods based on second order homogeneous kernels.Mathematics of Operations Research, 24(3):645–668, 1999

work page 1999
[9]

A logarithmic-quadratic proximal method for variational inequalities.Computational Optimization: A Tribute to Olvi Mangasarian Volume I, pages 31–40, 1999

Alfred Auslender, Marc Teboulle, and Sami Ben-Tiba. A logarithmic-quadratic proximal method for variational inequalities.Computational Optimization: A Tribute to Olvi Mangasarian Volume I, pages 31–40, 1999. 27

work page 1999
[10]

A descent lemma beyond lipschitz gradient continuity: first-order methods revisited and applications.Mathematics of Operations Research, 42(2): 330–348, 2017

Heinz H Bauschke, J´ erˆ ome Bolte, and Marc Teboulle. A descent lemma beyond lipschitz gradient continuity: first-order methods revisited and applications.Mathematics of Operations Research, 42(2): 330–348, 2017

work page 2017
[11]

SIAM, 2017

Amir Beck.First-order methods in optimization. SIAM, 2017

work page 2017
[12]

A quasi-newton proximal splitting method.Advances in neural infor- mation processing systems, 25, 2012

Stephen Becker and Jalal Fadili. A quasi-newton proximal splitting method.Advances in neural infor- mation processing systems, 25, 2012

work page 2012
[13]

Athena scientific, 2nd edition, 1999

Dimitri P Bertsekas.Nonlinear Programming. Athena scientific, 2nd edition, 1999

work page 1999
[14]

Barrier operators and associated gradient-like dynamical systems for constrained minimization problems.SIAM journal on control and optimization, 42(4):1266–1292, 2003

J´ erˆ ome Bolte and Marc Teboulle. Barrier operators and associated gradient-like dynamical systems for constrained minimization problems.SIAM journal on control and optimization, 42(4):1266–1292, 2003

work page 2003
[15]

The lojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems.SIAM Journal on Optimization, 17(4): 1205–1223, 2007

J´ erˆ ome Bolte, Aris Daniilidis, and Adrian Lewis. The lojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems.SIAM Journal on Optimization, 17(4): 1205–1223, 2007

work page 2007
[16]

Proximal alternating linearized minimization for nonconvex and nonsmooth problems.Mathematical Programming, 146(1):459–494, 2014

J´ erˆ ome Bolte, Shoham Sabach, and Marc Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems.Mathematical Programming, 146(1):459–494, 2014

work page 2014
[17]

First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems.SIAM Journal on Optimization, 28(3):2131–2151, 2018

J´ erˆ ome Bolte, Shoham Sabach, Marc Teboulle, and Yakov Vaisbourd. First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems.SIAM Journal on Optimization, 28(3):2131–2151, 2018

work page 2018
[18]

Variable metric inexact line-search- based methods for nonsmooth optimization.SIAM journal on optimization, 26(2):891–921, 2016

Silvia Bonettini, Ignace Loris, Federica Porta, and Marco Prato. Variable metric inexact line-search- based methods for nonsmooth optimization.SIAM journal on optimization, 26(2):891–921, 2016

work page 2016
[19]

Convergence rates in forward–backward splitting.SIAM Journal on Optimization, 7(2):421–444, 1997

George HG Chen and R Tyrrell Rockafellar. Convergence rates in forward–backward splitting.SIAM Journal on Optimization, 7(2):421–444, 1997

work page 1997
[20]

Proximal gradient methods beyond monotony.Journal of Nonsmooth Analysis and Optimization, 4(Original research articles), 2023

Alberto De Marchi. Proximal gradient methods beyond monotony.Journal of Nonsmooth Analysis and Optimization, 4(Original research articles), 2023

work page 2023
[21]

Alberto De Marchi and Andreas Themelis. Proximal gradient algorithms under local lipschitz gradi- ent continuity: A convergence and robustness analysis of panoc.Journal of Optimization Theory and Applications, 194(3):771–794, 2022

work page 2022
[22]

Multiplicative iterative algorithms for convex programming.Linear Algebra and its Applications, 130:25–42, 1990

PPB Eggermont. Multiplicative iterative algorithms for convex programming.Linear Algebra and its Applications, 130:25–42, 1990

work page 1990
[23]

A generalized proximal point algorithm for certain non-convex minimization problems.International Journal of Systems Science, 12(8):989–1000, 1981

Masao Fukushima and Hisashi Mine. A generalized proximal point algorithm for certain non-convex minimization problems.International Journal of Systems Science, 12(8):989–1000, 1981

work page 1981
[24]

Convergence of a gradient projection method.Technical report LIDS-P-1201, 1982

Eli M Gafni and Dimitri P Bertsekas. Convergence of a gradient projection method.Technical report LIDS-P-1201, 1982

work page 1982
[25]

Dc formulations and algorithms for sparse optimization problems.Mathematical Programming, 169(1):141–176, 2018

Jun-ya Gotoh, Akiko Takeda, and Katsuya Tono. Dc formulations and algorithms for sparse optimization problems.Mathematical Programming, 169(1):141–176, 2018

work page 2018
[26]

Block coordinate proximal gradient methods with variable bregman functions for nonsmooth separable optimization.Mathematical Programming, 160(1):1–32, 2016

Xiaoqin Hua and Nobuo Yamashita. Block coordinate proximal gradient methods with variable bregman functions for nonsmooth separable optimization.Mathematical Programming, 160(1):1–32, 2016. 28

work page 2016
[27]

Two-levelℓ 1 minimiza- tion for compressed sensing.Signal Processing, 108:459–475, 2015

Xiaolin Huang, Yipeng Liu, Lei Shi, Sabine Van Huffel, and Johan AK Suykens. Two-levelℓ 1 minimiza- tion for compressed sensing.Signal Processing, 108:459–475, 2015

work page 2015
[28]

An interior point multiplicative method for optimization under positivity constraints

Alfredo N Iusem. An interior point multiplicative method for optimization under positivity constraints. Acta Applicandae Mathematica, 38:163–184, 1995

work page 1995
[29]

Multiplicative interior gradient methods for mini- mization over the nonnegative orthant.SIAM journal on control and optimization, 34(1):389–406, 1996

Alfredo N Iusem, BF Svaiter, and Marc Teboulle. Multiplicative interior gradient methods for mini- mization over the nonnegative orthant.SIAM journal on control and optimization, 34(1):389–406, 1996

work page 1996
[30]

Xiaoxi Jia, Christian Kanzow, and Patrick Mehlitz. Convergence analysis of the proximal gradient method in the presence of the kurdyka– lojasiewicz property without global lipschitz assumptions.SIAM Journal on Optimization, 33(4):3038–3056, 2023

work page 2023
[31]

Christian Kanzow and Leo Lehmann. Convergence of nonmonotone proximal gradient methods under the Kurdyka– lojasiewicz property without a global Lipschitz assumption.Journal of Optimization Theory and Applications, 207(1):4, 2025

work page 2025
[32]

Convergence properties of monotone and nonmonotone proximal gradient methods revisited.Journal of Optimization Theory and Applications, 195(2):624–646, 2022

Christian Kanzow and Patrick Mehlitz. Convergence properties of monotone and nonmonotone proximal gradient methods revisited.Journal of Optimization Theory and Applications, 195(2):624–646, 2022

work page 2022
[33]

Algorithms for non-negative matrix factorization.Advances in neural information processing systems, 13, 2000

Daniel Lee and H Sebastian Seung. Algorithms for non-negative matrix factorization.Advances in neural information processing systems, 13, 2000

work page 2000
[34]

Proximal Newton-type methods for minimizing composite functions.SIAM Journal on Optimization, 24(3):1420–1443, 2014

Jason D Lee, Yuekai Sun, and Michael A Saunders. Proximal Newton-type methods for minimizing composite functions.SIAM Journal on Optimization, 24(3):1420–1443, 2014

work page 2014
[35]

Guoyin Li and Ting Kei Pong. Calculus of the exponent of kurdyka- lojasiewicz inequality and its applications to linear convergence of first-order methods.Foundations of Computational Mathematics, 18:1199–1232, 2017

work page 2017
[36]

Relatively smooth convex optimization by first-order methods, and applications.SIAM Journal on Optimization, 28(1):333–354, 2018

Haihao Lu, Robert M Freund, and Yurii Nesterov. Relatively smooth convex optimization by first-order methods, and applications.SIAM Journal on Optimization, 28(1):333–354, 2018

work page 2018
[37]

Exact penalization for cardinality and rank-constrained optimization problems via partial regularization.Optimization Methods and Software, 38(2):412–433, 2023

Zhaosong Lu, Xiaorui Li, and Shuhuang Xiang. Exact penalization for cardinality and rank-constrained optimization problems via partial regularization.Optimization Methods and Software, 38(2):412–433, 2023

work page 2023
[38]

New improved penalty methods for sparse recon- struction based on difference of two norms.Technical report, 2013

Ziyan Luo, Yingnan Wang, and Xianglilan Zhang. New improved penalty methods for sparse recon- struction based on difference of two norms.Technical report, 2013. doi: 10.13140/RG.2.1.3256.3369

work page doi:10.13140/rg.2.1.3256.3369 2013
[39]

Springer Science & Business Media, 2009

R Tyrrell Rockafellar and Roger J-B Wets.Variational analysis, volume 317. Springer Science & Business Media, 2009

work page 2009
[40]

Least median of squares regression.Journal of the American statistical association, 79(388):871–880, 1984

Peter J Rousseeuw. Least median of squares regression.Journal of the American statistical association, 79(388):871–880, 1984

work page 1984
[41]

Maximum likelihood reconstruction for emission tomography.IEEE transactions on medical imaging, 1(2):113–122, 1982

Larry A Shepp and Yehuda Vardi. Maximum likelihood reconstruction for emission tomography.IEEE transactions on medical imaging, 1(2):113–122, 1982

work page 1982
[42]

Springer Science & Business Media, 1997

Masahiro Shiota.Geometry of Subanalytic and Semialgebraic Sets. Springer Science & Business Media, 1997. 29

work page 1997
[43]

Trimmed lasso regression estimator for binary response data.Statistics & Probability Letters, 159:108679, 2020

Hongwei Sun, Yuehua Cui, Qian Gao, and Tong Wang. Trimmed lasso regression estimator for binary response data.Statistics & Probability Letters, 159:108679, 2020

work page 2020
[44]

Majorization-minimization bregman proximal gra- dient algorithms for nmf with the kullback–leibler divergence.Journal of Optimization Theory and Applications, 208(1):1–34, 2026

Shota Takahashi, Mirai Tanaka, and Shiro Ikeda. Majorization-minimization bregman proximal gra- dient algorithms for nmf with the kullback–leibler divergence.Journal of Optimization Theory and Applications, 208(1):1–34, 2026

work page 2026
[45]

A coordinate gradient descent method for nonsmooth separable mini- mization.Mathematical Programming, 117:387–423, 2009

Paul Tseng and Sangwoon Yun. A coordinate gradient descent method for nonsmooth separable mini- mization.Mathematical Programming, 117:387–423, 2009

work page 2009
[46]

Forward-backward splitting with bregman distances.Vietnam Journal of Mathe- matics, 45(3):519–539, 2017

Quang Van Nguyen. Forward-backward splitting with bregman distances.Vietnam Journal of Mathe- matics, 45(3):519–539, 2017

work page 2017
[47]

A statistical model for positron emission tomogra- phy.Journal of the American statistical Association, 80(389):8–20, 1985

Yehuda Vardi, Larry A Shepp, and Linda Kaufman. A statistical model for positron emission tomogra- phy.Journal of the American statistical Association, 80(389):8–20, 1985

work page 1985
[48]

A bayesian approach to real estate assessment.Studies in Bayesian Econometrics and Statistics in Honor of Leonard J/Savage, 1975

H Varian. A bayesian approach to real estate assessment.Studies in Bayesian Econometrics and Statistics in Honor of Leonard J/Savage, 1975

work page 1975
[49]

Fast algorithm for sparse least trimmed squares via trimmed-regularized reformula- tion.arXiv preprint arXiv:2410.04554, 2024

Shotaro Yagishita. Fast algorithm for sparse least trimmed squares via trimmed-regularized reformula- tion.arXiv preprint arXiv:2410.04554, 2024

work page arXiv 2024
[50]

Exact penalization at d-stationary points of cardinality-or rank- constrained problem.Optimization, pages 1–35, 2025

Shotaro Yagishita and Jun-ya Gotoh. Exact penalization at d-stationary points of cardinality-or rank- constrained problem.Optimization, pages 1–35, 2025

work page 2025
[51]

Bayesian estimation and prediction using asymmetric loss functions.Journal of the American Statistical Association, 81(394):446–451, 1986

Arnold Zellner. Bayesian estimation and prediction using asymmetric loss functions.Journal of the American Statistical Association, 81(394):446–451, 1986

work page 1986
[52]

A nonmonotone line search technique and its application to unconstrained optimization.SIAM journal on Optimization, 14(4):1043–1056, 2004

Hongchao Zhang and William W Hager. A nonmonotone line search technique and its application to unconstrained optimization.SIAM journal on Optimization, 14(4):1043–1056, 2004. 30

work page 2004

[1] [1]

Difference-of-convex learning: directional stationarity, opti- mality, and sparsity.SIAM Journal on Optimization, 27(3):1637–1665, 2017

Miju Ahn, Jong-Shi Pang, and Jack Xin. Difference-of-convex learning: directional stationarity, opti- mality, and sparsity.SIAM Journal on Optimization, 27(3):1637–1665, 2017

work page 2017

[2] [2]

The trimmed lasso: Sparse recovery guarantees and practical optimization by the generalized soft-min penalty.SIAM Journal on Mathematics of Data Science, 3(3): 900–929, 2021

Tal Amir, Ronen Basri, and Boaz Nadler. The trimmed lasso: Sparse recovery guarantees and practical optimization by the generalized soft-min penalty.SIAM Journal on Mathematics of Data Science, 3(3): 900–929, 2021

work page 2021

[3] [3]

Regularized Lotka–Volterra dynamical system as continuous proximal- like method in optimization.Journal of optimization theory and applications, 121:541–570, 2004

Hedy Attouch and Marc Teboulle. Regularized Lotka–Volterra dynamical system as continuous proximal- like method in optimization.Journal of optimization theory and applications, 121:541–570, 2004

work page 2004

[4] [4]

H´ edy Attouch, J´ erˆ ome Bolte, Patrick Redont, and Antoine Soubeyran. Proximal alternating minimiza- tion and projection methods for nonconvex problems: An approach based on the kurdyka- lojasiewicz inequality.Mathematics of operations research, 35(2):438–457, 2010

work page 2010

[5] [5]

Hedy Attouch, J´ erˆ ome Bolte, and Benar Fux Svaiter. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized gauss–seidel methods.Mathematical programming, 137(1):91–129, 2013

work page 2013

[6] [6]

Interior gradient and epsilon-subgradient descent methods for constrained convex minimization.Mathematics of Operations Research, 29(1):1–26, 2004

Alfred Auslender and Marc Teboulle. Interior gradient and epsilon-subgradient descent methods for constrained convex minimization.Mathematics of Operations Research, 29(1):1–26, 2004

work page 2004

[7] [7]

Interior gradient and proximal methods for convex and conic optimization.SIAM Journal on Optimization, 16(3):697–725, 2006

Alfred Auslender and Marc Teboulle. Interior gradient and proximal methods for convex and conic optimization.SIAM Journal on Optimization, 16(3):697–725, 2006

work page 2006

[8] [8]

Interior proximal and multiplier methods based on second order homogeneous kernels.Mathematics of Operations Research, 24(3):645–668, 1999

Alfred Auslender, Marc Teboulle, and Sami Ben-Tiba. Interior proximal and multiplier methods based on second order homogeneous kernels.Mathematics of Operations Research, 24(3):645–668, 1999

work page 1999

[9] [9]

A logarithmic-quadratic proximal method for variational inequalities.Computational Optimization: A Tribute to Olvi Mangasarian Volume I, pages 31–40, 1999

Alfred Auslender, Marc Teboulle, and Sami Ben-Tiba. A logarithmic-quadratic proximal method for variational inequalities.Computational Optimization: A Tribute to Olvi Mangasarian Volume I, pages 31–40, 1999. 27

work page 1999

[10] [10]

A descent lemma beyond lipschitz gradient continuity: first-order methods revisited and applications.Mathematics of Operations Research, 42(2): 330–348, 2017

Heinz H Bauschke, J´ erˆ ome Bolte, and Marc Teboulle. A descent lemma beyond lipschitz gradient continuity: first-order methods revisited and applications.Mathematics of Operations Research, 42(2): 330–348, 2017

work page 2017

[11] [11]

SIAM, 2017

Amir Beck.First-order methods in optimization. SIAM, 2017

work page 2017

[12] [12]

A quasi-newton proximal splitting method.Advances in neural infor- mation processing systems, 25, 2012

Stephen Becker and Jalal Fadili. A quasi-newton proximal splitting method.Advances in neural infor- mation processing systems, 25, 2012

work page 2012

[13] [13]

Athena scientific, 2nd edition, 1999

Dimitri P Bertsekas.Nonlinear Programming. Athena scientific, 2nd edition, 1999

work page 1999

[14] [14]

Barrier operators and associated gradient-like dynamical systems for constrained minimization problems.SIAM journal on control and optimization, 42(4):1266–1292, 2003

J´ erˆ ome Bolte and Marc Teboulle. Barrier operators and associated gradient-like dynamical systems for constrained minimization problems.SIAM journal on control and optimization, 42(4):1266–1292, 2003

work page 2003

[15] [15]

The lojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems.SIAM Journal on Optimization, 17(4): 1205–1223, 2007

J´ erˆ ome Bolte, Aris Daniilidis, and Adrian Lewis. The lojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems.SIAM Journal on Optimization, 17(4): 1205–1223, 2007

work page 2007

[16] [16]

Proximal alternating linearized minimization for nonconvex and nonsmooth problems.Mathematical Programming, 146(1):459–494, 2014

J´ erˆ ome Bolte, Shoham Sabach, and Marc Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems.Mathematical Programming, 146(1):459–494, 2014

work page 2014

[17] [17]

First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems.SIAM Journal on Optimization, 28(3):2131–2151, 2018

J´ erˆ ome Bolte, Shoham Sabach, Marc Teboulle, and Yakov Vaisbourd. First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems.SIAM Journal on Optimization, 28(3):2131–2151, 2018

work page 2018

[18] [18]

Variable metric inexact line-search- based methods for nonsmooth optimization.SIAM journal on optimization, 26(2):891–921, 2016

Silvia Bonettini, Ignace Loris, Federica Porta, and Marco Prato. Variable metric inexact line-search- based methods for nonsmooth optimization.SIAM journal on optimization, 26(2):891–921, 2016

work page 2016

[19] [19]

Convergence rates in forward–backward splitting.SIAM Journal on Optimization, 7(2):421–444, 1997

George HG Chen and R Tyrrell Rockafellar. Convergence rates in forward–backward splitting.SIAM Journal on Optimization, 7(2):421–444, 1997

work page 1997

[20] [20]

Proximal gradient methods beyond monotony.Journal of Nonsmooth Analysis and Optimization, 4(Original research articles), 2023

Alberto De Marchi. Proximal gradient methods beyond monotony.Journal of Nonsmooth Analysis and Optimization, 4(Original research articles), 2023

work page 2023

[21] [21]

Alberto De Marchi and Andreas Themelis. Proximal gradient algorithms under local lipschitz gradi- ent continuity: A convergence and robustness analysis of panoc.Journal of Optimization Theory and Applications, 194(3):771–794, 2022

work page 2022

[22] [22]

Multiplicative iterative algorithms for convex programming.Linear Algebra and its Applications, 130:25–42, 1990

PPB Eggermont. Multiplicative iterative algorithms for convex programming.Linear Algebra and its Applications, 130:25–42, 1990

work page 1990

[23] [23]

A generalized proximal point algorithm for certain non-convex minimization problems.International Journal of Systems Science, 12(8):989–1000, 1981

Masao Fukushima and Hisashi Mine. A generalized proximal point algorithm for certain non-convex minimization problems.International Journal of Systems Science, 12(8):989–1000, 1981

work page 1981

[24] [24]

Convergence of a gradient projection method.Technical report LIDS-P-1201, 1982

Eli M Gafni and Dimitri P Bertsekas. Convergence of a gradient projection method.Technical report LIDS-P-1201, 1982

work page 1982

[25] [25]

Dc formulations and algorithms for sparse optimization problems.Mathematical Programming, 169(1):141–176, 2018

Jun-ya Gotoh, Akiko Takeda, and Katsuya Tono. Dc formulations and algorithms for sparse optimization problems.Mathematical Programming, 169(1):141–176, 2018

work page 2018

[26] [26]

Block coordinate proximal gradient methods with variable bregman functions for nonsmooth separable optimization.Mathematical Programming, 160(1):1–32, 2016

Xiaoqin Hua and Nobuo Yamashita. Block coordinate proximal gradient methods with variable bregman functions for nonsmooth separable optimization.Mathematical Programming, 160(1):1–32, 2016. 28

work page 2016

[27] [27]

Two-levelℓ 1 minimiza- tion for compressed sensing.Signal Processing, 108:459–475, 2015

Xiaolin Huang, Yipeng Liu, Lei Shi, Sabine Van Huffel, and Johan AK Suykens. Two-levelℓ 1 minimiza- tion for compressed sensing.Signal Processing, 108:459–475, 2015

work page 2015

[28] [28]

An interior point multiplicative method for optimization under positivity constraints

Alfredo N Iusem. An interior point multiplicative method for optimization under positivity constraints. Acta Applicandae Mathematica, 38:163–184, 1995

work page 1995

[29] [29]

Multiplicative interior gradient methods for mini- mization over the nonnegative orthant.SIAM journal on control and optimization, 34(1):389–406, 1996

Alfredo N Iusem, BF Svaiter, and Marc Teboulle. Multiplicative interior gradient methods for mini- mization over the nonnegative orthant.SIAM journal on control and optimization, 34(1):389–406, 1996

work page 1996

[30] [30]

Xiaoxi Jia, Christian Kanzow, and Patrick Mehlitz. Convergence analysis of the proximal gradient method in the presence of the kurdyka– lojasiewicz property without global lipschitz assumptions.SIAM Journal on Optimization, 33(4):3038–3056, 2023

work page 2023

[31] [31]

Christian Kanzow and Leo Lehmann. Convergence of nonmonotone proximal gradient methods under the Kurdyka– lojasiewicz property without a global Lipschitz assumption.Journal of Optimization Theory and Applications, 207(1):4, 2025

work page 2025

[32] [32]

Convergence properties of monotone and nonmonotone proximal gradient methods revisited.Journal of Optimization Theory and Applications, 195(2):624–646, 2022

Christian Kanzow and Patrick Mehlitz. Convergence properties of monotone and nonmonotone proximal gradient methods revisited.Journal of Optimization Theory and Applications, 195(2):624–646, 2022

work page 2022

[33] [33]

Algorithms for non-negative matrix factorization.Advances in neural information processing systems, 13, 2000

Daniel Lee and H Sebastian Seung. Algorithms for non-negative matrix factorization.Advances in neural information processing systems, 13, 2000

work page 2000

[34] [34]

Proximal Newton-type methods for minimizing composite functions.SIAM Journal on Optimization, 24(3):1420–1443, 2014

Jason D Lee, Yuekai Sun, and Michael A Saunders. Proximal Newton-type methods for minimizing composite functions.SIAM Journal on Optimization, 24(3):1420–1443, 2014

work page 2014

[35] [35]

Guoyin Li and Ting Kei Pong. Calculus of the exponent of kurdyka- lojasiewicz inequality and its applications to linear convergence of first-order methods.Foundations of Computational Mathematics, 18:1199–1232, 2017

work page 2017

[36] [36]

Relatively smooth convex optimization by first-order methods, and applications.SIAM Journal on Optimization, 28(1):333–354, 2018

Haihao Lu, Robert M Freund, and Yurii Nesterov. Relatively smooth convex optimization by first-order methods, and applications.SIAM Journal on Optimization, 28(1):333–354, 2018

work page 2018

[37] [37]

Exact penalization for cardinality and rank-constrained optimization problems via partial regularization.Optimization Methods and Software, 38(2):412–433, 2023

Zhaosong Lu, Xiaorui Li, and Shuhuang Xiang. Exact penalization for cardinality and rank-constrained optimization problems via partial regularization.Optimization Methods and Software, 38(2):412–433, 2023

work page 2023

[38] [38]

New improved penalty methods for sparse recon- struction based on difference of two norms.Technical report, 2013

Ziyan Luo, Yingnan Wang, and Xianglilan Zhang. New improved penalty methods for sparse recon- struction based on difference of two norms.Technical report, 2013. doi: 10.13140/RG.2.1.3256.3369

work page doi:10.13140/rg.2.1.3256.3369 2013

[39] [39]

Springer Science & Business Media, 2009

R Tyrrell Rockafellar and Roger J-B Wets.Variational analysis, volume 317. Springer Science & Business Media, 2009

work page 2009

[40] [40]

Least median of squares regression.Journal of the American statistical association, 79(388):871–880, 1984

Peter J Rousseeuw. Least median of squares regression.Journal of the American statistical association, 79(388):871–880, 1984

work page 1984

[41] [41]

Maximum likelihood reconstruction for emission tomography.IEEE transactions on medical imaging, 1(2):113–122, 1982

Larry A Shepp and Yehuda Vardi. Maximum likelihood reconstruction for emission tomography.IEEE transactions on medical imaging, 1(2):113–122, 1982

work page 1982

[42] [42]

Springer Science & Business Media, 1997

Masahiro Shiota.Geometry of Subanalytic and Semialgebraic Sets. Springer Science & Business Media, 1997. 29

work page 1997

[43] [43]

Trimmed lasso regression estimator for binary response data.Statistics & Probability Letters, 159:108679, 2020

Hongwei Sun, Yuehua Cui, Qian Gao, and Tong Wang. Trimmed lasso regression estimator for binary response data.Statistics & Probability Letters, 159:108679, 2020

work page 2020

[44] [44]

Majorization-minimization bregman proximal gra- dient algorithms for nmf with the kullback–leibler divergence.Journal of Optimization Theory and Applications, 208(1):1–34, 2026

Shota Takahashi, Mirai Tanaka, and Shiro Ikeda. Majorization-minimization bregman proximal gra- dient algorithms for nmf with the kullback–leibler divergence.Journal of Optimization Theory and Applications, 208(1):1–34, 2026

work page 2026

[45] [45]

A coordinate gradient descent method for nonsmooth separable mini- mization.Mathematical Programming, 117:387–423, 2009

Paul Tseng and Sangwoon Yun. A coordinate gradient descent method for nonsmooth separable mini- mization.Mathematical Programming, 117:387–423, 2009

work page 2009

[46] [46]

Forward-backward splitting with bregman distances.Vietnam Journal of Mathe- matics, 45(3):519–539, 2017

Quang Van Nguyen. Forward-backward splitting with bregman distances.Vietnam Journal of Mathe- matics, 45(3):519–539, 2017

work page 2017

[47] [47]

A statistical model for positron emission tomogra- phy.Journal of the American statistical Association, 80(389):8–20, 1985

Yehuda Vardi, Larry A Shepp, and Linda Kaufman. A statistical model for positron emission tomogra- phy.Journal of the American statistical Association, 80(389):8–20, 1985

work page 1985

[48] [48]

A bayesian approach to real estate assessment.Studies in Bayesian Econometrics and Statistics in Honor of Leonard J/Savage, 1975

H Varian. A bayesian approach to real estate assessment.Studies in Bayesian Econometrics and Statistics in Honor of Leonard J/Savage, 1975

work page 1975

[49] [49]

Fast algorithm for sparse least trimmed squares via trimmed-regularized reformula- tion.arXiv preprint arXiv:2410.04554, 2024

Shotaro Yagishita. Fast algorithm for sparse least trimmed squares via trimmed-regularized reformula- tion.arXiv preprint arXiv:2410.04554, 2024

work page arXiv 2024

[50] [50]

Exact penalization at d-stationary points of cardinality-or rank- constrained problem.Optimization, pages 1–35, 2025

Shotaro Yagishita and Jun-ya Gotoh. Exact penalization at d-stationary points of cardinality-or rank- constrained problem.Optimization, pages 1–35, 2025

work page 2025

[51] [51]

Bayesian estimation and prediction using asymmetric loss functions.Journal of the American Statistical Association, 81(394):446–451, 1986

Arnold Zellner. Bayesian estimation and prediction using asymmetric loss functions.Journal of the American Statistical Association, 81(394):446–451, 1986

work page 1986

[52] [52]

A nonmonotone line search technique and its application to unconstrained optimization.SIAM journal on Optimization, 14(4):1043–1056, 2004

Hongchao Zhang and William W Hager. A nonmonotone line search technique and its application to unconstrained optimization.SIAM journal on Optimization, 14(4):1043–1056, 2004. 30

work page 2004