Locally Linear Convergence for Nonsmooth Convex Optimization via Coupled Smoothing and Momentum

Peyman Mohajerin Esfahani; Reza Rahimi Baghbadorani; Sergio Grammatico

arxiv: 2511.10239 · v3 · submitted 2025-11-13 · 🧮 math.OC

Locally Linear Convergence for Nonsmooth Convex Optimization via Coupled Smoothing and Momentum

Reza Rahimi Baghbadorani , Sergio Grammatico , Peyman Mohajerin Esfahani This is my paper

Pith reviewed 2026-05-17 22:40 UTC · model grok-4.3

classification 🧮 math.OC

keywords nonsmooth convex optimizationsmoothing methodsmomentum accelerationlocal linear convergencesublinear convergenceLassoMaxCut

0 comments

The pith

Coupling the smoothing parameter with the momentum parameter achieves optimal global sublinear convergence and local linear convergence for nonsmooth convex optimization under a local strong convexity condition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes an adaptive accelerated smoothing technique for nonsmooth convex optimization problems in which the smoothing update rule is directly coupled to the momentum parameter. The resulting algorithm attains the known optimal global convergence rate of O(1/k) for the class of problems studied. When the nonsmooth term satisfies a locally strong convexity condition near the solution, the same coupling additionally produces a local linear convergence rate. The approach is extended to objectives consisting of the sum of two nonsmooth convex functions and is illustrated on Lasso regression, MaxCut, nuclear-norm minimization, and l1-regularized model predictive control, where practical runs display an initial faster transient followed by the predicted linear regime.

Core claim

By coupling the smoothing parameter decrease with the momentum parameter through a specific adaptive rule, the algorithm attains the provably optimal global rate of O(1/k) for minimizing a nonsmooth convex objective. Under the additional assumption that the nonsmooth term fulfills a locally strong convexity condition near the optimum, the method further guarantees local linear convergence. The same coupling extends to the sum of two nonsmooth convex functions and explains the observed practical two-phase behavior of faster initial convergence followed by asymptotic linear convergence.

What carries the argument

The adaptive accelerated smoothing technique whose smoothing update rule is coupled to the momentum parameter.

Load-bearing premise

The nonsmooth term must satisfy a locally strong convexity condition near the optimum and the smoothing and momentum parameters must obey a precise coupling rule.

What would settle it

A numerical example in which a nonsmooth convex function with local strong convexity near the optimum produces only sublinear convergence under the proposed coupled scheme would disprove the local linear rate claim.

Figures

Figures reproduced from arXiv: 2511.10239 by Peyman Mohajerin Esfahani, Reza Rahimi Baghbadorani, Sergio Grammatico.

**Figure 1.** Figure 1: Convergence rate of µk. 5 Numerical experiments We demonstrate the performance of Algorithm 1 (with c = 0) on four popular classes of problems studied in the 9 [PITH_FULL_IMAGE:figures/full_fig_p009_1.png] view at source ↗

**Figure 2.** Figure 2: Numerical results for four problem classes. The first [PITH_FULL_IMAGE:figures/full_fig_p012_2.png] view at source ↗

read the original abstract

We propose an adaptive accelerated smoothing technique for a nonsmooth convex optimization problem where the smoothing update rule is coupled with the momentum parameter. We also extend the setting to the case where the objective function is the sum of two nonsmooth functions. With regard to convergence rate, we provide the global (optimal) sublinear convergence guarantees of O(1/k), which is known to be provably optimal for the studied class of functions, along with a local linear rate if the nonsmooth term fulfills a so-call locally strong convexity condition. We validate the performance of our algorithm on several problem classes, including regression with the l1-norm (the Lasso problem), sparse semidefinite programming (the MaxCut problem), Nuclear norm minimization with application in model free fault diagnosis, and l_1-regularized model predictive control to showcase the benefits of the coupling. An interesting observation is that although our global convergence result guarantees O(1/k) convergence, we consistently observe a practical transient convergence rate of O(1/k^2), followed by asymptotic linear convergence as anticipated by the theoretical result. This two-phase behavior can also be explained in view of the proposed smoothing rule.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper couples smoothing directly to the momentum sequence to get the standard global O(1/k) rate plus a local linear rate under local strong convexity, with experiments showing a practical faster transient.

read the letter

The main thing to know is that this paper ties the smoothing parameter update straight to the momentum term in an accelerated first-order method for nonsmooth convex problems. That specific coupling is what supports both the optimal global O(1/k) rate and a local linear rate when the nonsmooth term satisfies local strong convexity near the optimum. They also extend the setup to the sum of two nonsmooth functions. The global rate itself is the known optimal one for this problem class, so the lift comes from making the local claim work with the same schedule. What stands out in practice is the observed transient O(1/k^2) phase before linear convergence sets in, which they connect back to how the smoothing is driven to zero. The experiments on Lasso, MaxCut, nuclear norm minimization for fault diagnosis, and l1-regularized MPC are relevant and show the method on concrete application problems. The soft spot is the local linear rate. It depends on the coupled rule keeping the smoothed objective close enough to the original while preserving a positive strong-convexity modulus in a neighborhood of the solution. Any mismatch in how fast the smoothing bias vanishes versus how the momentum approaches 1 could break the contraction, and the abstract does not supply an explicit lower bound on the surviving modulus or a sensitivity check on the coupling constants. The global analysis follows standard smoothing-plus-acceleration arguments once the parameter goes to zero at the right rate. This paper is for people working on accelerated methods for nonsmooth convex optimization, particularly those interested in sparse regression, SDP relaxations, or model-predictive control. A reader who follows similar smoothing or momentum techniques would find the coupling idea and the reported two-phase behavior useful. The algorithmic choice plus the application results give it enough substance to deserve a serious referee rather than a desk reject. I would recommend sending it out for peer review.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes an adaptive accelerated smoothing method for nonsmooth convex optimization in which the smoothing parameter is explicitly coupled to the momentum sequence. It establishes global O(1/k) sublinear convergence (optimal for the problem class) and, under an additional locally strong convexity assumption on the nonsmooth term, a local linear rate. The framework is extended to the sum of two nonsmooth convex functions. Numerical results on Lasso, MaxCut, nuclear-norm minimization, and l1-regularized MPC illustrate practical performance, including an observed transient O(1/k^2) phase followed by linear convergence.

Significance. If the local linear-rate analysis is complete and the coupling schedule is shown to be robust, the work would provide a meaningful advance by delivering both the optimal global rate and a faster local rate for nonsmooth convex problems, together with a plausible explanation for the commonly observed two-phase practical behavior. The extension to two nonsmooth terms and the concrete algorithmic coupling are additional strengths.

major comments (3)

[§4, Theorem 4.3] §4, Theorem 4.3 (local linear rate): the proof relies on the smoothing bias vanishing at a rate compatible with the momentum parameter approaching 1, yet no explicit lower bound is derived for the effective strong-convexity modulus that survives the smoothing perturbation; without this, it is unclear for which range of the local strong-convexity constant the linear contraction remains valid.
[§5] §5 (two-nonsmooth case): the error bound between the doubly smoothed objective and the original function is stated as O(μ_k), but the constants are not expressed in terms of the individual Lipschitz moduli of the two nonsmooth terms; this omission makes it impossible to verify that the same coupling schedule preserves a positive effective strong-convexity modulus near the optimum.
[§3.2, coupling rule (3.4)] §3.2, coupling rule (3.4): the specific choice μ_k = Θ(1/k) and β_k = 1 − Θ(1/k) is shown to yield the global O(1/k) rate, but the manuscript does not quantify the perturbation size in the momentum sequence that would destroy the local linear regime, leaving the robustness of the schedule uncharacterized.

minor comments (2)

[Figure 2] The caption of Figure 2 should explicitly state the vertical axis (function-value gap or distance to optimum) and the horizontal axis scaling (iteration count or CPU time).
[Introduction] A short paragraph comparing the proposed coupling to existing adaptive-smoothing schedules (e.g., Nesterov 2005, Beck & Teboulle) would help readers situate the novelty.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough review and valuable suggestions. We address each of the major comments in detail below and will revise the manuscript accordingly to incorporate the clarifications.

read point-by-point responses

Referee: [§4, Theorem 4.3] §4, Theorem 4.3 (local linear rate): the proof relies on the smoothing bias vanishing at a rate compatible with the momentum parameter approaching 1, yet no explicit lower bound is derived for the effective strong-convexity modulus that survives the smoothing perturbation; without this, it is unclear for which range of the local strong-convexity constant the linear contraction remains valid.

Authors: We appreciate this observation. Upon closer inspection of the proof in Theorem 4.3, the smoothing bias term is bounded by a multiple of μ_k, and since μ_k = Θ(1/k) and the momentum parameter β_k approaches 1 at rate 1/k, the effective strong-convexity modulus can be bounded below by a positive constant depending on the local strong-convexity parameter μ and the Lipschitz constant. Specifically, for k sufficiently large, the effective modulus is at least μ/2. We will add this explicit lower bound and specify the range of μ for which the linear rate holds in the revised manuscript. revision: yes
Referee: [§5] §5 (two-nonsmooth case): the error bound between the doubly smoothed objective and the original function is stated as O(μ_k), but the constants are not expressed in terms of the individual Lipschitz moduli of the two nonsmooth terms; this omission makes it impossible to verify that the same coupling schedule preserves a positive effective strong-convexity modulus near the optimum.

Authors: The referee correctly identifies that the error bound in Section 5 is given as O(μ_k) without explicit constants. In fact, if the two nonsmooth functions have Lipschitz moduli L1 and L2, the smoothing error is at most (L1 + L2) μ_k. This allows us to show that the effective strong-convexity modulus remains positive near the optimum as long as μ_k is small enough relative to the local strong-convexity constant. We will update the manuscript to express the constants in terms of L1 and L2 and include the verification for the coupling schedule. revision: yes
Referee: [§3.2, coupling rule (3.4)] §3.2, coupling rule (3.4): the specific choice μ_k = Θ(1/k) and β_k = 1 − Θ(1/k) is shown to yield the global O(1/k) rate, but the manuscript does not quantify the perturbation size in the momentum sequence that would destroy the local linear regime, leaving the robustness of the schedule uncharacterized.

Authors: We agree that a characterization of robustness would enhance the result. The local linear convergence in Theorem 4.3 holds provided that the momentum parameter satisfies β_k ≥ 1 - c/k for some c, and μ_k ≤ C/k, with appropriate constants c and C depending on the problem parameters. Perturbations that violate this rate would indeed affect the linear regime. We will add a remark or corollary quantifying the allowable perturbation size in the momentum sequence to ensure the local linear rate is preserved. revision: yes

Circularity Check

0 steps flagged

No significant circularity; rates derived from standard analysis plus explicit assumption

full rationale

The global O(1/k) sublinear rate is presented as the known optimal rate for nonsmooth convex problems and follows from standard smoothing-plus-momentum analysis once the smoothing parameter is driven to zero at a suitable rate. The local linear rate is explicitly conditioned on the additional assumption that the nonsmooth term satisfies a locally strong convexity condition near the optimum; the coupling schedule between smoothing and momentum is a design choice introduced to make the analysis close, not a fitted quantity renamed as a prediction. No self-definitional steps, fitted-input predictions, load-bearing self-citations, or ansatz smuggling appear in the derivation chain. The manuscript remains self-contained against external benchmarks for convex optimization convergence rates.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Full manuscript text not provided; audit is therefore limited to claims appearing in the abstract. The central results rest on standard convex-analysis background plus one domain-specific assumption.

axioms (1)

domain assumption The nonsmooth term satisfies a locally strong convexity condition near the solution
Explicitly required for the local linear convergence guarantee stated in the abstract.

pith-pipeline@v0.9.0 · 5514 in / 1311 out tokens · 38180 ms · 2026-05-17T22:40:47.632645+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose an adaptive accelerated smoothing technique ... smoothing update rule is coupled with the momentum parameter ... global (optimal) sublinear convergence guarantees of O(1/k) ... local linear rate if the nonsmooth term fulfills a so-called locally strong convexity condition.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · 2 internal anchors

[1]

Dual block-coordinate forward-backward algorithm with application to deconvolution and deinterlacing of video sequences

Feriel Abboud, Emilie Chouzenoux, Jean-Christophe Pesquet, Jean-Hugues Chenot, and Louis Laborelli. Dual block-coordinate forward-backward algorithm with application to deconvolution and deinterlacing of video sequences. Journal of Mathematical Imaging and Vision , 59:415–431, 2017

work page 2017
[2]

An admm algorithm for solving ℓ1 regularized mpc

Mariette Annergren, Anders Hansson, and Bo W ahlberg. An admm algorithm for solving ℓ1 regularized mpc. In IEEE Conference on Decision and Control , 2012

work page 2012
[3]

Efficient and Practical Stochastic Subgradient Descent for Nuclear Norm Regularization

Haim A vron, Satyen Kale, Shiva Kasiviswanathan, and Vik as Sindhwani. Eﬃcient and practical stochastic subgradient descent for nuclear norm regularization. preprint available at arXiv:1206.6384, 2012

work page internal anchor Pith review Pith/arXiv arXiv 2012
[4]

Adaptive accelerated composit e minimization

Reza Rahimi Baghbadorani, Sergio Grammatico, and Peyman Mohajerin Esfahani. Adaptive accelerated composit e minimization. arXiv preprint arXiv:2405.03414 , 2024

work page arXiv 2024
[5]

First-order methods in optimization

Amir Beck. First-order methods in optimization . SIAM, 2017

work page 2017
[6]

Mirror descent and nonlinea r projected subgradient methods for convex optimization

Amir Beck and Marc Teboulle. Mirror descent and nonlinea r projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003

work page 2003
[7]

A fast iterative shrinkage- thresholding algorithm for linear inverse problems

Amir Beck and Marc Teboulle. A fast iterative shrinkage- thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences , 2(1):183–202, 2009

work page 2009
[8]

Smoothing and ﬁrst order methods: A uniﬁed framework

Amir Beck and Marc Teboulle. Smoothing and ﬁrst order methods: A uniﬁed framework. SIAM Journal on Optimization, 22(2):557–580, 2012

work page 2012
[9]

Quadratic growth conditions and uniqueness of optimal solution to lasso

Yunier Bello-Cruz, Guoyin Li, and Tran Thai An Nghia. Quadratic growth conditions and uniqueness of optimal solution to lasso. Journal of Optimization Theory and Applications, 194(1):167–190, 2022

work page 2022
[10]

Simultaneous analysis of lasso and dantzig selector

Peter J Bickel, Ya’acov Ritov, and Alexandre B Tsybakov . Simultaneous analysis of lasso and dantzig selector. The Annals of Statistic , 37(4):1705–1732, 2009

work page 2009
[11]

A variable smoothing algorithm for solving convex optimization problems

Radu Ioan Boţ and Christopher Hendrich. A variable smoothing algorithm for solving convex optimization problems. An Oﬃcial Journal of the Spanish Society of Statistics and Operations Research , 23:124–150, 2015

work page 2015
[12]

A ﬁrst-order primal - dual algorithm for convex problems with applications to imaging

Antonin Chambolle and Thomas Pock. A ﬁrst-order primal - dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision , 40:120–145, 2011. 12

work page 2011
[13]

Libsvm: a library fo r support vector machines

Chih-Chung Chang and Chih-Jen Lin. Libsvm: a library fo r support vector machines. ACM transactions on intelligent systems and technology (TIST) , 2(3):1–27, 2011

work page 2011
[14]

Nested iterative algorithms for convex constra ined image recovery problems

Caroline Chaux, Jean-Christophe Pesquet, and Nelly Pustelnik. Nested iterative algorithms for convex constra ined image recovery problems. SIAM Journal on Imaging Sciences, 2(2):730–762, 2009

work page 2009
[15]

The direct extension of admm for multi-block convex minimization problems is not necessarily convergen t

Caihua Chen, Bingsheng He, Yinyu Ye, and Xiaoming Yuan. The direct extension of admm for multi-block convex minimization problems is not necessarily convergen t. Mathematical Programming, 155(1):57–79, 2016

work page 2016
[16]

A direct formulation for sparse pca usin g semideﬁnite programming

Alexandre dAspremont, Laurent Ghaoui, Michael Jordan , and Gert Lanckriet. A direct formulation for sparse pca usin g semideﬁnite programming. Advances in Neural Information Processing Systems, 2004

work page 2004
[17]

A stochas tic smoothing algorithm for semideﬁnite programming

Alexandre dAspremont and Noureddine Karoui. A stochas tic smoothing algorithm for semideﬁnite programming. SIAM Journal on Optimization , 24(3):1138–1177, 2014

work page 2014
[18]

Eﬃcient minimization methods of mixed l2-l1 and l1-l1 norms for image restoration

Haoying Fu, Michael K Ng, Mila Nikolova, and Jesse L Barlow. Eﬃcient minimization methods of mixed l2-l1 and l1-l1 norms for image restoration. SIAM Journal on Scientiﬁc computing, 27(6):1881–1902, 2006

work page 1902
[19]

The landscape of the proximal point method for nonconvex–nonconcave minimax optimization

Benjamin Grimmer, Haihao Lu, Pratik W orah, and Vahab Mirrokni. The landscape of the proximal point method for nonconvex–nonconcave minimax optimization. Mathematical Programming, 201(1):373–407, 2023

work page 2023
[20]

Robust transfer principal component analy sis with rank constraints

Yuhong Guo. Robust transfer principal component analy sis with rank constraints. Advances in Neural Information Processing Systems, 2013

work page 2013
[21]

A spectral bundle method for semideﬁnite programming

Christoph Helmberg and Franz Rendl. A spectral bundle method for semideﬁnite programming. SIAM Journal on Optimization, 10(3):673–696, 2000

work page 2000
[22]

Model predictive control for constrained systems with uncertain state-dela ys

Xiao-Bing Hu and W en-Hua Chen. Model predictive control for constrained systems with uncertain state-dela ys. International Journal of Robust and Nonlinear Control , 14(17):1421–1432, 2004

work page 2004
[23]

A simple algorithm f or nuclear norm regularized problems

Martin Jaggi, Marek Sulovsk, et al. A simple algorithm f or nuclear norm regularized problems. In Proceedings of the 27th International Conference on Machine Learning , 2010

work page 2010
[24]

Teaching multivariable control using the quadruple-tank process

Karl Henrik Johansson, Alexander Horch, Olle Wijk, and Anders Hansson. Teaching multivariable control using the quadruple-tank process. In 38th IEEE Conference on Decision and Control , volume 1. IEEE, 1999

work page 1999
[25]

Linear convergence of gradient and proximal-gradient methods under the polyak-łojasiewicz condition

Hamed Karimi, Julie Nutini, and Mark Schmidt. Linear convergence of gradient and proximal-gradient methods under the polyak-łojasiewicz condition. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016, Proceedings, Part I 16 , pages 795–

work page 2016
[26]

Activi ty identiﬁcation and local linear convergence of forward– backward-type methods

Jingwei Liang, Jalal Fadili, and Gabriel Peyré. Activi ty identiﬁcation and local linear convergence of forward– backward-type methods. SIAM Journal on Optimization , 27(1):408–437, 2017

work page 2017
[27]

Local linear convergence of forward–backward under partial smoothness

Jingwei Liang, Jalal M Fadili, and Gabriel Peyré. Local linear convergence of forward–backward under partial smoothness . Advances in neural information processing systems , 27, 2014

work page 2014
[28]

Probl em complexity and method eﬃciency in optimization

Arkadij Nemirovskij and David Borisovich Yudin. Probl em complexity and method eﬃciency in optimization. 1983

work page 1983
[29]

A fast and accurate ﬁrst-order method for sparse recovery, 2009

Becker S Bobin J Nesta. A fast and accurate ﬁrst-order method for sparse recovery, 2009

work page 2009
[30]

A method of solving a convex programmin g problem with convergence rate O(1/k 2)

Yurii Nesterov. A method of solving a convex programmin g problem with convergence rate O(1/k 2). In Doklady Akademii Nauk , volume 269, pages 543–547. Russian Academy of Sciences, 1983

work page 1983
[31]

Excessive gap technique in nonsmooth c onvex minimization

Yurii Nesterov. Excessive gap technique in nonsmooth c onvex minimization. SIAM Journal on Optimization , 16(1), 2005

work page 2005
[32]

Smooth minimization of non-smooth functions

Yurii Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103:127–152, 2005

work page 2005
[33]

Smoothing technique and its applicati ons in semideﬁnite optimization

Yurii Nesterov. Smoothing technique and its applicati ons in semideﬁnite optimization. Mathematical Programming, 110(2):245–259, 2007

work page 2007
[34]

Gradient methods for minimizing compo site functions

Yurii Nesterov. Gradient methods for minimizing compo site functions. Mathematical Programming, 140(1):125–161, 2013

work page 2013
[35]

Proximal-based recursive implementation for model-free data-driven fault diagnosis

Jacques Noom, Oleg Soloviev, and Michel Verhaegen. Proximal-based recursive implementation for model-free data-driven fault diagnosis. Automatica, 165:111656, 2024

work page 2024
[36]

PRISMA: PRoximal Iterative SMoothing Algorithm

Francesco Orabona, Andreas Argyriou, and Nathan Srebr o. Prisma: Proximal iterative smoothing algorithm. preprint available at arXiv:1206.2372 , 2012

work page internal anchor Pith review Pith/arXiv arXiv 2012
[37]

Sparse control using sum-of-norms regularized mode l predictive control

Sina Khoshfetrat Pakazad, Henrik Ohlsson, and Lennart Ljung. Sparse control using sum-of-norms regularized mode l predictive control. In 52nd IEEE Conference on Decision and Control , pages 5758–5763. IEEE, 2013

work page 2013
[38]

Generalized hessian properties of regularized nonsmooth functions

René A Poliquin and R Tyrrell Rockafellar. Generalized hessian properties of regularized nonsmooth functions. SIAM Journal on Optimization , 6(4):1121–1137, 1996

work page 1996
[39]

Introduction to Optimization

Boris Polyak. Introduction to Optimization . New York, NY, USA: Optimization Software, 1987

work page 1987
[40]

Proximity operato r of a sum of functions; application to depth map estimation

Nelly Pustelnik and Laurent Condat. Proximity operato r of a sum of functions; application to depth map estimation. IEEE Signal Processing Letters , 24(12):1827–1831, 2017

work page 2017
[41]

An inexact augmented lagrangian framework for nonconvex optimization with nonlinear constraints

Mehmet Fatih Sahin, Armin Eftekhari, Ahmet Alacaoglu, Fabian Latorre, and Volkan Cevher. An inexact augmented lagrangian framework for nonconvex optimization with nonlinear constraints. Advances in Neural Information Processing Systems, 2019

work page 2019
[42]

A version of the bundle idea for minimizing a nonsmooth function: Conceptual idea, convergence analysis, numerical results

Helga Schramm and Jochem Zowe. A version of the bundle idea for minimizing a nonsmooth function: Conceptual idea, convergence analysis, numerical results. SIAM Journal on Optimization, 2(1):121–152, 1992

work page 1992
[43]

Maximum-margin matrix factorization

Nathan Srebro, Jason Rennie, and Tommi Jaakkola. Maximum-margin matrix factorization. Advances in Neural Information Processing Systems , 2004

work page 2004
[44]

System identiﬁcation via nuclear norm regularization

Yue Sun, Samet Oymak, and Maryam Fazel. System identiﬁcation via nuclear norm regularization. preprint available at arXiv:2203.16673 , 2022

work page arXiv 2022
[45]

Lasso with non-linear measurements is equivalent to one with linear measurements

Christos Thrampoulidis, Ehsan Abbasi, and Babak Hassi bi. Lasso with non-linear measurements is equivalent to one with linear measurements. Advances in Neural Information Processing Systems, 2015

work page 2015
[46]

Adaptive smoothing algorithms for nonsmooth composite convex minimization

Quoc Tran-Dinh. Adaptive smoothing algorithms for nonsmooth composite convex minimization. Computational Optimization and Applications , 66:425–451, 2017

work page 2017
[47]

Dictionary learning based impulse noise removal via l1–l1 minimization

Shanshan W ang, Qiegen Liu, Yong Xia, Pei Dong, Jianhua Luo, Qiu Huang, and David Dagan Feng. Dictionary learning based impulse noise removal via l1–l1 minimization. Signal Processing, 93(9):2696–2708, 2013

work page 2013
[48]

Alternating direction algorithms for ℓ1-problems in compressive sensing

Junfeng Yang and Yin Zhang. Alternating direction algorithms for ℓ1-problems in compressive sensing. SIAM Journal on Scientiﬁc Computing , 33(1):250–278, 2011

work page 2011
[49]

On decomposing the proximal map

Yao-Liang Yu. On decomposing the proximal map. Advances in Neural Information Processing Systems , 2013

work page 2013
[50]

abalone” and “mg

Alp Yurtsever, Olivier Fercoq, and Volkan Cevher. A conditional-gradient-based augmented lagrangi an framework. ICML, 2019. 13 A Supplemental Numerics This section presents additional numerical experiments showcasing the performance of the proposed method in the case of regression problem (27) and MaxCut (30) and also the comparison with the ADMM and Algo...

work page 2019

[1] [1]

Dual block-coordinate forward-backward algorithm with application to deconvolution and deinterlacing of video sequences

Feriel Abboud, Emilie Chouzenoux, Jean-Christophe Pesquet, Jean-Hugues Chenot, and Louis Laborelli. Dual block-coordinate forward-backward algorithm with application to deconvolution and deinterlacing of video sequences. Journal of Mathematical Imaging and Vision , 59:415–431, 2017

work page 2017

[2] [2]

An admm algorithm for solving ℓ1 regularized mpc

Mariette Annergren, Anders Hansson, and Bo W ahlberg. An admm algorithm for solving ℓ1 regularized mpc. In IEEE Conference on Decision and Control , 2012

work page 2012

[3] [3]

Efficient and Practical Stochastic Subgradient Descent for Nuclear Norm Regularization

Haim A vron, Satyen Kale, Shiva Kasiviswanathan, and Vik as Sindhwani. Eﬃcient and practical stochastic subgradient descent for nuclear norm regularization. preprint available at arXiv:1206.6384, 2012

work page internal anchor Pith review Pith/arXiv arXiv 2012

[4] [4]

Adaptive accelerated composit e minimization

Reza Rahimi Baghbadorani, Sergio Grammatico, and Peyman Mohajerin Esfahani. Adaptive accelerated composit e minimization. arXiv preprint arXiv:2405.03414 , 2024

work page arXiv 2024

[5] [5]

First-order methods in optimization

Amir Beck. First-order methods in optimization . SIAM, 2017

work page 2017

[6] [6]

Mirror descent and nonlinea r projected subgradient methods for convex optimization

Amir Beck and Marc Teboulle. Mirror descent and nonlinea r projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003

work page 2003

[7] [7]

A fast iterative shrinkage- thresholding algorithm for linear inverse problems

Amir Beck and Marc Teboulle. A fast iterative shrinkage- thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences , 2(1):183–202, 2009

work page 2009

[8] [8]

Smoothing and ﬁrst order methods: A uniﬁed framework

Amir Beck and Marc Teboulle. Smoothing and ﬁrst order methods: A uniﬁed framework. SIAM Journal on Optimization, 22(2):557–580, 2012

work page 2012

[9] [9]

Quadratic growth conditions and uniqueness of optimal solution to lasso

Yunier Bello-Cruz, Guoyin Li, and Tran Thai An Nghia. Quadratic growth conditions and uniqueness of optimal solution to lasso. Journal of Optimization Theory and Applications, 194(1):167–190, 2022

work page 2022

[10] [10]

Simultaneous analysis of lasso and dantzig selector

Peter J Bickel, Ya’acov Ritov, and Alexandre B Tsybakov . Simultaneous analysis of lasso and dantzig selector. The Annals of Statistic , 37(4):1705–1732, 2009

work page 2009

[11] [11]

A variable smoothing algorithm for solving convex optimization problems

Radu Ioan Boţ and Christopher Hendrich. A variable smoothing algorithm for solving convex optimization problems. An Oﬃcial Journal of the Spanish Society of Statistics and Operations Research , 23:124–150, 2015

work page 2015

[12] [12]

A ﬁrst-order primal - dual algorithm for convex problems with applications to imaging

Antonin Chambolle and Thomas Pock. A ﬁrst-order primal - dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision , 40:120–145, 2011. 12

work page 2011

[13] [13]

Libsvm: a library fo r support vector machines

Chih-Chung Chang and Chih-Jen Lin. Libsvm: a library fo r support vector machines. ACM transactions on intelligent systems and technology (TIST) , 2(3):1–27, 2011

work page 2011

[14] [14]

Nested iterative algorithms for convex constra ined image recovery problems

Caroline Chaux, Jean-Christophe Pesquet, and Nelly Pustelnik. Nested iterative algorithms for convex constra ined image recovery problems. SIAM Journal on Imaging Sciences, 2(2):730–762, 2009

work page 2009

[15] [15]

The direct extension of admm for multi-block convex minimization problems is not necessarily convergen t

Caihua Chen, Bingsheng He, Yinyu Ye, and Xiaoming Yuan. The direct extension of admm for multi-block convex minimization problems is not necessarily convergen t. Mathematical Programming, 155(1):57–79, 2016

work page 2016

[16] [16]

A direct formulation for sparse pca usin g semideﬁnite programming

Alexandre dAspremont, Laurent Ghaoui, Michael Jordan , and Gert Lanckriet. A direct formulation for sparse pca usin g semideﬁnite programming. Advances in Neural Information Processing Systems, 2004

work page 2004

[17] [17]

A stochas tic smoothing algorithm for semideﬁnite programming

Alexandre dAspremont and Noureddine Karoui. A stochas tic smoothing algorithm for semideﬁnite programming. SIAM Journal on Optimization , 24(3):1138–1177, 2014

work page 2014

[18] [18]

Eﬃcient minimization methods of mixed l2-l1 and l1-l1 norms for image restoration

Haoying Fu, Michael K Ng, Mila Nikolova, and Jesse L Barlow. Eﬃcient minimization methods of mixed l2-l1 and l1-l1 norms for image restoration. SIAM Journal on Scientiﬁc computing, 27(6):1881–1902, 2006

work page 1902

[19] [19]

The landscape of the proximal point method for nonconvex–nonconcave minimax optimization

Benjamin Grimmer, Haihao Lu, Pratik W orah, and Vahab Mirrokni. The landscape of the proximal point method for nonconvex–nonconcave minimax optimization. Mathematical Programming, 201(1):373–407, 2023

work page 2023

[20] [20]

Robust transfer principal component analy sis with rank constraints

Yuhong Guo. Robust transfer principal component analy sis with rank constraints. Advances in Neural Information Processing Systems, 2013

work page 2013

[21] [21]

A spectral bundle method for semideﬁnite programming

Christoph Helmberg and Franz Rendl. A spectral bundle method for semideﬁnite programming. SIAM Journal on Optimization, 10(3):673–696, 2000

work page 2000

[22] [22]

Model predictive control for constrained systems with uncertain state-dela ys

Xiao-Bing Hu and W en-Hua Chen. Model predictive control for constrained systems with uncertain state-dela ys. International Journal of Robust and Nonlinear Control , 14(17):1421–1432, 2004

work page 2004

[23] [23]

A simple algorithm f or nuclear norm regularized problems

Martin Jaggi, Marek Sulovsk, et al. A simple algorithm f or nuclear norm regularized problems. In Proceedings of the 27th International Conference on Machine Learning , 2010

work page 2010

[24] [24]

Teaching multivariable control using the quadruple-tank process

Karl Henrik Johansson, Alexander Horch, Olle Wijk, and Anders Hansson. Teaching multivariable control using the quadruple-tank process. In 38th IEEE Conference on Decision and Control , volume 1. IEEE, 1999

work page 1999

[25] [25]

Linear convergence of gradient and proximal-gradient methods under the polyak-łojasiewicz condition

Hamed Karimi, Julie Nutini, and Mark Schmidt. Linear convergence of gradient and proximal-gradient methods under the polyak-łojasiewicz condition. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016, Proceedings, Part I 16 , pages 795–

work page 2016

[26] [26]

Activi ty identiﬁcation and local linear convergence of forward– backward-type methods

Jingwei Liang, Jalal Fadili, and Gabriel Peyré. Activi ty identiﬁcation and local linear convergence of forward– backward-type methods. SIAM Journal on Optimization , 27(1):408–437, 2017

work page 2017

[27] [27]

Local linear convergence of forward–backward under partial smoothness

Jingwei Liang, Jalal M Fadili, and Gabriel Peyré. Local linear convergence of forward–backward under partial smoothness . Advances in neural information processing systems , 27, 2014

work page 2014

[28] [28]

Probl em complexity and method eﬃciency in optimization

Arkadij Nemirovskij and David Borisovich Yudin. Probl em complexity and method eﬃciency in optimization. 1983

work page 1983

[29] [29]

A fast and accurate ﬁrst-order method for sparse recovery, 2009

Becker S Bobin J Nesta. A fast and accurate ﬁrst-order method for sparse recovery, 2009

work page 2009

[30] [30]

A method of solving a convex programmin g problem with convergence rate O(1/k 2)

Yurii Nesterov. A method of solving a convex programmin g problem with convergence rate O(1/k 2). In Doklady Akademii Nauk , volume 269, pages 543–547. Russian Academy of Sciences, 1983

work page 1983

[31] [31]

Excessive gap technique in nonsmooth c onvex minimization

Yurii Nesterov. Excessive gap technique in nonsmooth c onvex minimization. SIAM Journal on Optimization , 16(1), 2005

work page 2005

[32] [32]

Smooth minimization of non-smooth functions

Yurii Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103:127–152, 2005

work page 2005

[33] [33]

Smoothing technique and its applicati ons in semideﬁnite optimization

Yurii Nesterov. Smoothing technique and its applicati ons in semideﬁnite optimization. Mathematical Programming, 110(2):245–259, 2007

work page 2007

[34] [34]

Gradient methods for minimizing compo site functions

Yurii Nesterov. Gradient methods for minimizing compo site functions. Mathematical Programming, 140(1):125–161, 2013

work page 2013

[35] [35]

Proximal-based recursive implementation for model-free data-driven fault diagnosis

Jacques Noom, Oleg Soloviev, and Michel Verhaegen. Proximal-based recursive implementation for model-free data-driven fault diagnosis. Automatica, 165:111656, 2024

work page 2024

[36] [36]

PRISMA: PRoximal Iterative SMoothing Algorithm

Francesco Orabona, Andreas Argyriou, and Nathan Srebr o. Prisma: Proximal iterative smoothing algorithm. preprint available at arXiv:1206.2372 , 2012

work page internal anchor Pith review Pith/arXiv arXiv 2012

[37] [37]

Sparse control using sum-of-norms regularized mode l predictive control

Sina Khoshfetrat Pakazad, Henrik Ohlsson, and Lennart Ljung. Sparse control using sum-of-norms regularized mode l predictive control. In 52nd IEEE Conference on Decision and Control , pages 5758–5763. IEEE, 2013

work page 2013

[38] [38]

Generalized hessian properties of regularized nonsmooth functions

René A Poliquin and R Tyrrell Rockafellar. Generalized hessian properties of regularized nonsmooth functions. SIAM Journal on Optimization , 6(4):1121–1137, 1996

work page 1996

[39] [39]

Introduction to Optimization

Boris Polyak. Introduction to Optimization . New York, NY, USA: Optimization Software, 1987

work page 1987

[40] [40]

Proximity operato r of a sum of functions; application to depth map estimation

Nelly Pustelnik and Laurent Condat. Proximity operato r of a sum of functions; application to depth map estimation. IEEE Signal Processing Letters , 24(12):1827–1831, 2017

work page 2017

[41] [41]

An inexact augmented lagrangian framework for nonconvex optimization with nonlinear constraints

Mehmet Fatih Sahin, Armin Eftekhari, Ahmet Alacaoglu, Fabian Latorre, and Volkan Cevher. An inexact augmented lagrangian framework for nonconvex optimization with nonlinear constraints. Advances in Neural Information Processing Systems, 2019

work page 2019

[42] [42]

A version of the bundle idea for minimizing a nonsmooth function: Conceptual idea, convergence analysis, numerical results

Helga Schramm and Jochem Zowe. A version of the bundle idea for minimizing a nonsmooth function: Conceptual idea, convergence analysis, numerical results. SIAM Journal on Optimization, 2(1):121–152, 1992

work page 1992

[43] [43]

Maximum-margin matrix factorization

Nathan Srebro, Jason Rennie, and Tommi Jaakkola. Maximum-margin matrix factorization. Advances in Neural Information Processing Systems , 2004

work page 2004

[44] [44]

System identiﬁcation via nuclear norm regularization

Yue Sun, Samet Oymak, and Maryam Fazel. System identiﬁcation via nuclear norm regularization. preprint available at arXiv:2203.16673 , 2022

work page arXiv 2022

[45] [45]

Lasso with non-linear measurements is equivalent to one with linear measurements

Christos Thrampoulidis, Ehsan Abbasi, and Babak Hassi bi. Lasso with non-linear measurements is equivalent to one with linear measurements. Advances in Neural Information Processing Systems, 2015

work page 2015

[46] [46]

Adaptive smoothing algorithms for nonsmooth composite convex minimization

Quoc Tran-Dinh. Adaptive smoothing algorithms for nonsmooth composite convex minimization. Computational Optimization and Applications , 66:425–451, 2017

work page 2017

[47] [47]

Dictionary learning based impulse noise removal via l1–l1 minimization

Shanshan W ang, Qiegen Liu, Yong Xia, Pei Dong, Jianhua Luo, Qiu Huang, and David Dagan Feng. Dictionary learning based impulse noise removal via l1–l1 minimization. Signal Processing, 93(9):2696–2708, 2013

work page 2013

[48] [48]

Alternating direction algorithms for ℓ1-problems in compressive sensing

Junfeng Yang and Yin Zhang. Alternating direction algorithms for ℓ1-problems in compressive sensing. SIAM Journal on Scientiﬁc Computing , 33(1):250–278, 2011

work page 2011

[49] [49]

On decomposing the proximal map

Yao-Liang Yu. On decomposing the proximal map. Advances in Neural Information Processing Systems , 2013

work page 2013

[50] [50]

abalone” and “mg

Alp Yurtsever, Olivier Fercoq, and Volkan Cevher. A conditional-gradient-based augmented lagrangi an framework. ICML, 2019. 13 A Supplemental Numerics This section presents additional numerical experiments showcasing the performance of the proposed method in the case of regression problem (27) and MaxCut (30) and also the comparison with the ADMM and Algo...

work page 2019