pith. sign in

arxiv: 2511.10239 · v3 · submitted 2025-11-13 · 🧮 math.OC

Locally Linear Convergence for Nonsmooth Convex Optimization via Coupled Smoothing and Momentum

Pith reviewed 2026-05-17 22:40 UTC · model grok-4.3

classification 🧮 math.OC
keywords nonsmooth convex optimizationsmoothing methodsmomentum accelerationlocal linear convergencesublinear convergenceLassoMaxCut
0
0 comments X

The pith

Coupling the smoothing parameter with the momentum parameter achieves optimal global sublinear convergence and local linear convergence for nonsmooth convex optimization under a local strong convexity condition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes an adaptive accelerated smoothing technique for nonsmooth convex optimization problems in which the smoothing update rule is directly coupled to the momentum parameter. The resulting algorithm attains the known optimal global convergence rate of O(1/k) for the class of problems studied. When the nonsmooth term satisfies a locally strong convexity condition near the solution, the same coupling additionally produces a local linear convergence rate. The approach is extended to objectives consisting of the sum of two nonsmooth convex functions and is illustrated on Lasso regression, MaxCut, nuclear-norm minimization, and l1-regularized model predictive control, where practical runs display an initial faster transient followed by the predicted linear regime.

Core claim

By coupling the smoothing parameter decrease with the momentum parameter through a specific adaptive rule, the algorithm attains the provably optimal global rate of O(1/k) for minimizing a nonsmooth convex objective. Under the additional assumption that the nonsmooth term fulfills a locally strong convexity condition near the optimum, the method further guarantees local linear convergence. The same coupling extends to the sum of two nonsmooth convex functions and explains the observed practical two-phase behavior of faster initial convergence followed by asymptotic linear convergence.

What carries the argument

The adaptive accelerated smoothing technique whose smoothing update rule is coupled to the momentum parameter.

Load-bearing premise

The nonsmooth term must satisfy a locally strong convexity condition near the optimum and the smoothing and momentum parameters must obey a precise coupling rule.

What would settle it

A numerical example in which a nonsmooth convex function with local strong convexity near the optimum produces only sublinear convergence under the proposed coupled scheme would disprove the local linear rate claim.

Figures

Figures reproduced from arXiv: 2511.10239 by Peyman Mohajerin Esfahani, Reza Rahimi Baghbadorani, Sergio Grammatico.

Figure 1
Figure 1. Figure 1: Convergence rate of µk. 5 Numerical experiments We demonstrate the performance of Algorithm 1 (with c = 0) on four popular classes of problems studied in the 9 [PITH_FULL_IMAGE:figures/full_fig_p009_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Numerical results for four problem classes. The first [PITH_FULL_IMAGE:figures/full_fig_p012_2.png] view at source ↗
read the original abstract

We propose an adaptive accelerated smoothing technique for a nonsmooth convex optimization problem where the smoothing update rule is coupled with the momentum parameter. We also extend the setting to the case where the objective function is the sum of two nonsmooth functions. With regard to convergence rate, we provide the global (optimal) sublinear convergence guarantees of O(1/k), which is known to be provably optimal for the studied class of functions, along with a local linear rate if the nonsmooth term fulfills a so-call locally strong convexity condition. We validate the performance of our algorithm on several problem classes, including regression with the l1-norm (the Lasso problem), sparse semidefinite programming (the MaxCut problem), Nuclear norm minimization with application in model free fault diagnosis, and l_1-regularized model predictive control to showcase the benefits of the coupling. An interesting observation is that although our global convergence result guarantees O(1/k) convergence, we consistently observe a practical transient convergence rate of O(1/k^2), followed by asymptotic linear convergence as anticipated by the theoretical result. This two-phase behavior can also be explained in view of the proposed smoothing rule.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes an adaptive accelerated smoothing method for nonsmooth convex optimization in which the smoothing parameter is explicitly coupled to the momentum sequence. It establishes global O(1/k) sublinear convergence (optimal for the problem class) and, under an additional locally strong convexity assumption on the nonsmooth term, a local linear rate. The framework is extended to the sum of two nonsmooth convex functions. Numerical results on Lasso, MaxCut, nuclear-norm minimization, and l1-regularized MPC illustrate practical performance, including an observed transient O(1/k^2) phase followed by linear convergence.

Significance. If the local linear-rate analysis is complete and the coupling schedule is shown to be robust, the work would provide a meaningful advance by delivering both the optimal global rate and a faster local rate for nonsmooth convex problems, together with a plausible explanation for the commonly observed two-phase practical behavior. The extension to two nonsmooth terms and the concrete algorithmic coupling are additional strengths.

major comments (3)
  1. [§4, Theorem 4.3] §4, Theorem 4.3 (local linear rate): the proof relies on the smoothing bias vanishing at a rate compatible with the momentum parameter approaching 1, yet no explicit lower bound is derived for the effective strong-convexity modulus that survives the smoothing perturbation; without this, it is unclear for which range of the local strong-convexity constant the linear contraction remains valid.
  2. [§5] §5 (two-nonsmooth case): the error bound between the doubly smoothed objective and the original function is stated as O(μ_k), but the constants are not expressed in terms of the individual Lipschitz moduli of the two nonsmooth terms; this omission makes it impossible to verify that the same coupling schedule preserves a positive effective strong-convexity modulus near the optimum.
  3. [§3.2, coupling rule (3.4)] §3.2, coupling rule (3.4): the specific choice μ_k = Θ(1/k) and β_k = 1 − Θ(1/k) is shown to yield the global O(1/k) rate, but the manuscript does not quantify the perturbation size in the momentum sequence that would destroy the local linear regime, leaving the robustness of the schedule uncharacterized.
minor comments (2)
  1. [Figure 2] The caption of Figure 2 should explicitly state the vertical axis (function-value gap or distance to optimum) and the horizontal axis scaling (iteration count or CPU time).
  2. [Introduction] A short paragraph comparing the proposed coupling to existing adaptive-smoothing schedules (e.g., Nesterov 2005, Beck & Teboulle) would help readers situate the novelty.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough review and valuable suggestions. We address each of the major comments in detail below and will revise the manuscript accordingly to incorporate the clarifications.

read point-by-point responses
  1. Referee: [§4, Theorem 4.3] §4, Theorem 4.3 (local linear rate): the proof relies on the smoothing bias vanishing at a rate compatible with the momentum parameter approaching 1, yet no explicit lower bound is derived for the effective strong-convexity modulus that survives the smoothing perturbation; without this, it is unclear for which range of the local strong-convexity constant the linear contraction remains valid.

    Authors: We appreciate this observation. Upon closer inspection of the proof in Theorem 4.3, the smoothing bias term is bounded by a multiple of μ_k, and since μ_k = Θ(1/k) and the momentum parameter β_k approaches 1 at rate 1/k, the effective strong-convexity modulus can be bounded below by a positive constant depending on the local strong-convexity parameter μ and the Lipschitz constant. Specifically, for k sufficiently large, the effective modulus is at least μ/2. We will add this explicit lower bound and specify the range of μ for which the linear rate holds in the revised manuscript. revision: yes

  2. Referee: [§5] §5 (two-nonsmooth case): the error bound between the doubly smoothed objective and the original function is stated as O(μ_k), but the constants are not expressed in terms of the individual Lipschitz moduli of the two nonsmooth terms; this omission makes it impossible to verify that the same coupling schedule preserves a positive effective strong-convexity modulus near the optimum.

    Authors: The referee correctly identifies that the error bound in Section 5 is given as O(μ_k) without explicit constants. In fact, if the two nonsmooth functions have Lipschitz moduli L1 and L2, the smoothing error is at most (L1 + L2) μ_k. This allows us to show that the effective strong-convexity modulus remains positive near the optimum as long as μ_k is small enough relative to the local strong-convexity constant. We will update the manuscript to express the constants in terms of L1 and L2 and include the verification for the coupling schedule. revision: yes

  3. Referee: [§3.2, coupling rule (3.4)] §3.2, coupling rule (3.4): the specific choice μ_k = Θ(1/k) and β_k = 1 − Θ(1/k) is shown to yield the global O(1/k) rate, but the manuscript does not quantify the perturbation size in the momentum sequence that would destroy the local linear regime, leaving the robustness of the schedule uncharacterized.

    Authors: We agree that a characterization of robustness would enhance the result. The local linear convergence in Theorem 4.3 holds provided that the momentum parameter satisfies β_k ≥ 1 - c/k for some c, and μ_k ≤ C/k, with appropriate constants c and C depending on the problem parameters. Perturbations that violate this rate would indeed affect the linear regime. We will add a remark or corollary quantifying the allowable perturbation size in the momentum sequence to ensure the local linear rate is preserved. revision: yes

Circularity Check

0 steps flagged

No significant circularity; rates derived from standard analysis plus explicit assumption

full rationale

The global O(1/k) sublinear rate is presented as the known optimal rate for nonsmooth convex problems and follows from standard smoothing-plus-momentum analysis once the smoothing parameter is driven to zero at a suitable rate. The local linear rate is explicitly conditioned on the additional assumption that the nonsmooth term satisfies a locally strong convexity condition near the optimum; the coupling schedule between smoothing and momentum is a design choice introduced to make the analysis close, not a fitted quantity renamed as a prediction. No self-definitional steps, fitted-input predictions, load-bearing self-citations, or ansatz smuggling appear in the derivation chain. The manuscript remains self-contained against external benchmarks for convex optimization convergence rates.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Full manuscript text not provided; audit is therefore limited to claims appearing in the abstract. The central results rest on standard convex-analysis background plus one domain-specific assumption.

axioms (1)
  • domain assumption The nonsmooth term satisfies a locally strong convexity condition near the solution
    Explicitly required for the local linear convergence guarantee stated in the abstract.

pith-pipeline@v0.9.0 · 5514 in / 1311 out tokens · 38180 ms · 2026-05-17T22:40:47.632645+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
    ?
    unclear

    Relation between the paper passage and the cited Recognition theorem.

    We propose an adaptive accelerated smoothing technique ... smoothing update rule is coupled with the momentum parameter ... global (optimal) sublinear convergence guarantees of O(1/k) ... local linear rate if the nonsmooth term fulfills a so-called locally strong convexity condition.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · 2 internal anchors

  1. [1]

    Dual block-coordinate forward-backward algorithm with application to deconvolution and deinterlacing of video sequences

    Feriel Abboud, Emilie Chouzenoux, Jean-Christophe Pesquet, Jean-Hugues Chenot, and Louis Laborelli. Dual block-coordinate forward-backward algorithm with application to deconvolution and deinterlacing of video sequences. Journal of Mathematical Imaging and Vision , 59:415–431, 2017

  2. [2]

    An admm algorithm for solving ℓ1 regularized mpc

    Mariette Annergren, Anders Hansson, and Bo W ahlberg. An admm algorithm for solving ℓ1 regularized mpc. In IEEE Conference on Decision and Control , 2012

  3. [3]

    Efficient and Practical Stochastic Subgradient Descent for Nuclear Norm Regularization

    Haim A vron, Satyen Kale, Shiva Kasiviswanathan, and Vik as Sindhwani. Efficient and practical stochastic subgradient descent for nuclear norm regularization. preprint available at arXiv:1206.6384, 2012

  4. [4]

    Adaptive accelerated composit e minimization

    Reza Rahimi Baghbadorani, Sergio Grammatico, and Peyman Mohajerin Esfahani. Adaptive accelerated composit e minimization. arXiv preprint arXiv:2405.03414 , 2024

  5. [5]

    First-order methods in optimization

    Amir Beck. First-order methods in optimization . SIAM, 2017

  6. [6]

    Mirror descent and nonlinea r projected subgradient methods for convex optimization

    Amir Beck and Marc Teboulle. Mirror descent and nonlinea r projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003

  7. [7]

    A fast iterative shrinkage- thresholding algorithm for linear inverse problems

    Amir Beck and Marc Teboulle. A fast iterative shrinkage- thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences , 2(1):183–202, 2009

  8. [8]

    Smoothing and first order methods: A unified framework

    Amir Beck and Marc Teboulle. Smoothing and first order methods: A unified framework. SIAM Journal on Optimization, 22(2):557–580, 2012

  9. [9]

    Quadratic growth conditions and uniqueness of optimal solution to lasso

    Yunier Bello-Cruz, Guoyin Li, and Tran Thai An Nghia. Quadratic growth conditions and uniqueness of optimal solution to lasso. Journal of Optimization Theory and Applications, 194(1):167–190, 2022

  10. [10]

    Simultaneous analysis of lasso and dantzig selector

    Peter J Bickel, Ya’acov Ritov, and Alexandre B Tsybakov . Simultaneous analysis of lasso and dantzig selector. The Annals of Statistic , 37(4):1705–1732, 2009

  11. [11]

    A variable smoothing algorithm for solving convex optimization problems

    Radu Ioan Boţ and Christopher Hendrich. A variable smoothing algorithm for solving convex optimization problems. An Official Journal of the Spanish Society of Statistics and Operations Research , 23:124–150, 2015

  12. [12]

    A first-order primal - dual algorithm for convex problems with applications to imaging

    Antonin Chambolle and Thomas Pock. A first-order primal - dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision , 40:120–145, 2011. 12

  13. [13]

    Libsvm: a library fo r support vector machines

    Chih-Chung Chang and Chih-Jen Lin. Libsvm: a library fo r support vector machines. ACM transactions on intelligent systems and technology (TIST) , 2(3):1–27, 2011

  14. [14]

    Nested iterative algorithms for convex constra ined image recovery problems

    Caroline Chaux, Jean-Christophe Pesquet, and Nelly Pustelnik. Nested iterative algorithms for convex constra ined image recovery problems. SIAM Journal on Imaging Sciences, 2(2):730–762, 2009

  15. [15]

    The direct extension of admm for multi-block convex minimization problems is not necessarily convergen t

    Caihua Chen, Bingsheng He, Yinyu Ye, and Xiaoming Yuan. The direct extension of admm for multi-block convex minimization problems is not necessarily convergen t. Mathematical Programming, 155(1):57–79, 2016

  16. [16]

    A direct formulation for sparse pca usin g semidefinite programming

    Alexandre dAspremont, Laurent Ghaoui, Michael Jordan , and Gert Lanckriet. A direct formulation for sparse pca usin g semidefinite programming. Advances in Neural Information Processing Systems, 2004

  17. [17]

    A stochas tic smoothing algorithm for semidefinite programming

    Alexandre dAspremont and Noureddine Karoui. A stochas tic smoothing algorithm for semidefinite programming. SIAM Journal on Optimization , 24(3):1138–1177, 2014

  18. [18]

    Efficient minimization methods of mixed l2-l1 and l1-l1 norms for image restoration

    Haoying Fu, Michael K Ng, Mila Nikolova, and Jesse L Barlow. Efficient minimization methods of mixed l2-l1 and l1-l1 norms for image restoration. SIAM Journal on Scientific computing, 27(6):1881–1902, 2006

  19. [19]

    The landscape of the proximal point method for nonconvex–nonconcave minimax optimization

    Benjamin Grimmer, Haihao Lu, Pratik W orah, and Vahab Mirrokni. The landscape of the proximal point method for nonconvex–nonconcave minimax optimization. Mathematical Programming, 201(1):373–407, 2023

  20. [20]

    Robust transfer principal component analy sis with rank constraints

    Yuhong Guo. Robust transfer principal component analy sis with rank constraints. Advances in Neural Information Processing Systems, 2013

  21. [21]

    A spectral bundle method for semidefinite programming

    Christoph Helmberg and Franz Rendl. A spectral bundle method for semidefinite programming. SIAM Journal on Optimization, 10(3):673–696, 2000

  22. [22]

    Model predictive control for constrained systems with uncertain state-dela ys

    Xiao-Bing Hu and W en-Hua Chen. Model predictive control for constrained systems with uncertain state-dela ys. International Journal of Robust and Nonlinear Control , 14(17):1421–1432, 2004

  23. [23]

    A simple algorithm f or nuclear norm regularized problems

    Martin Jaggi, Marek Sulovsk, et al. A simple algorithm f or nuclear norm regularized problems. In Proceedings of the 27th International Conference on Machine Learning , 2010

  24. [24]

    Teaching multivariable control using the quadruple-tank process

    Karl Henrik Johansson, Alexander Horch, Olle Wijk, and Anders Hansson. Teaching multivariable control using the quadruple-tank process. In 38th IEEE Conference on Decision and Control , volume 1. IEEE, 1999

  25. [25]

    Linear convergence of gradient and proximal-gradient methods under the polyak-łojasiewicz condition

    Hamed Karimi, Julie Nutini, and Mark Schmidt. Linear convergence of gradient and proximal-gradient methods under the polyak-łojasiewicz condition. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016, Proceedings, Part I 16 , pages 795–

  26. [26]

    Activi ty identification and local linear convergence of forward– backward-type methods

    Jingwei Liang, Jalal Fadili, and Gabriel Peyré. Activi ty identification and local linear convergence of forward– backward-type methods. SIAM Journal on Optimization , 27(1):408–437, 2017

  27. [27]

    Local linear convergence of forward–backward under partial smoothness

    Jingwei Liang, Jalal M Fadili, and Gabriel Peyré. Local linear convergence of forward–backward under partial smoothness . Advances in neural information processing systems , 27, 2014

  28. [28]

    Probl em complexity and method efficiency in optimization

    Arkadij Nemirovskij and David Borisovich Yudin. Probl em complexity and method efficiency in optimization. 1983

  29. [29]

    A fast and accurate first-order method for sparse recovery, 2009

    Becker S Bobin J Nesta. A fast and accurate first-order method for sparse recovery, 2009

  30. [30]

    A method of solving a convex programmin g problem with convergence rate O(1/k 2)

    Yurii Nesterov. A method of solving a convex programmin g problem with convergence rate O(1/k 2). In Doklady Akademii Nauk , volume 269, pages 543–547. Russian Academy of Sciences, 1983

  31. [31]

    Excessive gap technique in nonsmooth c onvex minimization

    Yurii Nesterov. Excessive gap technique in nonsmooth c onvex minimization. SIAM Journal on Optimization , 16(1), 2005

  32. [32]

    Smooth minimization of non-smooth functions

    Yurii Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103:127–152, 2005

  33. [33]

    Smoothing technique and its applicati ons in semidefinite optimization

    Yurii Nesterov. Smoothing technique and its applicati ons in semidefinite optimization. Mathematical Programming, 110(2):245–259, 2007

  34. [34]

    Gradient methods for minimizing compo site functions

    Yurii Nesterov. Gradient methods for minimizing compo site functions. Mathematical Programming, 140(1):125–161, 2013

  35. [35]

    Proximal-based recursive implementation for model-free data-driven fault diagnosis

    Jacques Noom, Oleg Soloviev, and Michel Verhaegen. Proximal-based recursive implementation for model-free data-driven fault diagnosis. Automatica, 165:111656, 2024

  36. [36]

    PRISMA: PRoximal Iterative SMoothing Algorithm

    Francesco Orabona, Andreas Argyriou, and Nathan Srebr o. Prisma: Proximal iterative smoothing algorithm. preprint available at arXiv:1206.2372 , 2012

  37. [37]

    Sparse control using sum-of-norms regularized mode l predictive control

    Sina Khoshfetrat Pakazad, Henrik Ohlsson, and Lennart Ljung. Sparse control using sum-of-norms regularized mode l predictive control. In 52nd IEEE Conference on Decision and Control , pages 5758–5763. IEEE, 2013

  38. [38]

    Generalized hessian properties of regularized nonsmooth functions

    René A Poliquin and R Tyrrell Rockafellar. Generalized hessian properties of regularized nonsmooth functions. SIAM Journal on Optimization , 6(4):1121–1137, 1996

  39. [39]

    Introduction to Optimization

    Boris Polyak. Introduction to Optimization . New York, NY, USA: Optimization Software, 1987

  40. [40]

    Proximity operato r of a sum of functions; application to depth map estimation

    Nelly Pustelnik and Laurent Condat. Proximity operato r of a sum of functions; application to depth map estimation. IEEE Signal Processing Letters , 24(12):1827–1831, 2017

  41. [41]

    An inexact augmented lagrangian framework for nonconvex optimization with nonlinear constraints

    Mehmet Fatih Sahin, Armin Eftekhari, Ahmet Alacaoglu, Fabian Latorre, and Volkan Cevher. An inexact augmented lagrangian framework for nonconvex optimization with nonlinear constraints. Advances in Neural Information Processing Systems, 2019

  42. [42]

    A version of the bundle idea for minimizing a nonsmooth function: Conceptual idea, convergence analysis, numerical results

    Helga Schramm and Jochem Zowe. A version of the bundle idea for minimizing a nonsmooth function: Conceptual idea, convergence analysis, numerical results. SIAM Journal on Optimization, 2(1):121–152, 1992

  43. [43]

    Maximum-margin matrix factorization

    Nathan Srebro, Jason Rennie, and Tommi Jaakkola. Maximum-margin matrix factorization. Advances in Neural Information Processing Systems , 2004

  44. [44]

    System identification via nuclear norm regularization

    Yue Sun, Samet Oymak, and Maryam Fazel. System identification via nuclear norm regularization. preprint available at arXiv:2203.16673 , 2022

  45. [45]

    Lasso with non-linear measurements is equivalent to one with linear measurements

    Christos Thrampoulidis, Ehsan Abbasi, and Babak Hassi bi. Lasso with non-linear measurements is equivalent to one with linear measurements. Advances in Neural Information Processing Systems, 2015

  46. [46]

    Adaptive smoothing algorithms for nonsmooth composite convex minimization

    Quoc Tran-Dinh. Adaptive smoothing algorithms for nonsmooth composite convex minimization. Computational Optimization and Applications , 66:425–451, 2017

  47. [47]

    Dictionary learning based impulse noise removal via l1–l1 minimization

    Shanshan W ang, Qiegen Liu, Yong Xia, Pei Dong, Jianhua Luo, Qiu Huang, and David Dagan Feng. Dictionary learning based impulse noise removal via l1–l1 minimization. Signal Processing, 93(9):2696–2708, 2013

  48. [48]

    Alternating direction algorithms for ℓ1-problems in compressive sensing

    Junfeng Yang and Yin Zhang. Alternating direction algorithms for ℓ1-problems in compressive sensing. SIAM Journal on Scientific Computing , 33(1):250–278, 2011

  49. [49]

    On decomposing the proximal map

    Yao-Liang Yu. On decomposing the proximal map. Advances in Neural Information Processing Systems , 2013

  50. [50]

    abalone” and “mg

    Alp Yurtsever, Olivier Fercoq, and Volkan Cevher. A conditional-gradient-based augmented lagrangi an framework. ICML, 2019. 13 A Supplemental Numerics This section presents additional numerical experiments showcasing the performance of the proposed method in the case of regression problem (27) and MaxCut (30) and also the comparison with the ADMM and Algo...