A Nesterov-Accelerated Primal-Dual Splitting Algorithm for Convex Nonsmooth Optimization

Abdurakhmon Sadiev; Laurent Condat; Peter Richt\'arik

arxiv: 2604.09245 · v1 · submitted 2026-04-10 · 🧮 math.OC

A Nesterov-Accelerated Primal-Dual Splitting Algorithm for Convex Nonsmooth Optimization

Laurent Condat , Abdurakhmon Sadiev , Peter Richt\'arik This is my paper

Pith reviewed 2026-05-10 17:08 UTC · model grok-4.3

classification 🧮 math.OC

keywords primal-dual methodsNesterov accelerationproximal splittingconvex optimizationconvergence ratesLyapunov analysissaddle-point problemsforward-backward algorithms

0 comments

The pith

Nesterov acceleration integrates into primal-dual splitting to achieve optimal O(1/t^2) rates for convex problems with a quadratic term.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops the Accelerated Proximal Alternating Predictor-Corrector algorithm to add Nesterov momentum to primal-dual forward-backward methods for minimizing a smooth function plus a quadratic plus a nonsmooth term composed with a linear map. The usual barrier to acceleration in primal-dual space is the rotational coupling between variables, but the method stabilizes the momentum steps by using strong convexity in the dual problem. A single Lyapunov analysis then delivers the optimal sublinear rate across three standard regimes where that dual convexity appears, plus faster linear rates once the quadratic coefficient is positive. The iterates are also shown to converge weakly to a saddle point. These guarantees matter for any application whose structure matches the assumed form, because faster reliable solvers reduce the number of iterations needed to reach a given accuracy.

Core claim

We propose the Accelerated Proximal Alternating Predictor-Corrector algorithm (APAPC) for the structured problem min f(x) + (μ_g/2)‖x‖² + h(Kx). Nesterov momentum is inserted into the primal-dual scheme by exploiting dual strong convexity to stabilize the accelerated primal updates. A unified Lyapunov framework establishes optimal O(1/t²) sublinear rates and accelerated linear rates when μ_g > 0, across the three regimes of dual strong convexity given by smoothness of h, lower boundedness of K*, and linear constraints. Weak convergence of the iterates to a saddle point follows from known results on accelerated gradient descent.

What carries the argument

The APAPC predictor-corrector scheme with Nesterov momentum on the primal variable, whose stability and rates are certified by a unified Lyapunov function that exploits dual strong convexity.

If this is right

Optimal O(1/t²) convergence is guaranteed whenever the nonsmooth term h is smooth.
The same rate holds when the adjoint operator K* is bounded below on its range.
Linearly constrained convex problems inherit the accelerated rates directly.
When the quadratic coefficient μ_g is positive, the method additionally delivers accelerated linear convergence.
The primal-dual sequence converges weakly to a saddle-point solution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same Lyapunov construction may allow Nesterov momentum to be added to other primal-dual splittings once a suitable dual-convexity condition is identified.
The three explicit regimes give practitioners a concrete checklist for verifying whether a given application will receive the accelerated guarantees.
The reliance on recent accelerated-gradient theory for the weak-convergence part suggests that further refinements of those gradient results could immediately tighten the saddle-point guarantees here.

Load-bearing premise

The dual problem must be strongly convex in one of the three listed regimes and the middle term must be exactly quadratic; without dual strong convexity the acceleration and rate claims do not hold.

What would settle it

Numerical runs on a convex instance where none of the three dual-strong-convexity regimes holds and the observed decay is slower than O(1/t²) would falsify the main rate result.

read the original abstract

We investigate the integration of Nesterov-type acceleration into primal-dual methods for structured convex optimization. While proximal splitting algorithms efficiently handle composite problems of the form $\min_x f(x)+g(x)+h(Kx)$, accelerating their convergence with respect to the smooth term $f$ is notoriously challenging due to the rotational dynamics in the primal-dual space. In this paper, we overcome this barrier by proposing the Accelerated Proximal Alternating Predictor-Corrector algorithm (APAPC), focusing on the setting where $g(x)=\frac{\mu_g}{2}\|x\|^2$. Our analysis reveals that Nesterov momentum can be seamlessly integrated into a primal-dual forward-backward scheme by exploiting the strong convexity of the dual problem to stabilize the accelerated primal updates. Using a unified Lyapunov framework, we establish optimal $O(1/t^2)$ sublinear convergence rates, as well as accelerated linear convergence when $\mu_g > 0$, across three regimes of dual strong convexity: (i) when $h$ is smooth, (ii) when the linear operator $K^*$ is bounded below, and (iii) for linearly constrained optimization. Furthermore, leveraging recent results on accelerated gradient descent, we characterize the weak convergence of the primal-dual iterates to a saddle-point solution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a new primal-dual algorithm APAPC that adds Nesterov momentum when g is quadratic and the dual is strongly convex in one of three regimes, with a unified Lyapunov analysis for the rates.

read the letter

The main takeaway is that this work proposes APAPC, a primal-dual method that integrates Nesterov acceleration into a predictor-corrector scheme for min f(x) + g(x) + h(Kx) when g is exactly the quadratic (μ_g/2)||x||². It claims O(1/t²) sublinear rates plus accelerated linear rates under dual strong convexity, handled in three regimes: smooth h, K* bounded below, or linear constraints. The analysis uses a single Lyapunov function that stabilizes the momentum via the dual strong convexity, plus recent accelerated gradient descent results for weak convergence of the iterates.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes the Accelerated Proximal Alternating Predictor-Corrector (APAPC) algorithm for convex optimization problems of the form min_x f(x) + g(x) + h(Kx) with the explicit restriction g(x) = (μ_g/2)||x||^2. It integrates Nesterov momentum into a primal-dual forward-backward scheme by exploiting dual strong convexity, and uses a unified Lyapunov framework to prove optimal O(1/t^2) sublinear convergence rates together with accelerated linear rates (when μ_g > 0) in three regimes: (i) h smooth, (ii) K* bounded below, and (iii) linearly constrained problems. Weak convergence of the primal-dual iterates to a saddle point is also established by invoking recent results on accelerated gradient descent.

Significance. If the Lyapunov analysis is complete and the step-size conditions are free of post-hoc restrictions, the work would be a useful addition to the literature on accelerated primal-dual methods. The explicit handling of rotational dynamics via dual strong convexity, the unified treatment across the three regimes, and the clean restriction to quadratic g are all strengths. The paper correctly delimits its scope rather than claiming acceleration for general nonsmooth g.

minor comments (3)

[§2] The three regimes of dual strong convexity are listed in the abstract and introduction but would benefit from a compact summary table (or explicit parameter conditions) early in §2 so that readers can immediately compare the assumptions.
All step-size restrictions and the precise definition of the dual strong-convexity parameters should appear explicitly in the statements of the main convergence theorems (rather than only inside the Lyapunov derivations).
[§5] The weak-convergence argument invokes 'recent results on accelerated gradient descent'; a specific citation and a one-sentence reminder of the invoked theorem would improve readability.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and the positive overall assessment. We are encouraged by the recognition of the paper's strengths, including the explicit treatment of rotational dynamics through dual strong convexity, the unified Lyapunov framework across the three regimes, and the deliberate restriction to quadratic g. The recommendation for minor revision is noted, and we address the key points raised in the significance statement below.

read point-by-point responses

Referee: If the Lyapunov analysis is complete and the step-size conditions are free of post-hoc restrictions, the work would be a useful addition to the literature on accelerated primal-dual methods.

Authors: The Lyapunov analysis is complete as presented: it is constructed explicitly in Section 3 and used uniformly to derive both the O(1/t^2) sublinear rates (Theorems 1--3) and the accelerated linear rates (Theorems 4--6) without additional assumptions or case-by-case adjustments. The step-size conditions (e.g., the requirements on alpha, beta, and gamma in Algorithm 1 and the subsequent theorems) are stated a priori in terms of the problem parameters (mu_g, L_h, sigma_K, etc.) and do not involve any post-hoc tuning or restrictions that depend on the generated iterates. We are happy to add a short clarifying remark in the revised version if the referee finds the current presentation insufficiently explicit on this point. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper restricts to the explicit assumption g(x) = (μ_g/2)‖x‖² and derives O(1/t²) and linear rates only inside the three regimes where dual strong convexity holds, using a standard Lyapunov construction that directly exploits that convexity to control the Nesterov momentum term. No quantity is defined in terms of itself, no fitted parameter is relabeled as a prediction, and the central convergence claims follow from the Lyapunov decrease inequality under the stated assumptions rather than reducing to them by construction. The reference to recent accelerated-gradient results is external to the present derivation and does not carry the load-bearing step; the analysis remains self-contained once the scope (quadratic g and dual strong convexity) is accepted.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review performed on abstract only; the paper relies on standard convex-analysis assumptions but introduces no new free parameters or invented entities visible at this level.

axioms (2)

domain assumption f, g, h are convex; g is μ_g-strongly convex quadratic
Stated in the problem setup and used to obtain dual strong convexity.
standard math K is a linear operator
Standard in the composite optimization template.

pith-pipeline@v0.9.0 · 5541 in / 1294 out tokens · 35781 ms · 2026-05-10T17:08:19.576845+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages

[1]

S. A. Alghunaim, E. K. Ryu, K. Yuan, and A. H. Sayed. Decentralized proximal gradient algorithms with linear convergence rates. IEEE Trans. Automatic Control, 66 0 (6): 0 2787--2794, June 2021

work page 2021
[2]

Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent

Z. Allen-Zhu and L. Orecchia. Linear coupling: An ultimate unification of gradient and mirror descent. preprint arXiv:1407.1537, 2014

work page Pith review arXiv 2014
[3]

Attouch and J

H. Attouch and J. Peypouquet. The rate of convergence of N esterov's accelerated forward-backward method is actually faster than 1/k^2 . SIAM J. Optim., 26 0 (3): 0 1824--1834, 2016

work page 2016
[4]

Aujol, C

J.-F. Aujol, C. Dossal, H. Labarri\`ere, and A. Rondepierre. FISTA restart using an automatic estimation of the growth parameter. Journal of Optimization Theory and Applications, 206: 0 51, 2025

work page 2025
[5]

Auslender and M

A. Auslender and M. Teboulle. Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim., 16 0 (3): 0 697--725, 2006

work page 2006
[6]

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Optimization with sparsity-inducing penalties. Found. Trends Mach. Learn., 4 0 (1): 0 1--106, 2012

work page 2012
[7]

H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York, 2nd edition, 2017

work page 2017
[8]

H. H. Bauschke and W. M. Moursi. Understanding FISTA 's weak convergence: A step-by-step introduction to the 2025 milestone. preprint arXiv:2601.15398, 2026

work page arXiv 2025
[9]

A. Beck. First-Order Methods in Optimization. MOS-SIAM Series on Optimization. SIAM, 2017

work page 2017
[10]

Beck and M

A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci., 2 0 (1): 0 183--202, 2009

work page 2009
[11]

R. I. Bo t , E. R. Csetnek, and C. Hendrich. Recent developments on primal--dual splitting methods with applications to convex minimization. In P. M. Pardalos and T. M. Rassias, editors, Mathematics Without Boundaries: Surveys in Interdisciplinary Research, pages 57--99. Springer New York, 2014

work page 2014
[12]

R. I. Bo t , E. Chenchene, E. R. Csetnek, and D. A. Hulett. Accelerating diagonal methods for bilevel optimization: Unified convergence via continuous-time dynamics. preprint arXiv:2505.14389, 2025 a

work page arXiv 2025
[13]

R. I. Bo t , J. Fadili, and D.-K. Nguyen. The iterates of Nesterov’s accelerated algorithm converge in the critical regimes. preprint arXiv:2510.22715, Oct. 2025 b

work page arXiv 2025
[14]

Bottou, F

L. Bottou, F. E. Curtis, and J. Nocedal. Optimization methods for large-scale machine learning. SIAM review, 60 0 (2): 0 223--311, 2018

work page 2018
[15]

S. Bubeck. Convex optimization: A lgorithms and complexity. Found. Trends Mach. Learn., 8 0 (3--4): 0 231--357, 2015

work page 2015
[16]

Cevher, S

V. Cevher, S. Becker, and M. Schmidt. Convex optimization for big data: S calable, randomized, and parallel algorithms for big data analytics. IEEE Signal Process. Mag. , 31 0 (5): 0 32--43, 2014

work page 2014
[17]

Fast Iterative Shrinkage/Thresholding Algorithm

A. Chambolle and C. Dossal. On the convergence of the iterates of the "Fast Iterative Shrinkage/Thresholding Algorithm" . J. Optim. Theory Appl., 166: 0 968--982, 2015

work page 2015
[18]

Chambolle and T

A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision, 40 0 (1): 0 120--145, May 2011

work page 2011
[19]

Chambolle and T

A. Chambolle and T. Pock. An introduction to continuous optimization for imaging. Acta Numerica, 25: 0 161--319, 2016 a

work page 2016
[20]

Chambolle and T

A. Chambolle and T. Pock. On the ergodic convergence rates of a first-order primal--dual algorithm. Math. Program., 159 0 (1--2): 0 253--287, Sept. 2016 b

work page 2016
[21]

Chambolle, M

A. Chambolle, M. J. Ehrhardt, P. Richt\'arik , and C.-B. Sch\"onlieb . Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM J. Optim., 28 0 (4): 0 2783--2808, 2018

work page 2018
[22]

P. Chen, J. Huang, and X. Zhang. A primal--dual fixed point algorithm for convex separable minimization with applications to image restoration. Inverse Problems, 29 0 (2), 2013

work page 2013
[23]

Y. Chen, G. Lan, and Y. Ouyang. Optimal primal-dual methods for a class of saddle point problems. SIAM J. Optim., 24: 0 1779--1814, 2014

work page 2014
[24]

P. L. Combettes. The geometry of monotone operator splitting methods. Acta Numerica, 33: 0 487--632, 2024

work page 2024
[25]

P. L. Combettes and J. I. Madariaga. Almost-surely convergent randomly activated monotone operator splitting methods. SIAM J. Imaging Sci., 18 0 (4): 0 2177--2205, 2025

work page 2025
[26]

P. L. Combettes and J.-C. Pesquet. Proximal splitting methods in signal processing. In H. H. Bauschke, R. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, editors, Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer-Verlag, New York, 2010

work page 2010
[27]

P. L. Combettes and J.-C. Pesquet. Fixed point strategies in data science. IEEE Trans. Signal Process. , 69: 0 3878--3905, 2021

work page 2021
[28]

P. L. Combettes, L. Condat, J.-C. Pesquet, and B. C. V\ u . A forward--backward view of some primal--dual optimization methods in image recovery. In Proc.\ of IEEE ICIP , Paris, France, Oct. 2014

work page 2014
[29]

L. Condat. A primal-dual splitting method for convex optimization involving L ipschitzian, proximable and linear composite terms. J. Optim. Theory Appl., 158 0 (2): 0 460--479, 2013

work page 2013
[30]

Condat and P

L. Condat and P. Richt \'a rik. RandProx : P rimal-dual optimization algorithms with randomized proximal updates. In Proc.\ of Int. Conf. Learning Representations (ICLR) , 2023

work page 2023
[31]

Condat, G

L. Condat, G. Malinovsky, and P. Richt \'a rik. Distributed proximal splitting algorithms with rates and acceleration. Frontiers in Signal Processing, 1, Jan. 2022

work page 2022
[32]

Condat, D

L. Condat, D. Kitahara, A. Contreras, and A. Hirabayashi. Proximal splitting algorithms for convex optimization: A tour of recent advances, with new twists. SIAM Review, 65 0 (2): 0 375--435, 2023

work page 2023
[33]

Condat, E

L. Condat, E. Gasanov, and P. Richt\'arik . The stochastic multi-proximal method for nonsmooth optimization. preprint arXiv:2505.12409, 2025

work page arXiv 2025
[34]

Condat, I

L. Condat, I. Agarsk \'y , and P. Richt \'a rik. CompressedScaffnew : The first theoretical double acceleration of communication from local training and compression in distributed optimization. Optimization, 2026

work page 2026
[35]

Davis and W

D. Davis and W. Yin. A three-operator splitting scheme and its optimization applications. Set-Val. Var. Anal., 25: 0 829--858, 2017

work page 2017
[36]

Dirren, M

C. Dirren, M. Bianchi, P. D. Grontas, J. Lygeros, and F. D\"orfler . Contractivity and linear convergence in bilinear saddle-point problems: An operator-theoretic approach. In Proc.\ of Int.\ Conf.\ Artificial Intelligence and Statistics (AISTATS) , volume PMLR 258, 2025

work page 2025
[37]

Driggs, M

D. Driggs, M. J. Ehrhardt, C.-B. Sch\"onlieb , and J. Tang. Practical acceleration of the Condat--V\ u algorithm. SIAM J. Imaging Sci., 17 0 (4): 0 2076--2109, 2024

work page 2076
[38]

Drori, S

Y. Drori, S. Sabach, and M. Teboulle. A simple algorithm for a class of nonsmooth convex concave saddle-point problems. Oper. Res. Lett., 43 0 (2): 0 209--214, 2015

work page 2015
[39]

M. J. Ehrhardt, Z. Kereta, J. Liang, and J. Tang. A guide to stochastic optimisation for large-scale inverse problems. Inverse Problems, 41 0 (5): 0 053001, 2025

work page 2025
[40]

Glowinski, S

R. Glowinski, S. J. Osher, and W. Yin, editors. Splitting Methods in Communication, Imaging, Science, and Engineering. Springer International Publishing, 2016

work page 2016
[41]

Point convergence of nesterov’s accelerated gradient method: An ai-assisted proof.arXiv preprint arXiv:2510.23513, 2025

U. Jang and E. K. Ryu. Point convergence of Nesterov’s accelerated gradient method: an AI -assisted proof. preprint arXiv:2510.23513, Oct. 2025

work page arXiv 2025
[42]

U. Jang, S. D. Gupta, and E. K. Ryu. Computer-assisted design of accelerated composite optimization methods: OptISTA . Math. Program., 2025

work page 2025
[43]

K. Knopp. Theory and Application of Infinite Series. Blackie & Son, 2nd edition, 1951

work page 1951
[44]

Kovalev, A

D. Kovalev, A. Salim, and P. Richt\'arik . Optimal and practical algorithms for smooth and strongly convex decentralized optimization. In Proc.\ of Conf. on Neural Information Processing Systems (NeurIPS) , 2020

work page 2020
[45]

J. Lee, S. Yi, and E. K. Ryu. Convergence analyses of Davis--Yin splitting via scaled relative graphs. SIAM J. Optim., 35 0 (1): 0 270--301, 2025

work page 2025
[46]

H. Li, Z. Lin, and Y. Fang. Variance reduced EXTRA and DIGing and their optimal acceleration for strongly convex decentralized optimization. Journal of Machine Learning Research, 23: 0 1--41, 2022

work page 2022
[47]

Loris and C

I. Loris and C. Verhoeven. On a generalization of the iterative soft-thresholding algorithm for the case of non-separable penalty. Inverse Problems, 27 0 (12), 2011

work page 2011
[48]

Mishchenko, G

K. Mishchenko, G. Malinovsky, S. Stich, and P. Richt\'arik . ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! In Proc.\ of the 39th International Conference on Machine Learning (ICML) , July 2022

work page 2022
[49]

Nesterov

Y. Nesterov. Introductory lectures on convex optimization: a basic course. Kluwer Academic Publishers, 2004

work page 2004
[50]

Nesterov

Y. Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140 0 (1): 0 125--161, 2013

work page 2013
[51]

O'Connor and L

D. O'Connor and L. Vandenberghe. On the equivalence of the primal-dual hybrid gradient method and Douglas--Rachford splitting. Math. Program., 79: 0 85--108, 2020

work page 2020
[52]

D. P. Palomar and Y. C. Eldar, editors. Convex Optimization in Signal Processing and Communications. Cambridge University Press, 2009

work page 2009
[53]

Papoutsellis, Z

E. Papoutsellis, Z. Kereta, and K. Papafitsoros. Why do we regularise in every iteration for imaging inverse problems? In Proc.\ of International Conference on Scale Space and Variational Methods in Computer Vision (SSVM) , pages 43--55. Springer Nature Switzerland, 2025

work page 2025
[54]

Parikh and S

N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization, 3 0 (1): 0 127--239, 2014

work page 2014
[55]

N. G. Polson, J. G. Scott, and B. T. Willard. Proximal algorithms in statistics and machine learning. Statist. Sci., 30 0 (4): 0 559--581, 2015

work page 2015
[56]

B. T. Polyak. Gradient methods for the minimisation of functionals. Ussr Computational Mathematics and Mathematical Physics, 3: 0 864--878, 1963

work page 1963
[57]

B. T. Polyak. Introduction to Optimization. Optimization Software, Inc., Publications Division, New York, NY, USA, 1987. Revised version of the 1987 original dated Nov.\ 2010 available on ResearchGate

work page 1987
[58]

Salim, L

A. Salim, L. Condat, D. Kovalev, and P. Richt\'arik . An optimal algorithm for strongly convex minimization under affine constraints. In Proc.\ of Int.\ Conf.\ Artificial Intelligence and Statistics (AISTATS) , volume PMLR 151, pages 4482--4498, 2022 a

work page 2022
[59]

Salim, L

A. Salim, L. Condat, K. Mishchenko, and P. Richt\'arik . Dualize, split, randomize: Toward fast nonsmooth optimization algorithms. J. Optim. Theory Appl., July 2022 b

work page 2022
[60]

S. Sra, S. Nowozin, and S. J. Wright. Optimization for Machine Learning. The MIT Press, 2011

work page 2011
[61]

Stathopoulos, H

G. Stathopoulos, H. Shukla, A. Szucs, Y. Pu, and C. N. Jones. Operator splitting methods in control. Foundations and Trends in Systems and Control, 3 0 (3): 0 249--362, 2016

work page 2016
[62]

A. B. Taylor, J. M. Hendrickx, and F. Glineur. Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Program., 161 0 (1--2): 0 307--345, 2017

work page 2017
[63]

A. B. Taylor, J. M. Hendrickx, and F. Glineur. Exact worst-case convergence rates of the proximal gradient method for composite convex minimization. J. Optim. Theory Appl., 178: 0 455--476, 2018

work page 2018
[64]

P. Tseng. On accelerated proximal gradient methods for convex--concave optimization. preprint, 2008

work page 2008
[65]

Ushiyama

K. Ushiyama. A 2 -accelerated FISTA for composite strongly convex. preprint arXiv:2509.09295v2, Sept. 2025

work page arXiv 2025
[66]

B. C. V \ u . A splitting algorithm for dual monotone inclusions involving cocoercive operators. Adv. Comput. Math., 38 0 (3): 0 667--681, Apr. 2013

work page 2013
[67]

Y. Wu, Y. Zhang, L. Liu, and Y. Ouyang. A note on the gradient-evaluation sequence in accelerated gradient methods. preprint arXiv:2603.06937, 2026

work page arXiv 2026
[68]

J. Xu, Y. Tian, Y. Sun, and G. Scutari. Accelerated primal–dual algorithm for distributed smooth convex optimization over networks. In Proc.\ of International Conference on Artificial Intelligence and Statistics (AISTATS) , volume PMLR 108, 2020

work page 2020
[69]

M. Yan. A new primal-dual algorithm for minimizing the sum of three functions with a linear operator. J. Sci. Comput., 76 0 (3): 0 1698--1717, Sept. 2018

work page 2018
[70]

R. Zhao. Accelerated stochastic algorithms for convex-concave saddle-point problems. Math. Oper. Res., 47: 0 1443--1473, 2022

work page 2022
[71]

R. Zhao, W. B. Haskell, and V. Y. F. Tan. An optimal algorithm for stochastic three-composite optimization. In Proc.\ of International Conference on Artificial Intelligence and Statistics (AISTATS) , volume PMLR 89, 2019

work page 2019
[72]

Y.-N. Zhu. Accelerated primal dual fixed point algorithm. preprint arXiv:2511.00385, 2025

work page arXiv 2025

[1] [1]

S. A. Alghunaim, E. K. Ryu, K. Yuan, and A. H. Sayed. Decentralized proximal gradient algorithms with linear convergence rates. IEEE Trans. Automatic Control, 66 0 (6): 0 2787--2794, June 2021

work page 2021

[2] [2]

Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent

Z. Allen-Zhu and L. Orecchia. Linear coupling: An ultimate unification of gradient and mirror descent. preprint arXiv:1407.1537, 2014

work page Pith review arXiv 2014

[3] [3]

Attouch and J

H. Attouch and J. Peypouquet. The rate of convergence of N esterov's accelerated forward-backward method is actually faster than 1/k^2 . SIAM J. Optim., 26 0 (3): 0 1824--1834, 2016

work page 2016

[4] [4]

Aujol, C

J.-F. Aujol, C. Dossal, H. Labarri\`ere, and A. Rondepierre. FISTA restart using an automatic estimation of the growth parameter. Journal of Optimization Theory and Applications, 206: 0 51, 2025

work page 2025

[5] [5]

Auslender and M

A. Auslender and M. Teboulle. Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim., 16 0 (3): 0 697--725, 2006

work page 2006

[6] [6]

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Optimization with sparsity-inducing penalties. Found. Trends Mach. Learn., 4 0 (1): 0 1--106, 2012

work page 2012

[7] [7]

H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York, 2nd edition, 2017

work page 2017

[8] [8]

H. H. Bauschke and W. M. Moursi. Understanding FISTA 's weak convergence: A step-by-step introduction to the 2025 milestone. preprint arXiv:2601.15398, 2026

work page arXiv 2025

[9] [9]

A. Beck. First-Order Methods in Optimization. MOS-SIAM Series on Optimization. SIAM, 2017

work page 2017

[10] [10]

Beck and M

A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci., 2 0 (1): 0 183--202, 2009

work page 2009

[11] [11]

R. I. Bo t , E. R. Csetnek, and C. Hendrich. Recent developments on primal--dual splitting methods with applications to convex minimization. In P. M. Pardalos and T. M. Rassias, editors, Mathematics Without Boundaries: Surveys in Interdisciplinary Research, pages 57--99. Springer New York, 2014

work page 2014

[12] [12]

R. I. Bo t , E. Chenchene, E. R. Csetnek, and D. A. Hulett. Accelerating diagonal methods for bilevel optimization: Unified convergence via continuous-time dynamics. preprint arXiv:2505.14389, 2025 a

work page arXiv 2025

[13] [13]

R. I. Bo t , J. Fadili, and D.-K. Nguyen. The iterates of Nesterov’s accelerated algorithm converge in the critical regimes. preprint arXiv:2510.22715, Oct. 2025 b

work page arXiv 2025

[14] [14]

Bottou, F

L. Bottou, F. E. Curtis, and J. Nocedal. Optimization methods for large-scale machine learning. SIAM review, 60 0 (2): 0 223--311, 2018

work page 2018

[15] [15]

S. Bubeck. Convex optimization: A lgorithms and complexity. Found. Trends Mach. Learn., 8 0 (3--4): 0 231--357, 2015

work page 2015

[16] [16]

Cevher, S

V. Cevher, S. Becker, and M. Schmidt. Convex optimization for big data: S calable, randomized, and parallel algorithms for big data analytics. IEEE Signal Process. Mag. , 31 0 (5): 0 32--43, 2014

work page 2014

[17] [17]

Fast Iterative Shrinkage/Thresholding Algorithm

A. Chambolle and C. Dossal. On the convergence of the iterates of the "Fast Iterative Shrinkage/Thresholding Algorithm" . J. Optim. Theory Appl., 166: 0 968--982, 2015

work page 2015

[18] [18]

Chambolle and T

A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision, 40 0 (1): 0 120--145, May 2011

work page 2011

[19] [19]

Chambolle and T

A. Chambolle and T. Pock. An introduction to continuous optimization for imaging. Acta Numerica, 25: 0 161--319, 2016 a

work page 2016

[20] [20]

Chambolle and T

A. Chambolle and T. Pock. On the ergodic convergence rates of a first-order primal--dual algorithm. Math. Program., 159 0 (1--2): 0 253--287, Sept. 2016 b

work page 2016

[21] [21]

Chambolle, M

A. Chambolle, M. J. Ehrhardt, P. Richt\'arik , and C.-B. Sch\"onlieb . Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM J. Optim., 28 0 (4): 0 2783--2808, 2018

work page 2018

[22] [22]

P. Chen, J. Huang, and X. Zhang. A primal--dual fixed point algorithm for convex separable minimization with applications to image restoration. Inverse Problems, 29 0 (2), 2013

work page 2013

[23] [23]

Y. Chen, G. Lan, and Y. Ouyang. Optimal primal-dual methods for a class of saddle point problems. SIAM J. Optim., 24: 0 1779--1814, 2014

work page 2014

[24] [24]

P. L. Combettes. The geometry of monotone operator splitting methods. Acta Numerica, 33: 0 487--632, 2024

work page 2024

[25] [25]

P. L. Combettes and J. I. Madariaga. Almost-surely convergent randomly activated monotone operator splitting methods. SIAM J. Imaging Sci., 18 0 (4): 0 2177--2205, 2025

work page 2025

[26] [26]

P. L. Combettes and J.-C. Pesquet. Proximal splitting methods in signal processing. In H. H. Bauschke, R. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, editors, Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer-Verlag, New York, 2010

work page 2010

[27] [27]

P. L. Combettes and J.-C. Pesquet. Fixed point strategies in data science. IEEE Trans. Signal Process. , 69: 0 3878--3905, 2021

work page 2021

[28] [28]

P. L. Combettes, L. Condat, J.-C. Pesquet, and B. C. V\ u . A forward--backward view of some primal--dual optimization methods in image recovery. In Proc.\ of IEEE ICIP , Paris, France, Oct. 2014

work page 2014

[29] [29]

L. Condat. A primal-dual splitting method for convex optimization involving L ipschitzian, proximable and linear composite terms. J. Optim. Theory Appl., 158 0 (2): 0 460--479, 2013

work page 2013

[30] [30]

Condat and P

L. Condat and P. Richt \'a rik. RandProx : P rimal-dual optimization algorithms with randomized proximal updates. In Proc.\ of Int. Conf. Learning Representations (ICLR) , 2023

work page 2023

[31] [31]

Condat, G

L. Condat, G. Malinovsky, and P. Richt \'a rik. Distributed proximal splitting algorithms with rates and acceleration. Frontiers in Signal Processing, 1, Jan. 2022

work page 2022

[32] [32]

Condat, D

L. Condat, D. Kitahara, A. Contreras, and A. Hirabayashi. Proximal splitting algorithms for convex optimization: A tour of recent advances, with new twists. SIAM Review, 65 0 (2): 0 375--435, 2023

work page 2023

[33] [33]

Condat, E

L. Condat, E. Gasanov, and P. Richt\'arik . The stochastic multi-proximal method for nonsmooth optimization. preprint arXiv:2505.12409, 2025

work page arXiv 2025

[34] [34]

Condat, I

L. Condat, I. Agarsk \'y , and P. Richt \'a rik. CompressedScaffnew : The first theoretical double acceleration of communication from local training and compression in distributed optimization. Optimization, 2026

work page 2026

[35] [35]

Davis and W

D. Davis and W. Yin. A three-operator splitting scheme and its optimization applications. Set-Val. Var. Anal., 25: 0 829--858, 2017

work page 2017

[36] [36]

Dirren, M

C. Dirren, M. Bianchi, P. D. Grontas, J. Lygeros, and F. D\"orfler . Contractivity and linear convergence in bilinear saddle-point problems: An operator-theoretic approach. In Proc.\ of Int.\ Conf.\ Artificial Intelligence and Statistics (AISTATS) , volume PMLR 258, 2025

work page 2025

[37] [37]

Driggs, M

D. Driggs, M. J. Ehrhardt, C.-B. Sch\"onlieb , and J. Tang. Practical acceleration of the Condat--V\ u algorithm. SIAM J. Imaging Sci., 17 0 (4): 0 2076--2109, 2024

work page 2076

[38] [38]

Drori, S

Y. Drori, S. Sabach, and M. Teboulle. A simple algorithm for a class of nonsmooth convex concave saddle-point problems. Oper. Res. Lett., 43 0 (2): 0 209--214, 2015

work page 2015

[39] [39]

M. J. Ehrhardt, Z. Kereta, J. Liang, and J. Tang. A guide to stochastic optimisation for large-scale inverse problems. Inverse Problems, 41 0 (5): 0 053001, 2025

work page 2025

[40] [40]

Glowinski, S

R. Glowinski, S. J. Osher, and W. Yin, editors. Splitting Methods in Communication, Imaging, Science, and Engineering. Springer International Publishing, 2016

work page 2016

[41] [41]

Point convergence of nesterov’s accelerated gradient method: An ai-assisted proof.arXiv preprint arXiv:2510.23513, 2025

U. Jang and E. K. Ryu. Point convergence of Nesterov’s accelerated gradient method: an AI -assisted proof. preprint arXiv:2510.23513, Oct. 2025

work page arXiv 2025

[42] [42]

U. Jang, S. D. Gupta, and E. K. Ryu. Computer-assisted design of accelerated composite optimization methods: OptISTA . Math. Program., 2025

work page 2025

[43] [43]

K. Knopp. Theory and Application of Infinite Series. Blackie & Son, 2nd edition, 1951

work page 1951

[44] [44]

Kovalev, A

D. Kovalev, A. Salim, and P. Richt\'arik . Optimal and practical algorithms for smooth and strongly convex decentralized optimization. In Proc.\ of Conf. on Neural Information Processing Systems (NeurIPS) , 2020

work page 2020

[45] [45]

J. Lee, S. Yi, and E. K. Ryu. Convergence analyses of Davis--Yin splitting via scaled relative graphs. SIAM J. Optim., 35 0 (1): 0 270--301, 2025

work page 2025

[46] [46]

H. Li, Z. Lin, and Y. Fang. Variance reduced EXTRA and DIGing and their optimal acceleration for strongly convex decentralized optimization. Journal of Machine Learning Research, 23: 0 1--41, 2022

work page 2022

[47] [47]

Loris and C

I. Loris and C. Verhoeven. On a generalization of the iterative soft-thresholding algorithm for the case of non-separable penalty. Inverse Problems, 27 0 (12), 2011

work page 2011

[48] [48]

Mishchenko, G

K. Mishchenko, G. Malinovsky, S. Stich, and P. Richt\'arik . ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! In Proc.\ of the 39th International Conference on Machine Learning (ICML) , July 2022

work page 2022

[49] [49]

Nesterov

Y. Nesterov. Introductory lectures on convex optimization: a basic course. Kluwer Academic Publishers, 2004

work page 2004

[50] [50]

Nesterov

Y. Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140 0 (1): 0 125--161, 2013

work page 2013

[51] [51]

O'Connor and L

D. O'Connor and L. Vandenberghe. On the equivalence of the primal-dual hybrid gradient method and Douglas--Rachford splitting. Math. Program., 79: 0 85--108, 2020

work page 2020

[52] [52]

D. P. Palomar and Y. C. Eldar, editors. Convex Optimization in Signal Processing and Communications. Cambridge University Press, 2009

work page 2009

[53] [53]

Papoutsellis, Z

E. Papoutsellis, Z. Kereta, and K. Papafitsoros. Why do we regularise in every iteration for imaging inverse problems? In Proc.\ of International Conference on Scale Space and Variational Methods in Computer Vision (SSVM) , pages 43--55. Springer Nature Switzerland, 2025

work page 2025

[54] [54]

Parikh and S

N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization, 3 0 (1): 0 127--239, 2014

work page 2014

[55] [55]

N. G. Polson, J. G. Scott, and B. T. Willard. Proximal algorithms in statistics and machine learning. Statist. Sci., 30 0 (4): 0 559--581, 2015

work page 2015

[56] [56]

B. T. Polyak. Gradient methods for the minimisation of functionals. Ussr Computational Mathematics and Mathematical Physics, 3: 0 864--878, 1963

work page 1963

[57] [57]

B. T. Polyak. Introduction to Optimization. Optimization Software, Inc., Publications Division, New York, NY, USA, 1987. Revised version of the 1987 original dated Nov.\ 2010 available on ResearchGate

work page 1987

[58] [58]

Salim, L

A. Salim, L. Condat, D. Kovalev, and P. Richt\'arik . An optimal algorithm for strongly convex minimization under affine constraints. In Proc.\ of Int.\ Conf.\ Artificial Intelligence and Statistics (AISTATS) , volume PMLR 151, pages 4482--4498, 2022 a

work page 2022

[59] [59]

Salim, L

A. Salim, L. Condat, K. Mishchenko, and P. Richt\'arik . Dualize, split, randomize: Toward fast nonsmooth optimization algorithms. J. Optim. Theory Appl., July 2022 b

work page 2022

[60] [60]

S. Sra, S. Nowozin, and S. J. Wright. Optimization for Machine Learning. The MIT Press, 2011

work page 2011

[61] [61]

Stathopoulos, H

G. Stathopoulos, H. Shukla, A. Szucs, Y. Pu, and C. N. Jones. Operator splitting methods in control. Foundations and Trends in Systems and Control, 3 0 (3): 0 249--362, 2016

work page 2016

[62] [62]

A. B. Taylor, J. M. Hendrickx, and F. Glineur. Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Program., 161 0 (1--2): 0 307--345, 2017

work page 2017

[63] [63]

A. B. Taylor, J. M. Hendrickx, and F. Glineur. Exact worst-case convergence rates of the proximal gradient method for composite convex minimization. J. Optim. Theory Appl., 178: 0 455--476, 2018

work page 2018

[64] [64]

P. Tseng. On accelerated proximal gradient methods for convex--concave optimization. preprint, 2008

work page 2008

[65] [65]

Ushiyama

K. Ushiyama. A 2 -accelerated FISTA for composite strongly convex. preprint arXiv:2509.09295v2, Sept. 2025

work page arXiv 2025

[66] [66]

B. C. V \ u . A splitting algorithm for dual monotone inclusions involving cocoercive operators. Adv. Comput. Math., 38 0 (3): 0 667--681, Apr. 2013

work page 2013

[67] [67]

Y. Wu, Y. Zhang, L. Liu, and Y. Ouyang. A note on the gradient-evaluation sequence in accelerated gradient methods. preprint arXiv:2603.06937, 2026

work page arXiv 2026

[68] [68]

J. Xu, Y. Tian, Y. Sun, and G. Scutari. Accelerated primal–dual algorithm for distributed smooth convex optimization over networks. In Proc.\ of International Conference on Artificial Intelligence and Statistics (AISTATS) , volume PMLR 108, 2020

work page 2020

[69] [69]

M. Yan. A new primal-dual algorithm for minimizing the sum of three functions with a linear operator. J. Sci. Comput., 76 0 (3): 0 1698--1717, Sept. 2018

work page 2018

[70] [70]

R. Zhao. Accelerated stochastic algorithms for convex-concave saddle-point problems. Math. Oper. Res., 47: 0 1443--1473, 2022

work page 2022

[71] [71]

R. Zhao, W. B. Haskell, and V. Y. F. Tan. An optimal algorithm for stochastic three-composite optimization. In Proc.\ of International Conference on Artificial Intelligence and Statistics (AISTATS) , volume PMLR 89, 2019

work page 2019

[72] [72]

Y.-N. Zhu. Accelerated primal dual fixed point algorithm. preprint arXiv:2511.00385, 2025

work page arXiv 2025