Accuracy Certificates for Convex Optimization at Accelerated Rates via Primal-Dual Averaging
Pith reviewed 2026-05-10 04:00 UTC · model grok-4.3
The pith
A three-average primal-dual averaging method yields accelerated rates for computable optimality certificates in convex optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Solving a regularized surrogate with primal-dual averaging supplies non-asymptotic bounds on a computable optimality certificate. One-average schemes (modified dual averaging and generalized conditional gradient) achieve Õ(ε^{-1}) rates. A self-dual two-average method preserves symmetry at the cost of certificate guarantees. The three-average method recovers accelerated Õ(ε^{-1/2}) certificate convergence; its primal form mirrors dual gradient extrapolation. The same averaging framework connects to equilibrium dynamics in zero-sum games and Fisher markets.
What carries the argument
Three-average primal-dual averaging applied to a regularized surrogate, which restores both symmetry and accelerated certificate rates while corresponding to known dual extrapolation schemes.
If this is right
- One-average primal-dual averaging achieves Õ(ε^{-1}) certificate complexity.
- Two-average methods remain symmetric but lose certificate convergence guarantees.
- The three-average method achieves Õ(ε^{-1/2}) certificate complexity.
- The primal three-average scheme corresponds exactly to gradient extrapolation on the dual side.
- The averaging constructions apply directly to equilibrium computation in zero-sum matrix games and Fisher markets.
Where Pith is reading between the lines
- The number of averages can be viewed as a tunable parameter that trades off symmetry, acceleration, and certificate quality across other convex problems.
- These certificates supply practical, computable stopping rules for large-scale solvers where the true duality gap cannot be evaluated exactly.
- The explicit game-theoretic and market interpretations open the possibility of importing averaging techniques into equilibrium-finding algorithms.
Load-bearing premise
The underlying problem is convex and the chosen regularization allows the averaging procedures to solve the surrogate to sufficient accuracy.
What would settle it
A concrete convex optimization instance on which the three-average method produces a certificate whose error fails to shrink at the claimed accelerated rate.
Figures
read the original abstract
Many works in convex optimization provide rates for achieving a small primal gap. However, this quantity is typically unavailable in practice. In this work, we show that solving a regularized surrogate with algorithms based on simple primal-dual averaging provides non-asymptotic convergence guarantees for a \textit{computable} optimality certificate. We first analyze primal and dual methods based on one average, namely modified dual averaging and generalized conditional gradient, and establish $\tilde{O}(\varepsilon^{-1})$ certificate complexities. Motivated by asymmetries in the one-average case, we analyze a self-dual, two-average method that preserves symmetry while losing certificate guarantees. To recover certificate convergence, we propose a three-average method that achieves an accelerated $\tilde{O}(\varepsilon^{-1/2})$ certificate complexity. Furthermore, we prove primal-dual algorithm correspondences for the one, two, and three-average cases. In particular, the primal three-average accelerated method mirrors the well-known gradient extrapolation method in the dual. By interpreting our results through the lens of zero-sum matrix games and Fisher markets, we further connect primal-dual averaging methods to game theory and market dynamics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops primal-dual averaging schemes for convex optimization that produce computable optimality certificates rather than relying on the unavailable primal gap. One-average methods (modified dual averaging and generalized conditional gradient) are shown to achieve Õ(ε^{-1}) certificate complexity. A symmetric two-average method is analyzed but loses certificate guarantees. A three-average method is proposed that recovers an accelerated Õ(ε^{-1/2}) certificate complexity. Primal-dual algorithm correspondences are established for all three cases, and the results are interpreted in the settings of zero-sum matrix games and Fisher markets.
Significance. If the claimed accelerated certificate rate survives explicit control of the regularization bias, the work would provide a useful bridge between simple averaging algorithms and verifiable accuracy guarantees at accelerated rates. The primal-dual correspondences and game-theoretic links are additional strengths that could aid interpretation in related fields.
major comments (2)
- [Abstract] Abstract: the stated Õ(ε^{-1/2}) certificate complexity for the three-average method does not exhibit the regularization schedule λ(ε). Controlling the bias term (typically O(λ R²)) to be at most ε/2 forces λ = Θ(ε). Substituting this choice into the convergence analysis of the averaging scheme on the λ-regularized surrogate (which inherits a 1/λ factor from strong convexity or smoothness) produces an overall iteration count of Õ(ε^{-1}) rather than Õ(ε^{-1/2}), unless the paper derives an explicit cancellation that preserves acceleration.
- [Abstract] The non-asymptotic certificate bounds for the one-average and three-average cases must be re-derived with the explicit λ(ε) dependence included; without this, it is unclear whether the claimed rates are for the surrogate only or for the original unregularized problem.
minor comments (1)
- [Abstract] The abstract uses both 'tilde-O' and 'Õ' notation interchangeably; standardize the notation throughout.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. The comments correctly identify that the abstract and analysis must make the dependence on the regularization parameter λ(ε) fully explicit when transferring guarantees from the surrogate to the original problem. We address both points below and will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [Abstract] Abstract: the stated Õ(ε^{-1/2}) certificate complexity for the three-average method does not exhibit the regularization schedule λ(ε). Controlling the bias term (typically O(λ R²)) to be at most ε/2 forces λ = Θ(ε). Substituting this choice into the convergence analysis of the averaging scheme on the λ-regularized surrogate (which inherits a 1/λ factor from strong convexity or smoothness) produces an overall iteration count of Õ(ε^{-1}) rather than Õ(ε^{-1/2}), unless the paper derives an explicit cancellation that preserves acceleration.
Authors: We agree that the abstract does not display the λ(ε) schedule and that a direct substitution λ = Θ(ε) introduces a 1/λ factor that offsets the acceleration, yielding an overall Õ(ε^{-1}) certificate complexity for the original problem. The manuscript derives accelerated rates only for the λ-regularized surrogate; no cancellation that restores Õ(ε^{-1/2}) for the unregularized objective is shown. We will revise the abstract, introduction, and theorem statements to state the surrogate rates explicitly, insert the choice λ(ε) = Θ(ε) needed to control bias, and report the resulting Õ(ε^{-1}) complexity for the original problem. This makes the claims precise without overstating the acceleration. revision: yes
-
Referee: [Abstract] The non-asymptotic certificate bounds for the one-average and three-average cases must be re-derived with the explicit λ(ε) dependence included; without this, it is unclear whether the claimed rates are for the surrogate only or for the original unregularized problem.
Authors: We accept the request. The current non-asymptotic bounds are written for a generic fixed λ and therefore apply directly to the surrogate. We will re-derive and restate the bounds for both the one-average and three-average methods with the explicit substitution λ = Θ(ε) (or any other schedule the referee prefers), clearly separating the surrogate certificate complexity from the complexity required to reach an ε-certificate for the original objective. The one-average case remains Õ(ε^{-1}); the three-average case becomes Õ(ε^{-1}) once bias control is included. These revisions will be placed in the main theorems and the abstract. revision: yes
Circularity Check
No circularity; derivations follow from explicit analysis of primal-dual averaging on regularized surrogates
full rationale
The paper's central results consist of convergence analyses for one-average, two-average, and three-average primal-dual methods applied to a λ-regularized surrogate problem, followed by direct proofs of primal-dual algorithm correspondences. These steps rely on standard Lyapunov-style arguments and averaging identities that are derived from the algorithm recursions themselves rather than from any fitted parameter, self-definition, or prior self-citation that encodes the target certificate rate. The choice of regularization parameter λ appears as an explicit design choice whose effect on bias and strong-convexity modulus is accounted for in the final complexity statement; it does not create a definitional loop. No load-bearing uniqueness theorem or ansatz is imported from the authors' own prior work. The claimed Õ(ε^{-1/2}) certificate complexity is therefore obtained by ordinary non-asymptotic analysis and remains independent of the result it asserts.
Axiom & Free-Parameter Ledger
free parameters (1)
- regularization parameter
axioms (2)
- domain assumption The optimization problem is convex
- domain assumption Primal-dual averaging schemes produce the claimed convergence on the surrogate
Reference graph
Works this paper leans on
-
[1]
Online learning and online convex optimization , author=. Foundations and Trends. 2012 , publisher=
work page 2012
-
[2]
Statistical inference via convex optimization , author=. 2020 , publisher=
work page 2020
-
[3]
Foundations of Computational Mathematics , volume=
Exact matrix completion via convex programming , author=. Foundations of Computational Mathematics , volume=
-
[4]
IEEE Transactions on Information Theory , volume=
Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information , author=. IEEE Transactions on Information Theory , volume=. 2006 , publisher=
work page 2006
-
[5]
Journal of Mathematical Imaging and Vision , volume=
A first-order primal-dual algorithm for convex problems with applications to imaging , author=. Journal of Mathematical Imaging and Vision , volume=. 2011 , publisher=
work page 2011
- [6]
- [7]
-
[8]
Mathematical Programming , volume=
Unifying mirror descent and dual averaging , author=. Mathematical Programming , volume=. 2023 , publisher=
work page 2023
-
[9]
Journal of Optimization Theory and Applications , volume=
Quasi-monotone subgradient methods for nonsmooth convex minimization , author=. Journal of Optimization Theory and Applications , volume=. 2015 , publisher=
work page 2015
-
[10]
Mathematical programming , volume=
A single cut proximal bundle method for stochastic convex composite optimization , author=. Mathematical programming , volume=. 2024 , publisher=
work page 2024
-
[11]
arXiv preprint arXiv:2407.10073 , year=
Universal subgradient and proximal bundle methods for convex and strongly convex hybrid composite optimization , author=. arXiv preprint arXiv:2407.10073 , year=
-
[12]
A subgradient method for finding a saddle point of a convex-concave function , author=. Issled. Prikl. Mat , volume=
- [13]
-
[14]
Generalized gradient method for finding saddlepoints , author=. Matekon , volume=. 1974 , publisher=
work page 1974
-
[15]
Studies in linear and nonlinear programming , volume=
Iterative methods for concave programming , author=. Studies in linear and nonlinear programming , volume=
-
[16]
SIAM Journal on Optimization , volume=
A proximal bundle variant with optimal iteration-complexity for a large range of prox stepsizes , author=. SIAM Journal on Optimization , volume=. 2021 , publisher=
work page 2021
-
[17]
Mathematics of Operations Research , volume=
A unified analysis of a class of proximal bundle methods for solving hybrid convex composite optimization problems , author=. Mathematics of Operations Research , volume=. 2024 , publisher=
work page 2024
-
[18]
SIAM Journal on Optimization , volume=
Optimal convergence rates for the proximal bundle method , author=. SIAM Journal on Optimization , volume=. 2023 , publisher=
work page 2023
-
[19]
SIAM Journal on Optimization , volume=
A nonmonotone proximal bundle method with (potentially) continuous step decisions , author=. SIAM Journal on Optimization , volume=. 2013 , publisher=
work page 2013
-
[20]
Convex analysis and minimization algorithms II , author=. 1993 , publisher=
work page 1993
-
[21]
Computers & Operations Research , volume=
Probabilistic optimization via approximate p-efficient points and bundle methods , author=. Computers & Operations Research , volume=. 2017 , publisher=
work page 2017
- [22]
-
[23]
Mathematical Programming , volume=
Convex proximal bundle methods in depth: a unified analysis for inexact oracles , author=. Mathematical Programming , volume=. 2014 , publisher=
work page 2014
-
[24]
Journal of Optimization Theory and Applications , volume=
Efficiency of proximal bundle methods , author=. Journal of Optimization Theory and Applications , volume=. 2000 , publisher=
work page 2000
-
[25]
SIAM Journal on Optimization , volume=
Generalized bundle methods , author=. SIAM Journal on Optimization , volume=. 2002 , publisher=
work page 2002
-
[26]
Journal of Optimization Theory and Applications , volume=
Rate of convergence of the bundle method , author=. Journal of Optimization Theory and Applications , volume=. 2017 , publisher=
work page 2017
-
[27]
Mifflin, R. , booktitle=. A modification and an extension of. 1982 , publisher=
work page 1982
-
[28]
Nonsmooth optimization and descent methods , author=. 1978 , publisher=
work page 1978
-
[29]
Nondifferentiable optimization , pages=
A method of conjugate subgradients for minimizing nondifferentiable functions , author=. Nondifferentiable optimization , pages=. 1975 , publisher=
work page 1975
-
[30]
Nondifferentiable optimization , pages=
An extension of Davidon methods to non differentiable problems , author=. Nondifferentiable optimization , pages=. 1975 , publisher=
work page 1975
- [31]
-
[32]
Automation and Remote Control , volume=
Algorithms of robust stochastic optimization based on mirror descent method , author=. Automation and Remote Control , volume=. 2019 , publisher=
work page 2019
-
[33]
Aujol, J.-F. and Calatroni, L. and Dossal, C. and Labarri. Parameter-free. Available at arXiv:2307.14323 , year=
-
[34]
Mathematical Programming , year=
A single cut proximal bundle method for stochastic convex composite optimization , author=. Mathematical Programming , year=
-
[35]
Optimal and parameter-free gradient minimization methods for smooth optimization , author=. Available at arXiv:2310.12139 , year=
-
[36]
Mathematical programming , volume=
Gradient methods for minimizing composite functions , author=. Mathematical programming , volume=. 2013 , publisher=
work page 2013
-
[37]
Journal of Machine Learning Research , volume=
Loss minimization and parameter estimation with heavy tails , author=. Journal of Machine Learning Research , volume=. 2016 , publisher=
work page 2016
-
[38]
Problem Complexity and Method Efficiency in Optimization , author=. 1983 , publisher=
work page 1983
-
[39]
SIAM Journal on Control and Optimization , volume=
Acceleration of stochastic approximation by averaging , author=. SIAM Journal on Control and Optimization , volume=. 1992 , publisher=
work page 1992
-
[40]
Mathematical Programming , volume=
An optimal method for stochastic composite optimization , author=. Mathematical Programming , volume=. 2012 , publisher=
work page 2012
-
[41]
SIAM Journal on Optimization , volume=
Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms , author=. SIAM Journal on Optimization , volume=. 2013 , publisher=
work page 2013
-
[42]
SIAM Journal on Optimization , volume=
Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization i: A generic algorithmic framework , author=. SIAM Journal on Optimization , volume=. 2012 , publisher=
work page 2012
-
[43]
Deterministic and stochastic primal-dual subgradient algorithms for uniformly convex minimization , author=. Stochastic Systems , volume=. 2014 , publisher=
work page 2014
-
[44]
SIAM Journal on Optimization , volume=
On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean , author=. SIAM Journal on Optimization , volume=. 2010 , publisher=
work page 2010
-
[45]
R. D. C. Monteiro and B. F. Svaiter , title =. SIAM Journal on Optimization , volume =
-
[46]
Gradient methods for minimizing composite functions , author=. Math. Programming , pages=. 2012 , publisher=
work page 2012
-
[47]
SIAM Journal on Optimization , volume=
Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity , author=. SIAM Journal on Optimization , volume=. 2019 , publisher=
work page 2019
-
[48]
Mathematics of Operations Research , year=
A unified analysis of a class of proximal bundle methods for solving hybrid convex composite optimization problems , author=. Mathematics of Operations Research , year=
-
[49]
Journal of Machine Learning Research , volume=
From low probability to high confidence in stochastic convex optimization , author=. Journal of Machine Learning Research , volume=
-
[50]
A. Nemirovski and A. Juditsky and G. Lan and A. Shapiro , title=. SIAM Journal on Optimization , volume=19, issue=4, year=2009, pages=
work page 2009
-
[51]
J.-B. H. Urruty and C. Lemar\'echal , title =
- [52]
- [53]
-
[54]
The Complexity of Large-scale Convex Programming under a Linear Optimization Oracle , author=. arXiv , year=
-
[55]
Journal of optimization theory and applications , volume=
Subgradient methods for saddle-point problems , author=. Journal of optimization theory and applications , volume=. 2009 , publisher=
work page 2009
-
[56]
R.I. Boţ and E.R. Csetnek and M. Sedlmayer Michael , title =. Comput. Optim. Appl. , volume =. 2023 , pages =
work page 2023
- [57]
-
[58]
Y. Nesterov and V. Shikhman , TITLE =. J. Oper. Res. Soc. China , VOLUME =. 2018 , NUMBER =
work page 2018
- [59]
- [60]
-
[61]
T. Larsson and M. Patriksson and AB. Strömberg , title =. Math. Program. , volume =
-
[62]
E. Gustavsson and M. Patriksson and AB. Strömberg , title =. Math. Program. , volume =
-
[63]
SIAM Journal on Optimization , volume=
Approximate primal solutions and rate analysis for dual subgradient methods , author=. SIAM Journal on Optimization , volume=. 2009 , publisher=
work page 2009
-
[64]
Some primal-dual theory for subgradient methods for strongly convex optimization , author=. ArXiv , year=
- [65]
-
[66]
Mathematical Programming , volume=
Primal-dual subgradient methods for convex problems , author=. Mathematical Programming , volume=. 2009 , publisher=
work page 2009
- [67]
- [68]
-
[69]
SIAM Journal on Optimization , volume=
A unifying polyhedral approximation framework for convex optimization , author=. SIAM Journal on Optimization , volume=. 2011 , publisher=
work page 2011
- [70]
-
[71]
Learning with submodular functions: A convex optimization perspective , author=. Foundations and Trends. 2013 , publisher=
work page 2013
-
[72]
SIAM Journal on Optimization , volume=
Duality between subgradient and conditional gradient methods , author=. SIAM Journal on Optimization , volume=. 2015 , publisher=
work page 2015
-
[73]
Advances in Neural Information Processing Systems , volume=
First-order methods for large-scale market equilibrium computation , author=. Advances in Neural Information Processing Systems , volume=
-
[74]
Chen, G. and Teboulle, M. , year = 2006, journal =. Convergence. doi:10.1137/0803026 , urldate =
-
[75]
SIAM Journal on Optimization , volume =
Relatively Smooth Convex Optimization by First-Order Methods, and Applications , author =. SIAM Journal on Optimization , volume =. doi:10.1137/16M1099546 , urldate =
-
[76]
Mathematical Programming , volume =
Some Primal-Dual Theory for Subgradient Methods for Strongly Convex Optimization , author =. Mathematical Programming , volume =
-
[77]
Primal-dual proximal bundle and conditional gradient methods for convex problems: J. Liang , author=. Mathematical Programming , pages=. 2025 , publisher=
work page 2025
-
[78]
Cheung, Y. K. and Cole, R. and Devanur, N. , year = 2013, series =. Tatonnement beyond Gross Substitutes? Gradient Descent to the Rescue , shorttitle =. Proceedings of the Forty-Fifth Annual. doi:10.1145/2488608.2488633 , urldate =
-
[79]
Lu, H. and Freund, R. M. , journal=. Generalized stochastic. 2021 , publisher=
work page 2021
-
[80]
Mathematical Programming , volume =
Subgradient Ellipsoid Method for Nonsmooth Convex Problems , author =. Mathematical Programming , volume =. doi:10.1007/s10107-022-01833-4 , urldate =
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.