pith. sign in

arxiv: 2605.17883 · v1 · pith:PEVFN4MRnew · submitted 2026-05-18 · 🧮 math.OC

On the convergence of doubly stochastic Primal-Dual Hybrid Gradient Method

Pith reviewed 2026-05-20 09:41 UTC · model grok-4.3

classification 🧮 math.OC
keywords primal-dual hybrid gradientsaddle-point problemsstochastic optimizationblock coordinate updatesconvergence ratesconvex-concave problemsrestarted methodsergodic convergence
0
0 comments X

The pith

A doubly stochastic primal-dual hybrid gradient method achieves O(1/K) ergodic convergence for convex saddle-point problems with randomized block updates on both sides.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a doubly stochastic primal-dual hybrid gradient method for block-structured convex-concave saddle-point problems, where both primal and dual variables are updated in random blocks to reduce computational cost on large problems. It proves an O(1/K) rate for the expected restricted primal-dual gap in the general convex case when blockwise step sizes meet specific conditions. A restarted version of the method is shown to converge linearly under a quadratic growth condition measured by the smoothed primal-dual gap. This approach unifies and extends existing primal-dual hybrid gradient and stochastic variants by allowing independent randomization on each side.

Core claim

The doubly stochastic primal-dual hybrid gradient method performs randomized block updates on both primal and dual variables and establishes an O(1/K) ergodic convergence rate for the expected restricted primal-dual gap under suitable blockwise step-size conditions in the general convex setting; its restarted variant achieves linear convergence of the outer iterates under a quadratic growth condition in the smoothed primal-dual gap.

What carries the argument

The doubly stochastic primal-dual hybrid gradient method (DSPDHG), which extends classical PDHG by independently selecting random blocks for primal and dual updates at each iteration.

If this is right

  • The method reduces exactly to deterministic PDHG when every block is selected at each step.
  • The method reduces to one-sided stochastic variants when randomization occurs on only the primal or only the dual side.
  • A restarted outer-loop version delivers linear convergence once a quadratic growth condition holds for the smoothed gap.
  • Practical runs with standard step sizes match or exceed the speed of PDHG, SPDHG, and their restarted forms on large instances.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same block-randomization idea could be applied to other primal-dual splitting schemes to obtain similar rates without full-vector updates.
  • The analysis suggests that two-sided stochasticity preserves the same asymptotic rate as one-sided stochasticity when step sizes are chosen blockwise.
  • Linear convergence under quadratic growth may extend to problems where the gap satisfies a local error bound rather than strict quadratic growth.

Load-bearing premise

Suitable blockwise step-size conditions exist that make the convergence rate hold even when blocks are chosen randomly.

What would settle it

Numerical runs on a standard convex-concave saddle-point test problem where the measured expected restricted primal-dual gap after K iterations fails to decrease proportionally to 1/K when the stated blockwise step-size rules are followed.

Figures

Figures reproduced from arXiv: 2605.17883 by Huikang Liu, Yiheng Xiao.

Figure 1
Figure 1. Figure 1: Relative error for soft-margin SVM. First column: colon-cancer ( [PITH_FULL_IMAGE:figures/full_fig_p022_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Constraint violation and relative error for MPC problems with [PITH_FULL_IMAGE:figures/full_fig_p023_2.png] view at source ↗
read the original abstract

We study a block-structured class of convex-concave saddle-point problems in which both the primal and dual variables admit natural separable decompositions. Motivated by large-scale applications where a full update on either side can be computationally expensive, we propose a doubly stochastic primal--dual hybrid gradient method (DSPDHG) that performs randomized block updates on both primal and dual variables.The method extends classical PDHG and stochastic PDHG (SPDHG) schemes in a unified manner:it reduces to deterministic PDHG when all blocks are selected and to one-sided stochastic variants when only one side is randomized. For the general convex setting, we establish an $\mathcal{O}(1/K)$ ergodic convergence rate for the expected restricted primal--dual gap under suitable blockwise step-size conditions. We further analyze a restarted variant of DSPDHG under a quadratic growth condition in terms of the smoothed primal-dual gap. Under this regularity assumption, we prove linear convergence of the restarted outer iterates. Numerical evidence is provided to show that restarted DSPDHG with standard step sizes demonstrates competitive practical performance compared with PDHG, SPDHG, and their restarted variants.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript proposes the doubly stochastic primal-dual hybrid gradient (DSPDHG) method for block-structured convex-concave saddle-point problems, performing randomized block updates on both primal and dual variables. It reduces to classical PDHG when all blocks are selected and to one-sided stochastic variants otherwise. For the general convex case the paper claims an O(1/K) ergodic rate on the expected restricted primal-dual gap under suitable blockwise step-size conditions; under an additional quadratic-growth assumption it further claims linear convergence of the restarted outer iterates measured in the smoothed primal-dual gap. Numerical experiments are presented to illustrate practical competitiveness with PDHG, SPDHG and their restarted versions.

Significance. If the stated rates are rigorously established with explicit, practical step-size rules, the work supplies a unified convergence theory for doubly stochastic primal-dual schemes that is directly relevant to large-scale saddle-point problems. The linear-rate result under quadratic growth and the provision of numerical evidence are positive features; the main significance hinges on whether the blockwise step-size conditions can be satisfied without knowledge of unknown constants or without forcing steps that shrink with the number of blocks.

major comments (2)
  1. [§3] §3 (General convex convergence): the O(1/K) ergodic bound on the expected restricted primal-dual gap is stated to hold under 'suitable blockwise step-size conditions' for randomized block selections. The explicit form of these conditions (in terms of block Lipschitz constants of the coupling operator, block probabilities, or norms of the blocks) is not given, nor is it shown that feasible choices exist that remain independent of the number of blocks and do not degrade the rate below O(1/K). This is load-bearing for the central claim.
  2. [§4] §4 (Restarted variant): the linear convergence statement for the restarted outer iterates is proved under a quadratic-growth condition on the smoothed primal-dual gap. It is unclear whether the quadratic-growth assumption is imposed on the original or the smoothed problem and how the smoothing parameter enters the linear rate; without this clarification the result cannot be directly compared with existing restarted PDHG analyses.
minor comments (3)
  1. [Abstract] The abstract and introduction should explicitly state whether the blockwise step-size rule reduces to the standard PDHG rule when each side consists of a single block.
  2. [Notation section] Notation for the restricted primal-dual gap versus the smoothed gap should be made uniform across theorems and proofs.
  3. [Numerical experiments] Numerical section: the step sizes actually used in the experiments should be reported and compared with the theoretical blockwise conditions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and will incorporate clarifications and explicit derivations into the revised version.

read point-by-point responses
  1. Referee: [§3] §3 (General convex convergence): the O(1/K) ergodic bound on the expected restricted primal-dual gap is stated to hold under 'suitable blockwise step-size conditions' for randomized block selections. The explicit form of these conditions (in terms of block Lipschitz constants of the coupling operator, block probabilities, or norms of the blocks) is not given, nor is it shown that feasible choices exist that remain independent of the number of blocks and do not degrade the rate below O(1/K). This is load-bearing for the central claim.

    Authors: We agree that the step-size conditions merit a more explicit treatment. In the current manuscript, Assumption 3.1 states the blockwise conditions in terms of the local Lipschitz constants L_{ij} of the coupling operator blocks and the selection probabilities p_i, q_j; Theorem 3.2 then derives the O(1/K) ergodic rate on the expected restricted gap under these conditions. To address the concern directly, the revision will include a new remark (or corollary) that constructs concrete, practical step-size rules: for example, set the primal step sizes τ_i = 1/(2 max_j L_{ij}/p_i) and dual step sizes σ_j = 1/(2 max_i L_{ij}/q_j). These choices depend only on per-block quantities and the local probabilities; because the probabilities sum to one, the resulting constants remain independent of the total number of blocks. Consequently the ergodic rate stays O(1/K) without degradation. We will also verify that these rules reduce correctly to the classical PDHG step-size condition when all blocks are selected with probability one. revision: yes

  2. Referee: [§4] §4 (Restarted variant): the linear convergence statement for the restarted outer iterates is proved under a quadratic-growth condition on the smoothed primal-dual gap. It is unclear whether the quadratic-growth assumption is imposed on the original or the smoothed problem and how the smoothing parameter enters the linear rate; without this clarification the result cannot be directly compared with existing restarted PDHG analyses.

    Authors: The quadratic-growth assumption (Assumption 4.1) is imposed directly on the smoothed primal-dual gap function G_ε, not on the original unsmoothed gap; this is stated explicitly in Section 4 and used in Theorem 4.2. The smoothing parameter ε appears in the linear rate through the strong-convexity modulus of G_ε, which is at least ε/2 in the appropriate norm; the contraction factor of the restarted outer iterates is then of the form 1 − Θ(min{μ, ε}), where μ is the quadratic-growth constant. In the revision we will add a short comparison paragraph noting that this dependence is analogous to the smoothing parameter in restarted PDHG analyses (e.g., the rate degrades gracefully as ε → 0 but remains linear for any fixed positive ε). We will also clarify that the restarted scheme is applied to the smoothed problem while the inner DSPDHG iterations use the original (unsmoothed) operator. revision: yes

Circularity Check

0 steps flagged

Derivation is self-contained via standard convex analysis of block updates.

full rationale

The paper establishes the O(1/K) ergodic rate for the expected restricted primal-dual gap directly from the DSPDHG update rules under explicitly stated blockwise step-size conditions, and derives linear convergence of the restarted variant from the quadratic growth assumption on the smoothed gap. These steps rely on standard Lyapunov-style arguments for primal-dual methods and do not reduce to fitted parameters, self-definitions, or load-bearing self-citations. The blockwise conditions are part of the theorem hypotheses rather than presupposed results, and the analysis extends classical PDHG/SPDHG without circular renaming or ansatz smuggling. The derivation remains independent of the target rates.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The central claims rest on standard assumptions of convexity-concavity and block separability of the variables, plus the quadratic growth condition for the linear rate; no new entities are introduced and step sizes are conditioned rather than freely fitted.

axioms (3)
  • domain assumption The problem is convex-concave with natural separable block decompositions for primal and dual variables.
    Invoked to motivate the block-structured class and enable randomized block updates.
  • domain assumption Suitable blockwise step-size conditions hold.
    Required for the O(1/K) ergodic convergence rate in the general convex setting.
  • domain assumption Quadratic growth condition on the smoothed primal-dual gap.
    Used to establish linear convergence for the restarted variant.

pith-pipeline@v0.9.0 · 5727 in / 1601 out tokens · 39203 ms · 2026-05-20T09:41:22.612496+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 1 internal anchor

  1. [1]

    cuhallar: A gpu accelerated low-rank augmented lagrangian method for large-scale semidefinite programming.arXiv preprint arXiv:2505.13719, 2025

    Jacob M Aguirre, Diego Cifuentes, Vincent Guigues, Renato DC Monteiro, Victor Hugo Nascimento, and Arnesh Sujanani. cuhallar: A gpu accelerated low-rank augmented lagrangian method for large-scale semidefinite programming.arXiv preprint arXiv:2505.13719, 2025

  2. [2]

    Random extrapolation for primal-dual coordinate descent

    Ahmet Alacaoglu, Olivier Fercoq, and V olkan Cevher. Random extrapolation for primal-dual coordinate descent. InInternational conference on machine learning, pages 191–201. PMLR, 2020

  3. [3]

    On the convergence of stochastic primal-dual hybrid gradient.SIAM Journal on Optimization, 32(2):1288–1318, 2022

    Ahmet Alacaoglu, Olivier Fercoq, and V olkan Cevher. On the convergence of stochastic primal-dual hybrid gradient.SIAM Journal on Optimization, 32(2):1288–1318, 2022

  4. [4]

    Practical large-scale linear programming using primal-dual hybrid gradient.Advances in Neural Information Processing Systems, 34:20243–20257, 2021

    David Applegate, Mateo Díaz, Oliver Hinder, Haihao Lu, Miles Lubin, Brendan O’Donoghue, and Warren Schudy. Practical large-scale linear programming using primal-dual hybrid gradient.Advances in Neural Information Processing Systems, 34:20243–20257, 2021

  5. [5]

    Ehrhardt, Peter Richtárik, and Carola-Bibiane Schönlieb

    Antonin Chambolle, Matthias J. Ehrhardt, Peter Richtárik, and Carola-Bibiane Schönlieb. Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications.SIAM Journal on Optimization, 28(4):2783–2808, 2018

  6. [6]

    A first-order primal-dual algorithm for convex problems with applications to imaging.Journal of mathematical imaging and vision, 40:120–145, 2011

    Antonin Chambolle and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging.Journal of mathematical imaging and vision, 40:120–145, 2011

  7. [7]

    On the ergodic convergence rates of a first-order primal–dual algorithm

    Antonin Chambolle and Thomas Pock. On the ergodic convergence rates of a first-order primal–dual algorithm. Mathematical Programming, 159(1):253–287, 2016

  8. [8]

    Libsvm: A library for support vector machines.ACM Trans

    Chih-Chung Chang and Chih-Jen Lin. Libsvm: A library for support vector machines.ACM Trans. Intell. Syst. Technol., 2(3), May 2011

  9. [9]

    Hpr-lp: An implementation of an hpr method for solving linear programming.arXiv preprint arXiv:2408.12179, 2024

    Kaihuang Chen, Defeng Sun, Yancheng Yuan, Guojun Zhang, and Xinyuan Zhao. Hpr-lp: An implementation of an hpr method for solving linear programming.arXiv preprint arXiv:2408.12179, 2024

  10. [10]

    Hpr-qp: A dual halpern peaceman-rachford method for solving large-scale convex composite quadratic programming.arXiv preprint arXiv:2507.02470, 2025

    Kaihuang Chen, Defeng Sun, Yancheng Yuan, Guojun Zhang, and Xinyuan Zhao. Hpr-qp: A dual halpern peaceman-rachford method for solving large-scale convex composite quadratic programming.arXiv preprint arXiv:2507.02470, 2025

  11. [11]

    New understandings and computation on augmented lagrangian methods for low-rank semidefinite programming.arXiv preprint arXiv:2505.15775, 2025

    Lijun Ding, Haihao Lu, and Jinwen Yang. New understandings and computation on augmented lagrangian methods for low-rank semidefinite programming.arXiv preprint arXiv:2505.15775, 2025

  12. [12]

    Quadratic error bound of the smoothed gap and the restarted averaged primal-dual hybrid gradient

    Olivier Fercoq. Quadratic error bound of the smoothed gap and the restarted averaged primal-dual hybrid gradient. arXiv preprint arXiv:2206.03041, 2022

  13. [13]

    Cardinal optimizer (copt) user guide.arXiv preprint arXiv:2208.14314, 2022

    Dongdong Ge, Qi Huangfu, Zizhuo Wang, Jian Wu, and Yinyu Ye. Cardinal optimizer (copt) user guide.arXiv preprint arXiv:2208.14314, 2022

  14. [14]

    A low-rank ADMM splitting approach for semidefinite programming.arXiv preprint arXiv:2403.09133, 2024

    Qiushi Han, Chenxi Li, Zhenwei Lin, Caihua Chen, Qi Deng, Dongdong Ge, Huikang Liu, and Yinyu Ye. A low-rank admm splitting approach for semidefinite programming.arXiv preprint arXiv:2403.09133, 2024

  15. [15]

    Accelerating low-rank factorization-based semidefinite programming algorithms on gpu.arXiv preprint arXiv:2407.15049, 2024

    Qiushi Han, Zhenwei Lin, Hanwen Liu, Caihua Chen, Qi Deng, Dongdong Ge, and Yinyu Ye. Accelerating low-rank factorization-based semidefinite programming algorithms on gpu.arXiv preprint arXiv:2407.15049, 2024

  16. [16]

    Restarted primal-dual hybrid conjugate gradient method for large-scale quadratic programming.arXiv preprint arXiv:2405.16160, 2024

    Yicheng Huang, Wanyu Zhang, Hongpei Li, Dongdong Ge, Huikang Liu, and Yinyu Ye. Restarted primal-dual hybrid conjugate gradient method for large-scale quadratic programming.arXiv preprint arXiv:2405.16160, 2024

  17. [17]

    A practical GPU-enhanced matrix- free primal-dual method for large-scale conic programs, 2025

    Zhenwei Lin, Zikai Xiong, Dongdong Ge, and Yinyu Ye. Pdcs: A primal-dual large-scale conic programming solver with gpu enhancements.arXiv preprint arXiv:2505.00311, 2025. 24 On the Convergence of Doubly Stochastic PDHGA PREPRINT

  18. [18]

    Pdhcg: A scalable first-order method for large-scale competitive market equilibrium computation, 2025

    Huikang Liu, Yicheng Huang, Hongpei Li, Dongdong Ge, and Yinyu Ye. Pdhcg: A scalable first-order method for large-scale competitive market equilibrium computation, 2025

  19. [19]

    Haihao Lu and Jinwen Yang. cupdlp. jl: A gpu implementation of restarted primal-dual hybrid gradient for linear programming in julia.arXiv preprint arXiv:2311.12180, 2023

  20. [20]

    arXiv preprint arXiv:2407.19689

    Haihao Lu and Jinwen Yang. Pdot: A practical primal-dual algorithm and a gpu-based solver for optimal transport. arXiv preprint arXiv:2407.19689, 2024

  21. [21]

    Restarted Halpern PDHG for linear programming

    Haihao Lu and Jinwen Yang. Restarted halpern pdhg for linear programming.arXiv preprint arXiv:2407.16144, 2024

  22. [22]

    A practical and optimal first-order method for large-scale convex quadratic programming.Mathematical Programming, pages 1–38, 2025

    Haihao Lu and Jinwen Yang. A practical and optimal first-order method for large-scale convex quadratic programming.Mathematical Programming, pages 1–38, 2025

  23. [23]

    Linear convergence of a unified primal–dual algorithm for convex–concave saddle point problems with quadratic growth

    Cody Melcher, Afrooz Jalilzadeh, and Erfan Yazdandoost Hamedani. Linear convergence of a unified primal–dual algorithm for convex–concave saddle point problems with quadratic growth. 2025

  24. [24]

    Doubly Stochastic Primal-Dual Coordinate Method for Bilinear Saddle-Point Problem

    Adams Wei Yu, Qihang Lin, and Tianbao Yang. Doubly stochastic primal-dual coordinate method for bilinear saddle-point problem.arXiv preprint arXiv:1508.03390, 2015

  25. [25]

    Solving large multicommodity network flow problems on gpus.arXiv preprint arXiv:2501.17996, 2025

    Fangzhao Zhang and Stephen Boyd. Solving large multicommodity network flow problems on gpus.arXiv preprint arXiv:2501.17996, 2025

  26. [26]

    Hot: An efficient halpern accelerating algorithm for optimal transport problems.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

    Guojun Zhang, Zhexuan Gu, Yancheng Yuan, and Defeng Sun. Hot: An efficient halpern accelerating algorithm for optimal transport problems.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025. 25