Quantized Stochastic Primal-Dual Methods for Distributed Optimization under Relaxed Global Geometry

Abhinav Raghuvanshi; Kushal Chakrabarti; Mayank Baranwal; Susmit Sarkar

arxiv: 2606.11339 · v1 · pith:HPJBCQG7new · submitted 2026-06-09 · 🧮 math.OC · cs.AI· cs.LG· cs.SY· eess.SY· stat.ML

Quantized Stochastic Primal-Dual Methods for Distributed Optimization under Relaxed Global Geometry

Susmit Sarkar , Abhinav Raghuvanshi , Kushal Chakrabarti , Mayank Baranwal This is my paper

Pith reviewed 2026-06-27 12:11 UTC · model grok-4.3

classification 🧮 math.OC cs.AIcs.LGcs.SYeess.SYstat.ML

keywords distributed optimizationstochastic optimizationquantized communicationprimal-dual methodsrestricted secant inequalityPolyak-Lojasiewicz inequalityconvergence analysis

0 comments

The pith

A quantized stochastic primal-dual method converges linearly to an explicit neighborhood under relaxed global geometry conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces q-PDGD for distributed optimization using stochastic gradients and random quantization for communication. Under the restricted secant inequality, constant step sizes produce linear convergence to a neighborhood sized by noise, quantization distortion, and network connectivity, while diminishing steps achieve O(1/k) rates without requiring a shared minimizer. The same setting yields linear-to-neighborhood convergence when the objective satisfies the Polyak-Lojasiewicz inequality. The rates match the best centralized stochastic oracle complexities.

Core claim

q-PDGD achieves linear contraction to an explicit neighborhood determined by gradient noise, quantization distortion, and network connectivity under RSI with constant step-size, O(1/k) convergence with diminishing step-size without shared-minimizer assumptions, and linear-to-neighborhood convergence under PL in the stochastic quantized distributed setting.

What carries the argument

The q-PDGD algorithm using primal-dual updates combined with random unbiased quantization, analyzed via RSI and PL inequalities as relaxed global geometry.

Load-bearing premise

The objective satisfies either the restricted secant inequality or the Polyak-Lojasiewicz inequality.

What would settle it

Observing that the convergence rate does not become linear when RSI or PL holds, or that the neighborhood radius does not scale with the quantization distortion as predicted.

Figures

Figures reproduced from arXiv: 2606.11339 by Abhinav Raghuvanshi, Kushal Chakrabarti, Mayank Baranwal, Susmit Sarkar.

**Figure 1.** Figure 1: (Example 2, top) and (Example 1, bottom). Evolution of the consensus error and objective gap for q-PDGD under constant and diminishing step-sizes over a ring graph with n = 10 agents. Constant step-sizes converge linearly before plateauing at a neighborhood, while diminishing step-size achieves consensus and optimization [PITH_FULL_IMAGE:figures/full_fig_p008_1.png] view at source ↗

**Figure 3.** Figure 3: Quantized (Q) versus unquantized (UQ) trajecto [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Asymptotic objective gap of q-PDGD for Example 1 versus (left) quantization step squared ∆2 and (right) gradient-noise variance σ 2 , for multiple network sizes n. The linear trends corroborate the O(A∆2 + Bσ2 ) steady-state bound from Theorem 1 and the network scaling discussed in Remark 2. D EMPIRICAL EVALUATION ON DISTRIBUTED DEEP LEARNING The theoretical results in Section A are stated under RSI (Assum… view at source ↗

**Figure 5.** Figure 5: Iterations to reach a target objective threshold: [PITH_FULL_IMAGE:figures/full_fig_p028_5.png] view at source ↗

**Figure 6.** Figure 6: MNIST, 100 agents, ring topology, 4-bit stochastic quantizer. (a) Mean per-agent test accuracy. (b) Consensus error 1 n P i ∥xi − x¯∥ (log scale): q-PDGD is roughly an order of magnitude lower than the baselines. (c) Mean training loss. (d) Test accuracy of the averaged (deployed) model x¯ = 1 n P i xi [PITH_FULL_IMAGE:figures/full_fig_p029_6.png] view at source ↗

**Figure 7.** Figure 7: MNIST: dispersion across the 100 agents. (a) Mean accuracy with shaded min/max envelope. (b) Histogram of final per-agent test accuracies. q-PDGD produces the right-most and most concentrated distribution [PITH_FULL_IMAGE:figures/full_fig_p029_7.png] view at source ↗

**Figure 8.** Figure 8: Fashion-MNIST, 100 agents, ring topology, 4-bit stochastic quantizer. Same panels as [PITH_FULL_IMAGE:figures/full_fig_p030_8.png] view at source ↗

**Figure 9.** Figure 9: Fashion-MNIST: dispersion across the 100 agents. Final per-agent accuracies: q-PDGD µ=0.819, σ=0.0158; CHOCO-SGD µ=0.814, σ=0.0134; q-DCSG µ=0.796, σ=0.0127; q-DGD µ=0.749, σ=0.0205 [PITH_FULL_IMAGE:figures/full_fig_p030_9.png] view at source ↗

read the original abstract

We study distributed optimization with stochastic gradients and finite-bit communication modeled by random (unbiased) quantization. We propose q-PDGD, a quantized stochastic primal-dual method, and analyze it under relaxed global geometry. Under restricted secant inequality (RSI), a constant step-size yields linear contraction to an explicit neighborhood determined by gradient noise, quantization distortion, and network connectivity, while a diminishing step-size achieves O(1/k) convergence without shared-minimizer assumptions. Under Polyak-Lojasiewicz (PL) inequality, we obtain linear-to-neighborhood convergence in the same stochastic quantized setting. Our results match the best-known centralized stochastic rates in oracle complexity, and are supported by experiments demonstrating the predicted tradeoffs between quantization level, step-size choice, and graph structure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

q-PDGD gives linear-to-neighborhood rates under RSI and PL in the quantized stochastic distributed setting and recovers centralized oracle complexity, but the proof details on quantization error bounds need checking.

read the letter

The one thing to know is that this paper derives convergence guarantees for their q-PDGD algorithm under restricted secant inequality and Polyak-Lojasiewicz assumptions, with explicit neighborhoods that fold in gradient noise, quantization distortion, and network connectivity. Constant steps give linear contraction to that neighborhood; diminishing steps give O(1/k) without requiring a shared minimizer.

What is new is the specific combination of primal-dual updates with unbiased random quantization in the stochastic distributed case, plus the removal of the shared-minimizer condition for the diminishing-step result. The paper states the rates clearly and notes that they match the best centralized stochastic rates in oracle complexity.

It does well in making the neighborhood radius explicit with all three error sources and in running experiments that illustrate the predicted effects of quantization bits, step-size choice, and graph structure. That empirical check is useful even if the numbers are illustrative.

The soft spots sit in the soundness of the derivations. The abstract claims the bounds, but the actual error-bound calculations that combine the primal-dual recursion with the quantization model are not visible here, so it is hard to tell whether the constants are reasonable or whether some cross terms were dropped. The assumptions themselves are standard, so the work rests on careful but conventional analysis rather than a breakthrough in geometry.

This paper is for researchers who work on distributed optimization with communication constraints. Someone already following primal-dual or quantized methods would get concrete rates and an algorithm to test. It deserves a serious referee because the claims are specific enough to verify and the setting is practically relevant. I would recommend sending it to peer review, with reviewers asked to focus on the full proof steps and the quantization bias handling.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes q-PDGD, a quantized stochastic primal-dual gradient method for distributed optimization under stochastic gradients and unbiased random quantization for finite-bit communication. Under the restricted secant inequality (RSI), constant step-sizes yield linear contraction to an explicit neighborhood whose radius depends on gradient noise, quantization distortion, and network connectivity; diminishing step-sizes achieve O(1/k) convergence without requiring a shared minimizer. Under the Polyak-Łojasiewicz (PL) inequality the same setting yields linear convergence to a neighborhood. The stated rates recover the best-known centralized stochastic oracle complexities while incorporating quantization and graph effects; numerical experiments illustrate the predicted trade-offs among quantization level, step-size schedule, and graph structure.

Significance. If the supporting derivations are correct, the contribution is significant: it supplies the first explicit neighborhood radii for quantized primal-dual methods under RSI/PL geometry, removes the shared-minimizer hypothesis for the diminishing-step-size regime, and shows that the communication constraint does not degrade the centralized rate order. The explicit dependence of the neighborhood on the quantization variance and the mixing matrix is a concrete, usable result for practitioners designing bit-constrained networks.

minor comments (2)

[Abstract, §1] Abstract and §1: the phrase 'relaxed global geometry' is used before RSI and PL are formally introduced; a one-sentence parenthetical definition on first appearance would improve readability.
[Experiments] The experiments section would benefit from a table (or additional rows in an existing table) reporting the measured neighborhood radii versus the theoretically predicted radii for at least two quantization levels.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and constructive review, which accurately captures the main contributions of the manuscript. The recommendation for minor revision is noted; however, the report does not list any specific major comments requiring point-by-point rebuttal.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained under standard assumptions

full rationale

The paper derives convergence rates for q-PDGD under the standard restricted secant inequality (RSI) and Polyak-Lojasiewicz (PL) assumptions applied to the global objective, with quantization modeled as unbiased random operators whose distortion enters the neighborhood radius explicitly. These are external geometric conditions independent of the algorithm or its analysis; the stated linear-to-neighborhood and O(1/k) rates follow directly from standard Lyapunov arguments without reducing to fitted parameters, self-definitions, or load-bearing self-citations. The abstract and description indicate recovery of centralized oracle complexity as a comparison result rather than a circular premise. No step in the provided derivation chain equates a prediction to its input by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on domain-standard assumptions about the objective geometry and the quantization model; no free parameters are fitted to data and no new entities are postulated.

axioms (2)

domain assumption The objective function satisfies the restricted secant inequality (RSI) or Polyak-Lojasiewicz (PL) inequality.
Invoked to obtain linear contraction rates to a neighborhood.
domain assumption Quantization is modeled as unbiased random quantization with finite bits.
Used to bound the distortion term in the convergence neighborhood.

pith-pipeline@v0.9.1-grok · 5689 in / 1328 out tokens · 50715 ms · 2026-06-27T12:11:17.960805+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

28 extracted references · 1 canonical work pages

[1]

IFAC-PapersOnLine , volume=

Exponential convergence for distributed optimization under the restricted secant inequality condition , author=. IFAC-PapersOnLine , volume=. 2020 , publisher=

2020
[2]

Advances in Neural Information Processing Systems , volume=

Information-theoretic lower bounds on the oracle complexity of convex optimization , author=. Advances in Neural Information Processing Systems , volume=
[3]

2020 59th IEEE Conference on Decision and Control (CDC) , pages=

Linear convergence for distributed optimization without strong convexity , author=. 2020 59th IEEE Conference on Decision and Control (CDC) , pages=. 2020 , organization=

2020
[4]

Dutta, Amit and Doan, Thinh T , journal=. On the. 2024 , publisher=

2024
[5]

The Journal of Machine Learning Research , volume=

Beyond the regret minimization barrier: optimal algorithms for stochastic strongly-convex optimization , author=. The Journal of Machine Learning Research , volume=. 2014 , publisher=

2014
[6]

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=

Quantized consensus ADMM for multi-agent distributed optimization , author=. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=. 2016 , organization=

2016
[7]

Advances in neural information processing systems , volume=

QSGD: Communication-efficient SGD via gradient quantization and encoding , author=. Advances in neural information processing systems , volume=
[8]

IEEE Transactions on Signal Processing , volume=

Compressed gradient tracking for decentralized optimization over general directed networks , author=. IEEE Transactions on Signal Processing , volume=. 2022 , publisher=

2022
[9]

Decentralized deep learning with arbitrary communication compression.arXiv preprint arXiv:1907.09356, 2019

Decentralized deep learning with arbitrary communication compression , author=. arXiv preprint arXiv:1907.09356 , year=

work page arXiv 1907
[10]

Optimization Methods and Software , volume=

Distributed learning with compressed gradient differences , author=. Optimization Methods and Software , volume=. 2025 , publisher=

2025
[11]

Advances in Neural Information Processing Systems , volume=

EF21: A new, simpler, theoretically better, and practically faster error feedback , author=. Advances in Neural Information Processing Systems , volume=
[12]

International conference on machine learning , pages=

signSGD: Compressed optimisation for non-convex problems , author=. International conference on machine learning , pages=. 2018 , organization=

2018
[13]

Advances in neural information processing systems , volume=

Terngrad: Ternary gradients to reduce communication in distributed deep learning , author=. Advances in neural information processing systems , volume=
[14]

IEEE Transactions on Automatic Control , volume=

Quantized primal-dual algorithms for network optimization with linear convergence , author=. IEEE Transactions on Automatic Control , volume=. 2023 , publisher=

2023
[15]

IFAC-PapersOnLine , volume=

Distributed optimization with gradient descent and quantized communication , author=. IFAC-PapersOnLine , volume=. 2023 , publisher=

2023
[16]

Advances in neural information processing systems , volume=

Double quantization for communication-efficient distributed optimization , author=. Advances in neural information processing systems , volume=
[17]

IEEE Transactions on Automatic Control , volume=

Quantization design for distributed optimization , author=. IEEE Transactions on Automatic Control , volume=. 2016 , publisher=

2016
[18]

Automatica , volume=

Distributed convex optimization via continuous-time coordination algorithms with discrete-time communication , author=. Automatica , volume=. 2015 , publisher=

2015
[19]

SIAM Journal on Optimization , volume=

Extra: An exact first-order algorithm for decentralized consensus optimization , author=. SIAM Journal on Optimization , volume=. 2015 , publisher=

2015
[20]

Linear convergence of gradient and proximal-gradient methods under the

Karimi, Hamed and Nutini, Julie and Schmidt, Mark , booktitle=. Linear convergence of gradient and proximal-gradient methods under the. 2016 , organization=

2016
[21]

Annual Reviews in Control , volume=

A survey of distributed optimization , author=. Annual Reviews in Control , volume=. 2019 , publisher=

2019
[22]

IEEE Transactions on Information Forensics and Security , volume=

Provable privacy advantages of decentralized federated learning via distributed optimization , author=. IEEE Transactions on Information Forensics and Security , volume=. 2024 , publisher=

2024
[23]

International Conference on Artificial Intelligence and Statistics , pages=

Efficient distributed hessian free algorithm for large-scale empirical risk minimization via accumulating sample strategy , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2020 , organization=

2020
[24]

2018 21st International Conference on Information Fusion (FUSION) , pages=

A review of forty years of distributed estimation , author=. 2018 21st International Conference on Information Fusion (FUSION) , pages=. 2018 , organization=

2018
[25]

Annual Review of Control, Robotics, and Autonomous Systems , volume=

Distributed optimization for control , author=. Annual Review of Control, Robotics, and Autonomous Systems , volume=. 2018 , publisher=

2018
[26]

SIAM Journal on Optimization , year =

Yuan, Kun and Ling, Qing and Yin, Wotao , title =. SIAM Journal on Optimization , year =
[27]

2018 IEEE Conference on Decision and Control (CDC) , pages=

Quantized decentralized consensus optimization , author=. 2018 IEEE Conference on Decision and Control (CDC) , pages=. 2018 , organization=

2018
[28]

2019 , eprint=

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication , author=. 2019 , eprint=

2019

[1] [1]

IFAC-PapersOnLine , volume=

Exponential convergence for distributed optimization under the restricted secant inequality condition , author=. IFAC-PapersOnLine , volume=. 2020 , publisher=

2020

[2] [2]

Advances in Neural Information Processing Systems , volume=

Information-theoretic lower bounds on the oracle complexity of convex optimization , author=. Advances in Neural Information Processing Systems , volume=

[3] [3]

2020 59th IEEE Conference on Decision and Control (CDC) , pages=

Linear convergence for distributed optimization without strong convexity , author=. 2020 59th IEEE Conference on Decision and Control (CDC) , pages=. 2020 , organization=

2020

[4] [4]

Dutta, Amit and Doan, Thinh T , journal=. On the. 2024 , publisher=

2024

[5] [5]

The Journal of Machine Learning Research , volume=

Beyond the regret minimization barrier: optimal algorithms for stochastic strongly-convex optimization , author=. The Journal of Machine Learning Research , volume=. 2014 , publisher=

2014

[6] [6]

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=

Quantized consensus ADMM for multi-agent distributed optimization , author=. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=. 2016 , organization=

2016

[7] [7]

Advances in neural information processing systems , volume=

QSGD: Communication-efficient SGD via gradient quantization and encoding , author=. Advances in neural information processing systems , volume=

[8] [8]

IEEE Transactions on Signal Processing , volume=

Compressed gradient tracking for decentralized optimization over general directed networks , author=. IEEE Transactions on Signal Processing , volume=. 2022 , publisher=

2022

[9] [9]

Decentralized deep learning with arbitrary communication compression.arXiv preprint arXiv:1907.09356, 2019

Decentralized deep learning with arbitrary communication compression , author=. arXiv preprint arXiv:1907.09356 , year=

work page arXiv 1907

[10] [10]

Optimization Methods and Software , volume=

Distributed learning with compressed gradient differences , author=. Optimization Methods and Software , volume=. 2025 , publisher=

2025

[11] [11]

Advances in Neural Information Processing Systems , volume=

EF21: A new, simpler, theoretically better, and practically faster error feedback , author=. Advances in Neural Information Processing Systems , volume=

[12] [12]

International conference on machine learning , pages=

signSGD: Compressed optimisation for non-convex problems , author=. International conference on machine learning , pages=. 2018 , organization=

2018

[13] [13]

Advances in neural information processing systems , volume=

Terngrad: Ternary gradients to reduce communication in distributed deep learning , author=. Advances in neural information processing systems , volume=

[14] [14]

IEEE Transactions on Automatic Control , volume=

Quantized primal-dual algorithms for network optimization with linear convergence , author=. IEEE Transactions on Automatic Control , volume=. 2023 , publisher=

2023

[15] [15]

IFAC-PapersOnLine , volume=

Distributed optimization with gradient descent and quantized communication , author=. IFAC-PapersOnLine , volume=. 2023 , publisher=

2023

[16] [16]

Advances in neural information processing systems , volume=

Double quantization for communication-efficient distributed optimization , author=. Advances in neural information processing systems , volume=

[17] [17]

IEEE Transactions on Automatic Control , volume=

Quantization design for distributed optimization , author=. IEEE Transactions on Automatic Control , volume=. 2016 , publisher=

2016

[18] [18]

Automatica , volume=

Distributed convex optimization via continuous-time coordination algorithms with discrete-time communication , author=. Automatica , volume=. 2015 , publisher=

2015

[19] [19]

SIAM Journal on Optimization , volume=

Extra: An exact first-order algorithm for decentralized consensus optimization , author=. SIAM Journal on Optimization , volume=. 2015 , publisher=

2015

[20] [20]

Linear convergence of gradient and proximal-gradient methods under the

Karimi, Hamed and Nutini, Julie and Schmidt, Mark , booktitle=. Linear convergence of gradient and proximal-gradient methods under the. 2016 , organization=

2016

[21] [21]

Annual Reviews in Control , volume=

A survey of distributed optimization , author=. Annual Reviews in Control , volume=. 2019 , publisher=

2019

[22] [22]

IEEE Transactions on Information Forensics and Security , volume=

Provable privacy advantages of decentralized federated learning via distributed optimization , author=. IEEE Transactions on Information Forensics and Security , volume=. 2024 , publisher=

2024

[23] [23]

International Conference on Artificial Intelligence and Statistics , pages=

Efficient distributed hessian free algorithm for large-scale empirical risk minimization via accumulating sample strategy , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2020 , organization=

2020

[24] [24]

2018 21st International Conference on Information Fusion (FUSION) , pages=

A review of forty years of distributed estimation , author=. 2018 21st International Conference on Information Fusion (FUSION) , pages=. 2018 , organization=

2018

[25] [25]

Annual Review of Control, Robotics, and Autonomous Systems , volume=

Distributed optimization for control , author=. Annual Review of Control, Robotics, and Autonomous Systems , volume=. 2018 , publisher=

2018

[26] [26]

SIAM Journal on Optimization , year =

Yuan, Kun and Ling, Qing and Yin, Wotao , title =. SIAM Journal on Optimization , year =

[27] [27]

2018 IEEE Conference on Decision and Control (CDC) , pages=

Quantized decentralized consensus optimization , author=. 2018 IEEE Conference on Decision and Control (CDC) , pages=. 2018 , organization=

2018

[28] [28]

2019 , eprint=

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication , author=. 2019 , eprint=

2019