pith. sign in

arxiv: 2607.01665 · v1 · pith:MFU2E6UVnew · submitted 2026-07-02 · 💻 cs.LG

Revisiting Decentralized Online Convex Optimization with Compressed Communication

Pith reviewed 2026-07-03 18:00 UTC · model grok-4.3

classification 💻 cs.LG
keywords decentralized online convex optimizationcompressed communicationfollow-the-regularized-leaderonline gradient descentregret boundsaverage consensusbandit feedbackfull-information setting
0
0 comments X

The pith

FTRL-type algorithms for decentralized online convex optimization with compressed communication match or improve upon previous OGD bounds while offering simpler design and analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes two new algorithms based on follow-the-regularized-leader for decentralized online convex optimization when communication is compressed. It shows that these algorithms are more elegant than earlier online gradient descent variants because the dual update in FTRL lets compressed consensus be applied directly. In the full-information case the new method recovers the best known regret bounds. In the bandit setting it improves both regret and the amount of communication required. A reader should care because streaming data applications in distributed systems often face bandwidth limits, and better algorithms can reduce both computation and data transfer overhead.

Core claim

The central claim is that the dual update mechanism of follow-the-regularized-leader permits a direct and error-free application of average-consensus techniques under communication compression, yielding two algorithms whose regret bounds are at least as good as those of prior online-gradient-descent methods and strictly better in the bandit case.

What carries the argument

The dual update step of FTRL, which allows compressed average consensus to be incorporated without generating additional error terms in the regret analysis.

Load-bearing premise

The dual update mechanism of FTRL permits a simple application of compressed average consensus without introducing new error terms that would degrade the regret bounds.

What would settle it

A direct implementation on a test problem where the observed regret exceeds the claimed bound when compression is applied would falsify the claim that no new error terms arise.

Figures

Figures reproduced from arXiv: 2607.01665 by Chang Yao, Hao Zhou, Mingli Song, Xiaoyu Wang, Yuanyu Wan.

Figure 1
Figure 1. Figure 1: Experimental results on a small random graph ( [PITH_FULL_IMAGE:figures/full_fig_p009_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Experimental results on a small random graph ( [PITH_FULL_IMAGE:figures/full_fig_p013_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Experimental results on a small random graph ( [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Experimental results on a larger random graph ( [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Experimental results on a larger random graph ( [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Experimental results on a larger random graph ( [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Experimental results of CD-FTBL with varying communication rounds [PITH_FULL_IMAGE:figures/full_fig_p014_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Experimental results of CD-FTBL with varying communication rounds [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Experimental results of CD-FTBL with varying communication rounds [PITH_FULL_IMAGE:figures/full_fig_p014_9.png] view at source ↗
read the original abstract

Decentralized online convex optimization (D-OCO) is a popular framework for distributed applications with streaming data. To tackle the communication bottleneck, previous studies have investigated D-OCO with compressed communication and proposed several algorithms that are variants of online gradient descent (OGD). However, for D-OCO with exact communication, the best existing algorithms are variants of follow-the-regularized-leader (FTRL). In this paper, for the first time, we propose two FTRL-type algorithms for D-OCO with compressed communication. Compared with OGD-type algorithms, our algorithms are more elegant in both algorithmic design and theoretical analysis. The key insight is that the dual update mechanism of FTRL allows us to make a simple application of the technique for average consensus with communication compression. More specifically, our first algorithm considers the full-information setting, and can match the existing regret bounds. Our second algorithm is designed for the bandit setting, and can significantly improve both the regret bounds and communication costs of existing algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper proposes two FTRL-type algorithms for decentralized online convex optimization (D-OCO) with compressed communication. The full-information algorithm is claimed to match existing regret bounds, while the bandit algorithm is claimed to significantly improve both regret bounds and communication costs compared to prior OGD-based methods. The central insight is that the dual update in FTRL permits direct use of existing compressed average-consensus techniques without introducing new error terms that degrade the regret.

Significance. If the analysis confirms that compression errors remain isolated within the consensus step and do not produce additional additive terms scaling with T or the number of nodes in the FTRL regret bound (including after substitution of the one-point estimator), the work would supply a cleaner algorithmic and analytic framework than OGD variants for this setting.

major comments (1)
  1. [regret analysis sections for both algorithms (dual update and error propagation)] The load-bearing claim that the FTRL dual update allows application of compressed consensus 'without introducing new error terms that degrade the regret bounds' requires explicit verification. The analysis must demonstrate that the standard quantization/sparsification error (bounded by a factor of the gradient norm or diameter) appears only inside the consensus error and does not leak into the online dual variables or produce extra summations over noise that scale with T. This must be shown separately for the full-information case and after the one-point gradient estimator is substituted in the bandit case; otherwise the 'match existing bounds' and 'significantly improve' statements do not hold.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting the need for explicit verification of error isolation in the regret analyses. We address the major comment below and are prepared to strengthen the presentation accordingly.

read point-by-point responses
  1. Referee: [regret analysis sections for both algorithms (dual update and error propagation)] The load-bearing claim that the FTRL dual update allows application of compressed consensus 'without introducing new error terms that degrade the regret bounds' requires explicit verification. The analysis must demonstrate that the standard quantization/sparsification error (bounded by a factor of the gradient norm or diameter) appears only inside the consensus error and does not leak into the online dual variables or produce extra summations over noise that scale with T. This must be shown separately for the full-information case and after the one-point gradient estimator is substituted in the bandit case; otherwise the 'match existing bounds' and 'significantly improve' statements do not hold.

    Authors: We agree that the isolation of compression error must be shown explicitly. In Section 4 (full-information algorithm), the proof of Theorem 4.1 proceeds by substituting the compressed consensus output directly into the FTRL dual update; the quantization error is absorbed into the standard consensus-error term (controlled by the diameter and the number of rounds) and does not appear in the online dual variables or produce an additional summation over T. The same structure is used in Section 5 for the bandit algorithm: after the one-point estimator is inserted, the proof of Theorem 5.1 again confines the compression noise inside the consensus step, yielding the stated improvement over prior OGD-based bounds without extra T-dependent terms. To address the referee’s request for clearer verification, we will add a short dedicated paragraph (or remark) immediately after each theorem statement that isolates the relevant error term and confirms it does not propagate beyond the consensus error. revision: partial

Circularity Check

0 steps flagged

No circularity; new FTRL algorithms and analysis are independently constructed

full rationale

The paper proposes novel FTRL-type algorithms for D-OCO with compressed communication, relying on the dual update mechanism to apply existing average-consensus techniques without introducing degrading error terms. No steps reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; the regret bounds are derived from standard FTRL analysis combined with consensus compression methods that are treated as external. The central claims rest on explicit algorithmic design and theoretical verification rather than renaming or re-deriving prior quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Assessment limited to abstract; no explicit free parameters, axioms, or invented entities stated beyond standard convexity assumptions in online convex optimization.

axioms (1)
  • domain assumption Loss functions are convex
    Standard assumption in OCO literature referenced in abstract.

pith-pipeline@v0.9.1-grok · 5707 in / 960 out tokens · 25902 ms · 2026-07-03T18:00:47.013914+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 1 internal anchor

  1. [1]

    Sparse communication for distributed gradient descent

    Alham Fikri Aji and Kenneth Heafield. Sparse communication for distributed gradient descent. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 440--445, 2017

  2. [2]

    QSGD : Communication-efficient sgd via gradient quantization and encoding

    Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnovic. QSGD : Communication-efficient sgd via gradient quantization and encoding. In Advances in neural information processing systems 30, page 1709–1720, 2017

  3. [3]

    The convergence of sparsified gradient methods

    Dan Alistarh, Torsten Hoefler, Mikael Johansson, Nikola Konstantinov, Sarit Khirirat, and C \'e dric Renggli. The convergence of sparsified gradient methods. In Advances in Neural Information Processing Systems 31, pages 5977--5987, 2018

  4. [4]

    Qsparse-local- SGD : Distributed SGD with quantization, sparsification and local computations

    Debraj Basu, Deepesh Data, Can Karakus, and Suhas Diggavi. Qsparse-local- SGD : Distributed SGD with quantization, sparsification and local computations. In Advances in Neural Information Processing Systems 32, pages 14695--14706, 2019

  5. [5]

    signSGD : Compressed optimisation for non-convex problems

    Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, and Animashree Anandkumar. signSGD : Compressed optimisation for non-convex problems. In International conference on machine learning, pages 560--569, 2018

  6. [6]

    On biased compression for distributed learning

    Aleksandr Beznosikov, Samuel Horvath, Peter Richt \'a rik, and Mher Safaryan. On biased compression for distributed learning. J. Mach. Learn. Res., 24: 0 276:1--276:50, 2020

  7. [7]

    Decentralized online convex optimization with compressed communications

    Xuanyu Cao and Tamer Ba s ar. Decentralized online convex optimization with compressed communications. Automatica, 156: 0 111186, 2023

  8. [8]

    LIBSVM : A library for support vector machines

    Chih-Chung Chang and Chih-Jen Lin. LIBSVM : A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2 0 (27): 0 1--27, 2011

  9. [9]

    Adaptive subgradient methods for online learning and stochastic optimization

    John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12: 0 2121--2159, 2011

  10. [10]

    Flaxman, Adam Tauman Kalai, and H

    Abraham D. Flaxman, Adam Tauman Kalai, and H. Brendan McMahan. Online convex optimization in the bandit setting: Gradient descent without a gradient. In Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 385--394, 2005

  11. [11]

    A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization

    Dan Garber and Elad Hazan. A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization. SIAM Journal on Optimization, 26 0 (3): 0 1493--1528, 2016

  12. [12]

    Exploring network structure, dynamics, and function using networkx

    Aric Hagberg, Pieter J Swart, and Daniel A Schult. Exploring network structure, dynamics, and function using networkx. Technical report, Los Alamos National Laboratory (LANL), Los Alamos, NM (United States), 2008

  13. [13]

    Introduction to online convex optimization

    Elad Hazan. Introduction to online convex optimization. Foundations and Trends in Optimization, 2 0 (3--4): 0 157--325, 2016

  14. [14]

    Online distributed optimization via dual averaging

    Saghar Hosseini, Airlie Chapman, and Mehran Mesbahi. Online distributed optimization via dual averaging. In 52nd IEEE Conference on Decision and Control, pages 1484--1489. IEEE, 2013

  15. [15]

    Stich, and Martin Jaggi

    Anastasia Koloskova, Sebastian U. Stich, and Martin Jaggi. Decentralized stochastic optimization and gossip algorithms with compressed communication. In Proceedings of the 36th International Conference on Machine Learning, pages 3478--3487, 2019

  16. [16]

    Decentralized online learning with compressed communication for near-sensor data analytics

    Guangxia Li, Jia Liu, Xiao Lu, Peilin Zhao, Yulong Shen, and Dusit Niyato. Decentralized online learning with compressed communication for near-sensor data analytics. IEEE Communications Letters, 25 0 (9): 0 2958--2962, 2021

  17. [17]

    Deep Gradient Compression : Reducing the communication bandwidth for distributed training

    Yujun Lin, Song Han, Huizi Mao, Yu Wang, and Bill Dally. Deep Gradient Compression : Reducing the communication bandwidth for distributed training. In International Conference on Learning Representations, 2018

  18. [18]

    Stephen Morse

    Ji Liu and A. Stephen Morse. Accelerated linear iterations for distributed averaging. Annual Reviews in Control, 35 0 (2): 0 160--165, 2011

  19. [19]

    1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns

    Frank Seide, Hao Fu, Jasha Droppo, Gang Li, and Dong Yu. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns. In Proceedings of the 15th Annual Conference of the International Speech Communication Association, pages 1058--1062, 2014

  20. [20]

    A primal-dual perspective of online learning algorithm

    Shai Shalev-Shwartz and Yoram Singer. A primal-dual perspective of online learning algorithm. Machine Learning, 69 0 (2--3): 0 115--142, 2007

  21. [21]

    Sparsified SGD with memory

    Sebastian U Stich, Jean-Baptiste Cordonnier, and Martin Jaggi. Sparsified SGD with memory. In Advances in Neural Information Processing Systems 31, pages 4452--4463, 2018

  22. [22]

    Communication compression for decentralized training

    Hanlin Tang, Shaoduo Gan, Ce Zhang, Tong Zhang, and Ji Liu. Communication compression for decentralized training. Advances in Neural Information Processing Systems 31, pages 7663--7673, 2018

  23. [23]

    Distributed online convex optimization with compressed communication

    Zhipeng Tu, Xi Wang, Yiguang Hong, Lei Wang, Deming Yuan, and Guodong Shi. Distributed online convex optimization with compressed communication. Advances in Neural Information Processing Systems 35, pages 34492--34504, 2022

  24. [24]

    Projection-free distributed online convex optimization with O ( T ) communication complexity

    Yuanyu Wan, Wei-Wei Tu, and Lijun Zhang. Projection-free distributed online convex optimization with O ( T ) communication complexity. In Proceedings of the 37th International Conference on Machine Learning, pages 9818--9828, 2020

  25. [25]

    Projection-free distributed online learning with sublinear communication complexity

    Yuanyu Wan, Guanghui Wang, Wei-Wei Tu, and Lijun Zhang. Projection-free distributed online learning with sublinear communication complexity. Journal of Machine Learning Research, 23 0 (172): 0 1--53, 2022

  26. [26]

    Nearly optimal regret for decentralized online convex optimization

    Yuanyu Wan, Tong Wei, Mingli Song, and Lijun Zhang. Nearly optimal regret for decentralized online convex optimization. In Proceedings of the 37th Annual Conference on Learning Theory, pages 4862--4888, 2024

  27. [27]

    Optimal and efficient algorithms for decentralized online convex optimization

    Yuanyu Wan, Tong Wei, Bo Xue, Mingli Song, and Lijun Zhang. Optimal and efficient algorithms for decentralized online convex optimization. Journal of Machine Learning Research, 26 0 (135): 0 1--43, 2025

  28. [28]

    Atomo: Communication-efficient learning via atomic sparsification

    Hongyi Wang, Scott Sievert, Shengchao Liu, Zachary Charles, Dimitris Papailiopoulos, and Stephen Wright. Atomo: Communication-efficient learning via atomic sparsification. Advances in neural information processing systems 31, pages 9872--9883, 2018

  29. [29]

    Revisiting differentially private algorithms for decentralized online learning

    Xiaoyu Wang, Wenhao Yang, Chang Yao, Mingli Song, and Yuanyu Wan. Revisiting differentially private algorithms for decentralized online learning. In Proceedings of the 42nd International Conference on Machine Learning (ICML), pages 65213--65235, 2025

  30. [30]

    Distributed projection-free online learning for smooth and convex losses

    Yibo Wang, Yuanyu Wan, Shimao Zhang, and Lijun Zhang. Distributed projection-free online learning for smooth and convex losses. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, pages 10226--10234, 2023

  31. [31]

    Gradient sparsification for communication-efficient distributed optimization

    Jiawei Wangni, Jun Wang, Ji Liu, and Tong Zhang. Gradient sparsification for communication-efficient distributed optimization. In Advances in Neural Information Processing Systems, volume 31, 2018

  32. [32]

    Terngrad: Ternary gradients to reduce communication in distributed deep learning

    Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. Terngrad: Ternary gradients to reduce communication in distributed deep learning. Advances in neural information processing systems, 30, 2017

  33. [33]

    Dual averaging method for regularized stochastic learning and online optimization

    Lin Xiao. Dual averaging method for regularized stochastic learning and online optimization. In Advances in Neural Information Processing Systems 22, pages 2116--2124, 2009

  34. [34]

    Fast linear iterations for distributed averaging

    Lin Xiao and Stephen Boyd. Fast linear iterations for distributed averaging. Systems and Control Letters, 53 0 (1): 0 65--78, 2004

  35. [35]

    Vishwanathan, and Yuan Qi

    Feng Yan, Shreyas Sundaram, S.V.N. Vishwanathan, and Yuan Qi. Distributed autonomous online learning: Regrets and intrinsic privacy-preserving properties. IEEE Transactions on Knowledge and Data Engineering, 25 0 (11): 0 2483--2493, 2013

  36. [36]

    Distributed Online Convex Optimization with Efficient Communication: Improved Algorithm and Lower bounds

    Sifan Yang, Wenhao Yang, Wei Jiang, and Lijun Zhang. Distributed online convex optimization with efficient communication: Improved algorithm and lower bounds. arXiv preprint arXiv:2601.04907, 2026

  37. [37]

    Wenpeng Zhang, Peilin Zhao, Wenwu Zhu, Steven C. H. Hoi, and Tong Zhang. Projection-free distributed online learning in networks. In Proceedings of the 34th International Conference on Machine Learning, pages 4054--4062, 2017

  38. [38]

    Online convex programming and generalized infinitesimal gradient ascent

    Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning, pages 928--936, 2003