Revisiting Decentralized Online Convex Optimization with Compressed Communication

Chang Yao; Hao Zhou; Mingli Song; Xiaoyu Wang; Yuanyu Wan

arxiv: 2607.01665 · v1 · pith:MFU2E6UVnew · submitted 2026-07-02 · 💻 cs.LG

Revisiting Decentralized Online Convex Optimization with Compressed Communication

Hao Zhou , Xiaoyu Wang , Chang Yao , Mingli Song , Yuanyu Wan This is my paper

Pith reviewed 2026-07-03 18:00 UTC · model grok-4.3

classification 💻 cs.LG

keywords decentralized online convex optimizationcompressed communicationfollow-the-regularized-leaderonline gradient descentregret boundsaverage consensusbandit feedbackfull-information setting

0 comments

The pith

FTRL-type algorithms for decentralized online convex optimization with compressed communication match or improve upon previous OGD bounds while offering simpler design and analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes two new algorithms based on follow-the-regularized-leader for decentralized online convex optimization when communication is compressed. It shows that these algorithms are more elegant than earlier online gradient descent variants because the dual update in FTRL lets compressed consensus be applied directly. In the full-information case the new method recovers the best known regret bounds. In the bandit setting it improves both regret and the amount of communication required. A reader should care because streaming data applications in distributed systems often face bandwidth limits, and better algorithms can reduce both computation and data transfer overhead.

Core claim

The central claim is that the dual update mechanism of follow-the-regularized-leader permits a direct and error-free application of average-consensus techniques under communication compression, yielding two algorithms whose regret bounds are at least as good as those of prior online-gradient-descent methods and strictly better in the bandit case.

What carries the argument

The dual update step of FTRL, which allows compressed average consensus to be incorporated without generating additional error terms in the regret analysis.

Load-bearing premise

The dual update mechanism of FTRL permits a simple application of compressed average consensus without introducing new error terms that would degrade the regret bounds.

What would settle it

A direct implementation on a test problem where the observed regret exceeds the claimed bound when compression is applied would falsify the claim that no new error terms arise.

Figures

Figures reproduced from arXiv: 2607.01665 by Chang Yao, Hao Zhou, Mingli Song, Xiaoyu Wang, Yuanyu Wan.

**Figure 2.** Figure 2: Experimental results on a small random graph ( [PITH_FULL_IMAGE:figures/full_fig_p013_2.png] view at source ↗

**Figure 3.** Figure 3: Experimental results on a small random graph ( [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗

**Figure 4.** Figure 4: Experimental results on a larger random graph ( [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗

**Figure 5.** Figure 5: Experimental results on a larger random graph ( [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗

**Figure 6.** Figure 6: Experimental results on a larger random graph ( [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗

**Figure 7.** Figure 7: Experimental results of CD-FTBL with varying communication rounds [PITH_FULL_IMAGE:figures/full_fig_p014_7.png] view at source ↗

**Figure 8.** Figure 8: Experimental results of CD-FTBL with varying communication rounds [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗

**Figure 9.** Figure 9: Experimental results of CD-FTBL with varying communication rounds [PITH_FULL_IMAGE:figures/full_fig_p014_9.png] view at source ↗

read the original abstract

Decentralized online convex optimization (D-OCO) is a popular framework for distributed applications with streaming data. To tackle the communication bottleneck, previous studies have investigated D-OCO with compressed communication and proposed several algorithms that are variants of online gradient descent (OGD). However, for D-OCO with exact communication, the best existing algorithms are variants of follow-the-regularized-leader (FTRL). In this paper, for the first time, we propose two FTRL-type algorithms for D-OCO with compressed communication. Compared with OGD-type algorithms, our algorithms are more elegant in both algorithmic design and theoretical analysis. The key insight is that the dual update mechanism of FTRL allows us to make a simple application of the technique for average consensus with communication compression. More specifically, our first algorithm considers the full-information setting, and can match the existing regret bounds. Our second algorithm is designed for the bandit setting, and can significantly improve both the regret bounds and communication costs of existing algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper brings FTRL to compressed decentralized online convex optimization for the first time, offering potentially simpler analysis, though the no-leakage claim on compression errors needs close proof checking.

read the letter

The main point here is that the authors are the first to bring FTRL methods into decentralized online convex optimization when communication is compressed. They argue this gives cleaner design and theory than the OGD-based algorithms that have been used so far.

What stands out is the insight that FTRL's dual update lets them plug in existing compressed average consensus techniques directly. For the full-information version this matches prior regret bounds. For the bandit version it improves both regret and communication cost. If the analysis works as described, that's a real step forward for the subfield.

The potential weak point is exactly the one the stress-test flags: whether the compression noise stays isolated and does not create extra additive terms that grow with time or number of nodes. The abstract says the dual update makes this simple, but prior OGD work apparently ran into those terms. The paper needs to show explicitly in the proofs that no such leakage happens, especially after substituting the one-point estimator in the bandit case. Without that, the "significantly improve" claim for the bandit algorithm would not stand. The citation pattern looks standard and there is no sign of circular reasoning.

This work is aimed at people doing distributed online learning and communication-efficient methods. Someone already following the D-OCO literature would find the new template useful to build on. It is worth sending to peer review because the algorithmic shift is clear and the claims are testable against existing bounds.

I would recommend engaging with it in review, with attention paid to the regret derivations under compression.

Referee Report

1 major / 0 minor

Summary. The paper proposes two FTRL-type algorithms for decentralized online convex optimization (D-OCO) with compressed communication. The full-information algorithm is claimed to match existing regret bounds, while the bandit algorithm is claimed to significantly improve both regret bounds and communication costs compared to prior OGD-based methods. The central insight is that the dual update in FTRL permits direct use of existing compressed average-consensus techniques without introducing new error terms that degrade the regret.

Significance. If the analysis confirms that compression errors remain isolated within the consensus step and do not produce additional additive terms scaling with T or the number of nodes in the FTRL regret bound (including after substitution of the one-point estimator), the work would supply a cleaner algorithmic and analytic framework than OGD variants for this setting.

major comments (1)

[regret analysis sections for both algorithms (dual update and error propagation)] The load-bearing claim that the FTRL dual update allows application of compressed consensus 'without introducing new error terms that degrade the regret bounds' requires explicit verification. The analysis must demonstrate that the standard quantization/sparsification error (bounded by a factor of the gradient norm or diameter) appears only inside the consensus error and does not leak into the online dual variables or produce extra summations over noise that scale with T. This must be shown separately for the full-information case and after the one-point gradient estimator is substituted in the bandit case; otherwise the 'match existing bounds' and 'significantly improve' statements do not hold.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting the need for explicit verification of error isolation in the regret analyses. We address the major comment below and are prepared to strengthen the presentation accordingly.

read point-by-point responses

Referee: [regret analysis sections for both algorithms (dual update and error propagation)] The load-bearing claim that the FTRL dual update allows application of compressed consensus 'without introducing new error terms that degrade the regret bounds' requires explicit verification. The analysis must demonstrate that the standard quantization/sparsification error (bounded by a factor of the gradient norm or diameter) appears only inside the consensus error and does not leak into the online dual variables or produce extra summations over noise that scale with T. This must be shown separately for the full-information case and after the one-point gradient estimator is substituted in the bandit case; otherwise the 'match existing bounds' and 'significantly improve' statements do not hold.

Authors: We agree that the isolation of compression error must be shown explicitly. In Section 4 (full-information algorithm), the proof of Theorem 4.1 proceeds by substituting the compressed consensus output directly into the FTRL dual update; the quantization error is absorbed into the standard consensus-error term (controlled by the diameter and the number of rounds) and does not appear in the online dual variables or produce an additional summation over T. The same structure is used in Section 5 for the bandit algorithm: after the one-point estimator is inserted, the proof of Theorem 5.1 again confines the compression noise inside the consensus step, yielding the stated improvement over prior OGD-based bounds without extra T-dependent terms. To address the referee’s request for clearer verification, we will add a short dedicated paragraph (or remark) immediately after each theorem statement that isolates the relevant error term and confirms it does not propagate beyond the consensus error. revision: partial

Circularity Check

0 steps flagged

No circularity; new FTRL algorithms and analysis are independently constructed

full rationale

The paper proposes novel FTRL-type algorithms for D-OCO with compressed communication, relying on the dual update mechanism to apply existing average-consensus techniques without introducing degrading error terms. No steps reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; the regret bounds are derived from standard FTRL analysis combined with consensus compression methods that are treated as external. The central claims rest on explicit algorithmic design and theoretical verification rather than renaming or re-deriving prior quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Assessment limited to abstract; no explicit free parameters, axioms, or invented entities stated beyond standard convexity assumptions in online convex optimization.

axioms (1)

domain assumption Loss functions are convex
Standard assumption in OCO literature referenced in abstract.

pith-pipeline@v0.9.1-grok · 5707 in / 960 out tokens · 25902 ms · 2026-07-03T18:00:47.013914+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 1 internal anchor

[1]

Sparse communication for distributed gradient descent

Alham Fikri Aji and Kenneth Heafield. Sparse communication for distributed gradient descent. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 440--445, 2017

work page 2017
[2]

QSGD : Communication-efficient sgd via gradient quantization and encoding

Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnovic. QSGD : Communication-efficient sgd via gradient quantization and encoding. In Advances in neural information processing systems 30, page 1709–1720, 2017

work page 2017
[3]

The convergence of sparsified gradient methods

Dan Alistarh, Torsten Hoefler, Mikael Johansson, Nikola Konstantinov, Sarit Khirirat, and C \'e dric Renggli. The convergence of sparsified gradient methods. In Advances in Neural Information Processing Systems 31, pages 5977--5987, 2018

work page 2018
[4]

Qsparse-local- SGD : Distributed SGD with quantization, sparsification and local computations

Debraj Basu, Deepesh Data, Can Karakus, and Suhas Diggavi. Qsparse-local- SGD : Distributed SGD with quantization, sparsification and local computations. In Advances in Neural Information Processing Systems 32, pages 14695--14706, 2019

work page 2019
[5]

signSGD : Compressed optimisation for non-convex problems

Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, and Animashree Anandkumar. signSGD : Compressed optimisation for non-convex problems. In International conference on machine learning, pages 560--569, 2018

work page 2018
[6]

On biased compression for distributed learning

Aleksandr Beznosikov, Samuel Horvath, Peter Richt \'a rik, and Mher Safaryan. On biased compression for distributed learning. J. Mach. Learn. Res., 24: 0 276:1--276:50, 2020

work page 2020
[7]

Decentralized online convex optimization with compressed communications

Xuanyu Cao and Tamer Ba s ar. Decentralized online convex optimization with compressed communications. Automatica, 156: 0 111186, 2023

work page 2023
[8]

LIBSVM : A library for support vector machines

Chih-Chung Chang and Chih-Jen Lin. LIBSVM : A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2 0 (27): 0 1--27, 2011

work page 2011
[9]

Adaptive subgradient methods for online learning and stochastic optimization

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12: 0 2121--2159, 2011

work page 2011
[10]

Flaxman, Adam Tauman Kalai, and H

Abraham D. Flaxman, Adam Tauman Kalai, and H. Brendan McMahan. Online convex optimization in the bandit setting: Gradient descent without a gradient. In Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 385--394, 2005

work page 2005
[11]

A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization

Dan Garber and Elad Hazan. A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization. SIAM Journal on Optimization, 26 0 (3): 0 1493--1528, 2016

work page 2016
[12]

Exploring network structure, dynamics, and function using networkx

Aric Hagberg, Pieter J Swart, and Daniel A Schult. Exploring network structure, dynamics, and function using networkx. Technical report, Los Alamos National Laboratory (LANL), Los Alamos, NM (United States), 2008

work page 2008
[13]

Introduction to online convex optimization

Elad Hazan. Introduction to online convex optimization. Foundations and Trends in Optimization, 2 0 (3--4): 0 157--325, 2016

work page 2016
[14]

Online distributed optimization via dual averaging

Saghar Hosseini, Airlie Chapman, and Mehran Mesbahi. Online distributed optimization via dual averaging. In 52nd IEEE Conference on Decision and Control, pages 1484--1489. IEEE, 2013

work page 2013
[15]

Stich, and Martin Jaggi

Anastasia Koloskova, Sebastian U. Stich, and Martin Jaggi. Decentralized stochastic optimization and gossip algorithms with compressed communication. In Proceedings of the 36th International Conference on Machine Learning, pages 3478--3487, 2019

work page 2019
[16]

Decentralized online learning with compressed communication for near-sensor data analytics

Guangxia Li, Jia Liu, Xiao Lu, Peilin Zhao, Yulong Shen, and Dusit Niyato. Decentralized online learning with compressed communication for near-sensor data analytics. IEEE Communications Letters, 25 0 (9): 0 2958--2962, 2021

work page 2021
[17]

Deep Gradient Compression : Reducing the communication bandwidth for distributed training

Yujun Lin, Song Han, Huizi Mao, Yu Wang, and Bill Dally. Deep Gradient Compression : Reducing the communication bandwidth for distributed training. In International Conference on Learning Representations, 2018

work page 2018
[18]

Stephen Morse

Ji Liu and A. Stephen Morse. Accelerated linear iterations for distributed averaging. Annual Reviews in Control, 35 0 (2): 0 160--165, 2011

work page 2011
[19]

1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns

Frank Seide, Hao Fu, Jasha Droppo, Gang Li, and Dong Yu. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns. In Proceedings of the 15th Annual Conference of the International Speech Communication Association, pages 1058--1062, 2014

work page 2014
[20]

A primal-dual perspective of online learning algorithm

Shai Shalev-Shwartz and Yoram Singer. A primal-dual perspective of online learning algorithm. Machine Learning, 69 0 (2--3): 0 115--142, 2007

work page 2007
[21]

Sparsified SGD with memory

Sebastian U Stich, Jean-Baptiste Cordonnier, and Martin Jaggi. Sparsified SGD with memory. In Advances in Neural Information Processing Systems 31, pages 4452--4463, 2018

work page 2018
[22]

Communication compression for decentralized training

Hanlin Tang, Shaoduo Gan, Ce Zhang, Tong Zhang, and Ji Liu. Communication compression for decentralized training. Advances in Neural Information Processing Systems 31, pages 7663--7673, 2018

work page 2018
[23]

Distributed online convex optimization with compressed communication

Zhipeng Tu, Xi Wang, Yiguang Hong, Lei Wang, Deming Yuan, and Guodong Shi. Distributed online convex optimization with compressed communication. Advances in Neural Information Processing Systems 35, pages 34492--34504, 2022

work page 2022
[24]

Projection-free distributed online convex optimization with O ( T ) communication complexity

Yuanyu Wan, Wei-Wei Tu, and Lijun Zhang. Projection-free distributed online convex optimization with O ( T ) communication complexity. In Proceedings of the 37th International Conference on Machine Learning, pages 9818--9828, 2020

work page 2020
[25]

Projection-free distributed online learning with sublinear communication complexity

Yuanyu Wan, Guanghui Wang, Wei-Wei Tu, and Lijun Zhang. Projection-free distributed online learning with sublinear communication complexity. Journal of Machine Learning Research, 23 0 (172): 0 1--53, 2022

work page 2022
[26]

Nearly optimal regret for decentralized online convex optimization

Yuanyu Wan, Tong Wei, Mingli Song, and Lijun Zhang. Nearly optimal regret for decentralized online convex optimization. In Proceedings of the 37th Annual Conference on Learning Theory, pages 4862--4888, 2024

work page 2024
[27]

Optimal and efficient algorithms for decentralized online convex optimization

Yuanyu Wan, Tong Wei, Bo Xue, Mingli Song, and Lijun Zhang. Optimal and efficient algorithms for decentralized online convex optimization. Journal of Machine Learning Research, 26 0 (135): 0 1--43, 2025

work page 2025
[28]

Atomo: Communication-efficient learning via atomic sparsification

Hongyi Wang, Scott Sievert, Shengchao Liu, Zachary Charles, Dimitris Papailiopoulos, and Stephen Wright. Atomo: Communication-efficient learning via atomic sparsification. Advances in neural information processing systems 31, pages 9872--9883, 2018

work page 2018
[29]

Revisiting differentially private algorithms for decentralized online learning

Xiaoyu Wang, Wenhao Yang, Chang Yao, Mingli Song, and Yuanyu Wan. Revisiting differentially private algorithms for decentralized online learning. In Proceedings of the 42nd International Conference on Machine Learning (ICML), pages 65213--65235, 2025

work page 2025
[30]

Distributed projection-free online learning for smooth and convex losses

Yibo Wang, Yuanyu Wan, Shimao Zhang, and Lijun Zhang. Distributed projection-free online learning for smooth and convex losses. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, pages 10226--10234, 2023

work page 2023
[31]

Gradient sparsification for communication-efficient distributed optimization

Jiawei Wangni, Jun Wang, Ji Liu, and Tong Zhang. Gradient sparsification for communication-efficient distributed optimization. In Advances in Neural Information Processing Systems, volume 31, 2018

work page 2018
[32]

Terngrad: Ternary gradients to reduce communication in distributed deep learning

Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. Terngrad: Ternary gradients to reduce communication in distributed deep learning. Advances in neural information processing systems, 30, 2017

work page 2017
[33]

Dual averaging method for regularized stochastic learning and online optimization

Lin Xiao. Dual averaging method for regularized stochastic learning and online optimization. In Advances in Neural Information Processing Systems 22, pages 2116--2124, 2009

work page 2009
[34]

Fast linear iterations for distributed averaging

Lin Xiao and Stephen Boyd. Fast linear iterations for distributed averaging. Systems and Control Letters, 53 0 (1): 0 65--78, 2004

work page 2004
[35]

Vishwanathan, and Yuan Qi

Feng Yan, Shreyas Sundaram, S.V.N. Vishwanathan, and Yuan Qi. Distributed autonomous online learning: Regrets and intrinsic privacy-preserving properties. IEEE Transactions on Knowledge and Data Engineering, 25 0 (11): 0 2483--2493, 2013

work page 2013
[36]

Distributed Online Convex Optimization with Efficient Communication: Improved Algorithm and Lower bounds

Sifan Yang, Wenhao Yang, Wei Jiang, and Lijun Zhang. Distributed online convex optimization with efficient communication: Improved algorithm and lower bounds. arXiv preprint arXiv:2601.04907, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[37]

Wenpeng Zhang, Peilin Zhao, Wenwu Zhu, Steven C. H. Hoi, and Tong Zhang. Projection-free distributed online learning in networks. In Proceedings of the 34th International Conference on Machine Learning, pages 4054--4062, 2017

work page 2017
[38]

Online convex programming and generalized infinitesimal gradient ascent

Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning, pages 928--936, 2003

work page 2003

[1] [1]

Sparse communication for distributed gradient descent

Alham Fikri Aji and Kenneth Heafield. Sparse communication for distributed gradient descent. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 440--445, 2017

work page 2017

[2] [2]

QSGD : Communication-efficient sgd via gradient quantization and encoding

Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnovic. QSGD : Communication-efficient sgd via gradient quantization and encoding. In Advances in neural information processing systems 30, page 1709–1720, 2017

work page 2017

[3] [3]

The convergence of sparsified gradient methods

Dan Alistarh, Torsten Hoefler, Mikael Johansson, Nikola Konstantinov, Sarit Khirirat, and C \'e dric Renggli. The convergence of sparsified gradient methods. In Advances in Neural Information Processing Systems 31, pages 5977--5987, 2018

work page 2018

[4] [4]

Qsparse-local- SGD : Distributed SGD with quantization, sparsification and local computations

Debraj Basu, Deepesh Data, Can Karakus, and Suhas Diggavi. Qsparse-local- SGD : Distributed SGD with quantization, sparsification and local computations. In Advances in Neural Information Processing Systems 32, pages 14695--14706, 2019

work page 2019

[5] [5]

signSGD : Compressed optimisation for non-convex problems

Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, and Animashree Anandkumar. signSGD : Compressed optimisation for non-convex problems. In International conference on machine learning, pages 560--569, 2018

work page 2018

[6] [6]

On biased compression for distributed learning

Aleksandr Beznosikov, Samuel Horvath, Peter Richt \'a rik, and Mher Safaryan. On biased compression for distributed learning. J. Mach. Learn. Res., 24: 0 276:1--276:50, 2020

work page 2020

[7] [7]

Decentralized online convex optimization with compressed communications

Xuanyu Cao and Tamer Ba s ar. Decentralized online convex optimization with compressed communications. Automatica, 156: 0 111186, 2023

work page 2023

[8] [8]

LIBSVM : A library for support vector machines

Chih-Chung Chang and Chih-Jen Lin. LIBSVM : A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2 0 (27): 0 1--27, 2011

work page 2011

[9] [9]

Adaptive subgradient methods for online learning and stochastic optimization

John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12: 0 2121--2159, 2011

work page 2011

[10] [10]

Flaxman, Adam Tauman Kalai, and H

Abraham D. Flaxman, Adam Tauman Kalai, and H. Brendan McMahan. Online convex optimization in the bandit setting: Gradient descent without a gradient. In Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 385--394, 2005

work page 2005

[11] [11]

A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization

Dan Garber and Elad Hazan. A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization. SIAM Journal on Optimization, 26 0 (3): 0 1493--1528, 2016

work page 2016

[12] [12]

Exploring network structure, dynamics, and function using networkx

Aric Hagberg, Pieter J Swart, and Daniel A Schult. Exploring network structure, dynamics, and function using networkx. Technical report, Los Alamos National Laboratory (LANL), Los Alamos, NM (United States), 2008

work page 2008

[13] [13]

Introduction to online convex optimization

Elad Hazan. Introduction to online convex optimization. Foundations and Trends in Optimization, 2 0 (3--4): 0 157--325, 2016

work page 2016

[14] [14]

Online distributed optimization via dual averaging

Saghar Hosseini, Airlie Chapman, and Mehran Mesbahi. Online distributed optimization via dual averaging. In 52nd IEEE Conference on Decision and Control, pages 1484--1489. IEEE, 2013

work page 2013

[15] [15]

Stich, and Martin Jaggi

Anastasia Koloskova, Sebastian U. Stich, and Martin Jaggi. Decentralized stochastic optimization and gossip algorithms with compressed communication. In Proceedings of the 36th International Conference on Machine Learning, pages 3478--3487, 2019

work page 2019

[16] [16]

Decentralized online learning with compressed communication for near-sensor data analytics

Guangxia Li, Jia Liu, Xiao Lu, Peilin Zhao, Yulong Shen, and Dusit Niyato. Decentralized online learning with compressed communication for near-sensor data analytics. IEEE Communications Letters, 25 0 (9): 0 2958--2962, 2021

work page 2021

[17] [17]

Deep Gradient Compression : Reducing the communication bandwidth for distributed training

Yujun Lin, Song Han, Huizi Mao, Yu Wang, and Bill Dally. Deep Gradient Compression : Reducing the communication bandwidth for distributed training. In International Conference on Learning Representations, 2018

work page 2018

[18] [18]

Stephen Morse

Ji Liu and A. Stephen Morse. Accelerated linear iterations for distributed averaging. Annual Reviews in Control, 35 0 (2): 0 160--165, 2011

work page 2011

[19] [19]

1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns

Frank Seide, Hao Fu, Jasha Droppo, Gang Li, and Dong Yu. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns. In Proceedings of the 15th Annual Conference of the International Speech Communication Association, pages 1058--1062, 2014

work page 2014

[20] [20]

A primal-dual perspective of online learning algorithm

Shai Shalev-Shwartz and Yoram Singer. A primal-dual perspective of online learning algorithm. Machine Learning, 69 0 (2--3): 0 115--142, 2007

work page 2007

[21] [21]

Sparsified SGD with memory

Sebastian U Stich, Jean-Baptiste Cordonnier, and Martin Jaggi. Sparsified SGD with memory. In Advances in Neural Information Processing Systems 31, pages 4452--4463, 2018

work page 2018

[22] [22]

Communication compression for decentralized training

Hanlin Tang, Shaoduo Gan, Ce Zhang, Tong Zhang, and Ji Liu. Communication compression for decentralized training. Advances in Neural Information Processing Systems 31, pages 7663--7673, 2018

work page 2018

[23] [23]

Distributed online convex optimization with compressed communication

Zhipeng Tu, Xi Wang, Yiguang Hong, Lei Wang, Deming Yuan, and Guodong Shi. Distributed online convex optimization with compressed communication. Advances in Neural Information Processing Systems 35, pages 34492--34504, 2022

work page 2022

[24] [24]

Projection-free distributed online convex optimization with O ( T ) communication complexity

Yuanyu Wan, Wei-Wei Tu, and Lijun Zhang. Projection-free distributed online convex optimization with O ( T ) communication complexity. In Proceedings of the 37th International Conference on Machine Learning, pages 9818--9828, 2020

work page 2020

[25] [25]

Projection-free distributed online learning with sublinear communication complexity

Yuanyu Wan, Guanghui Wang, Wei-Wei Tu, and Lijun Zhang. Projection-free distributed online learning with sublinear communication complexity. Journal of Machine Learning Research, 23 0 (172): 0 1--53, 2022

work page 2022

[26] [26]

Nearly optimal regret for decentralized online convex optimization

Yuanyu Wan, Tong Wei, Mingli Song, and Lijun Zhang. Nearly optimal regret for decentralized online convex optimization. In Proceedings of the 37th Annual Conference on Learning Theory, pages 4862--4888, 2024

work page 2024

[27] [27]

Optimal and efficient algorithms for decentralized online convex optimization

Yuanyu Wan, Tong Wei, Bo Xue, Mingli Song, and Lijun Zhang. Optimal and efficient algorithms for decentralized online convex optimization. Journal of Machine Learning Research, 26 0 (135): 0 1--43, 2025

work page 2025

[28] [28]

Atomo: Communication-efficient learning via atomic sparsification

Hongyi Wang, Scott Sievert, Shengchao Liu, Zachary Charles, Dimitris Papailiopoulos, and Stephen Wright. Atomo: Communication-efficient learning via atomic sparsification. Advances in neural information processing systems 31, pages 9872--9883, 2018

work page 2018

[29] [29]

Revisiting differentially private algorithms for decentralized online learning

Xiaoyu Wang, Wenhao Yang, Chang Yao, Mingli Song, and Yuanyu Wan. Revisiting differentially private algorithms for decentralized online learning. In Proceedings of the 42nd International Conference on Machine Learning (ICML), pages 65213--65235, 2025

work page 2025

[30] [30]

Distributed projection-free online learning for smooth and convex losses

Yibo Wang, Yuanyu Wan, Shimao Zhang, and Lijun Zhang. Distributed projection-free online learning for smooth and convex losses. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, pages 10226--10234, 2023

work page 2023

[31] [31]

Gradient sparsification for communication-efficient distributed optimization

Jiawei Wangni, Jun Wang, Ji Liu, and Tong Zhang. Gradient sparsification for communication-efficient distributed optimization. In Advances in Neural Information Processing Systems, volume 31, 2018

work page 2018

[32] [32]

Terngrad: Ternary gradients to reduce communication in distributed deep learning

Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. Terngrad: Ternary gradients to reduce communication in distributed deep learning. Advances in neural information processing systems, 30, 2017

work page 2017

[33] [33]

Dual averaging method for regularized stochastic learning and online optimization

Lin Xiao. Dual averaging method for regularized stochastic learning and online optimization. In Advances in Neural Information Processing Systems 22, pages 2116--2124, 2009

work page 2009

[34] [34]

Fast linear iterations for distributed averaging

Lin Xiao and Stephen Boyd. Fast linear iterations for distributed averaging. Systems and Control Letters, 53 0 (1): 0 65--78, 2004

work page 2004

[35] [35]

Vishwanathan, and Yuan Qi

Feng Yan, Shreyas Sundaram, S.V.N. Vishwanathan, and Yuan Qi. Distributed autonomous online learning: Regrets and intrinsic privacy-preserving properties. IEEE Transactions on Knowledge and Data Engineering, 25 0 (11): 0 2483--2493, 2013

work page 2013

[36] [36]

Distributed Online Convex Optimization with Efficient Communication: Improved Algorithm and Lower bounds

Sifan Yang, Wenhao Yang, Wei Jiang, and Lijun Zhang. Distributed online convex optimization with efficient communication: Improved algorithm and lower bounds. arXiv preprint arXiv:2601.04907, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[37] [37]

Wenpeng Zhang, Peilin Zhao, Wenwu Zhu, Steven C. H. Hoi, and Tong Zhang. Projection-free distributed online learning in networks. In Proceedings of the 34th International Conference on Machine Learning, pages 4054--4062, 2017

work page 2017

[38] [38]

Online convex programming and generalized infinitesimal gradient ascent

Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning, pages 928--936, 2003

work page 2003