Communication-Efficient Distributed Learning with Differential Privacy

Andreas A. Malikopoulos; Karl H. Johansson; Nicola Bastianello; Thomas Parisini; Xiaoxing Ren; Yuwen Ma

arxiv: 2604.02558 · v1 · submitted 2026-04-02 · 💻 cs.LG · math.OC

Communication-Efficient Distributed Learning with Differential Privacy

Xiaoxing Ren , Yuwen Ma , Nicola Bastianello , Karl H. Johansson , Thomas Parisini , Andreas A. Malikopoulos This is my paper

Pith reviewed 2026-05-13 20:56 UTC · model grok-4.3

classification 💻 cs.LG math.OC

keywords distributed optimizationdifferential privacycommunication efficiencynonconvex learninggradient perturbationmulti-agent systemsprivacy-preserving learning

0 comments

The pith

Local training with clipped noisy gradients converges to a near-stationary point while guaranteeing differential privacy in distributed nonconvex learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops an algorithm for nonconvex optimization problems solved across a network of agents. Each agent performs multiple local training steps that include gradient clipping and additive noise before sharing model updates, which cuts communication frequency. The analysis establishes that the shared iterates converge to a stationary point of the global problem within a distance bounded by the noise level and network properties. It also derives explicit differential privacy bounds showing that individual training data cannot be recovered from the exchanged models. Tests on classification tasks show the method outperforms prior approaches under identical privacy budgets.

Core claim

The algorithm runs several local stochastic gradient steps with clipping and Gaussian noise at each agent, then performs infrequent averaging over the undirected connected network. Under standard smoothness and bounded-gradient assumptions, the iterates reach a point whose gradient norm is bounded by a term that grows with the privacy noise variance yet shrinks with more local steps per communication round, while the noise mechanism satisfies (epsilon, delta)-differential privacy for the agents' data.

What carries the argument

Local multi-step training with gradient clipping and additive noise followed by periodic network averaging.

If this is right

Communication rounds drop proportionally to the number of local steps performed between averaging operations.
The final shared model satisfies differential privacy, so individual training records cannot be inferred even if the model is published.
Stronger privacy (smaller epsilon) increases the convergence error bound by a factor linear in the added noise variance.
The same clipping-and-noise mechanism works for any nonconvex objective meeting the smoothness and bounded-gradient conditions.
Tuning the local-step count trades off communication savings against the distance to stationarity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The bounded-distance result could be tightened for specific network topologies such as rings or complete graphs.
Adding secure aggregation on top of the noisy updates would further reduce leakage risk without altering the convergence analysis.
The local-training structure naturally extends to asynchronous or time-varying networks if the connectivity assumption is relaxed to average connectivity over time.

Load-bearing premise

The communication network is undirected and connected, and each local loss function is smooth with bounded gradients.

What would settle it

Running the algorithm on a connected undirected network with smooth bounded-gradient losses and observing either divergence beyond the predicted distance bound or reconstruction of an agent's training data that violates the stated (epsilon, delta) privacy parameters.

Figures

Figures reproduced from arXiv: 2604.02558 by Andreas A. Malikopoulos, Karl H. Johansson, Nicola Bastianello, Thomas Parisini, Xiaoxing Ren, Yuwen Ma.

read the original abstract

We address nonconvex learning problems over undirected networks. In particular, we focus on the challenge of designing an algorithm that is both communication-efficient and that guarantees the privacy of the agents' data. The first goal is achieved through a local training approach, which reduces communication frequency. The second goal is achieved by perturbing gradients during local training, specifically through gradient clipping and additive noise. We prove that the resulting algorithm converges to a stationary point of the problem within a bounded distance. Additionally, we provide theoretical privacy guarantees within a differential privacy framework that ensure agents' training data cannot be inferred from the trained model shared over the network. We show the algorithm's superior performance on a classification task under the same privacy budget, compared with state-of-the-art methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a concrete local-update algorithm that adds clipping and noise to hit both communication savings and differential privacy for nonconvex distributed training on undirected networks.

read the letter

This paper presents a method for nonconvex distributed learning that reduces communication through local training steps while adding gradient clipping and noise to achieve differential privacy on undirected networks. What it does well is provide a specific algorithm that interleaves local updates with these perturbed gradients, along with claimed proofs for convergence to a stationary point within a bounded distance and privacy guarantees. The experiments demonstrate superior performance compared to state-of-the-art methods on a classification task under the same privacy budget, which suggests the approach has practical merit. The main soft spot is that convergence is to a neighborhood rather than an exact stationary point, with the distance scaling according to the noise variance and other parameters. This is standard for noisy methods but the paper should clearly quantify the trade-offs with communication savings and privacy level. The assumptions of network connectivity, smoothness, and bounded gradients are typical and enable the analysis, but they do limit how general the results are. Without the full derivations, it's difficult to assess if the nonconvex handling and network effects are handled tightly. Overall, this is for researchers focused on federated or distributed optimization where privacy and efficiency both matter. A reader working on real-world deployments with bandwidth limits and data protection needs could find the concrete algorithm and guarantees useful. I think it deserves peer review to check the proofs and see if the empirical gains hold in more settings.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a local-update algorithm for nonconvex distributed optimization over undirected connected networks. Communication efficiency is obtained by performing multiple local gradient steps before sharing; differential privacy is enforced by clipping gradients and adding Gaussian noise at each local step. The central claims are that the iterates converge to a stationary point within a distance that scales with the noise level, that the algorithm satisfies (ε,δ)-differential privacy via the Gaussian mechanism and composition, and that it outperforms prior private distributed methods on a classification task under identical privacy budgets.

Significance. If the stated convergence and privacy bounds hold under the listed assumptions (undirected connected graph, L-smoothness, bounded gradients), the result is a useful incremental advance for private federated learning: it combines standard local-update communication reduction with a transparent privacy mechanism while retaining nonconvex convergence guarantees. The empirical comparison under fixed privacy budget provides supporting evidence. No machine-checked proofs or fully parameter-free derivations are present, but the analysis follows the usual template for noisy local SGD.

major comments (2)

[§4, Theorem 1] §4, Theorem 1: the stated bound on distance to stationarity is O(√(noise variance) + 1/√T); because the noise variance is itself a free parameter chosen to meet the privacy budget, the theorem does not yet quantify the explicit privacy-utility tradeoff that is central to the paper’s motivation. The dependence on the number of local steps K and on the graph diameter should be written out explicitly.
[§5.2] §5.2, privacy analysis: the composition argument counts only the number of communication rounds but does not adjust the per-round privacy parameter for the K local steps performed between rounds; the resulting (ε,δ) guarantee therefore appears optimistic by a factor of K.

minor comments (2)

[Notation section] Notation: the symbol σ is used both for the noise standard deviation and for the smoothness constant; a distinct symbol for the noise scale would improve readability.
[Figure 3] Figure 3: the x-axis label “communication rounds” is ambiguous because each round contains K local steps; adding a second axis or clarifying caption would help readers interpret the communication cost.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and valuable suggestions. We have carefully considered the major comments and will make revisions to address them as detailed below.

read point-by-point responses

Referee: [§4, Theorem 1] §4, Theorem 1: the stated bound on distance to stationarity is O(√(noise variance) + 1/√T); because the noise variance is itself a free parameter chosen to meet the privacy budget, the theorem does not yet quantify the explicit privacy-utility tradeoff that is central to the paper’s motivation. The dependence on the number of local steps K and on the graph diameter should be written out explicitly.

Authors: We agree with the referee that the convergence bound should be expressed in terms of the privacy parameters to clearly show the privacy-utility tradeoff. In the revised version, we will substitute the expression for the noise variance (derived from the Gaussian mechanism to achieve the target (ε, δ)) into the convergence bound of Theorem 1. This will yield an explicit dependence on ε, δ, T, and other parameters. Furthermore, we will explicitly state the dependence on the number of local steps K and the graph diameter (which enters through the consensus error or mixing rate) in both the theorem statement and the proof sketch. We believe this will strengthen the presentation of the main result. revision: yes
Referee: [§5.2] §5.2, privacy analysis: the composition argument counts only the number of communication rounds but does not adjust the per-round privacy parameter for the K local steps performed between rounds; the resulting (ε,δ) guarantee therefore appears optimistic by a factor of K.

Authors: We appreciate this observation regarding the privacy composition. Upon closer inspection, we recognize that the privacy loss accumulates over each of the K local gradient steps within a communication round. Therefore, the composition theorem should be applied to the total number of steps, which is K times the number of rounds. In the revised manuscript, we will update the privacy analysis in §5.2 to correctly compose over all local steps, adjusting the per-round privacy budget by a factor of 1/K. This correction will ensure the overall (ε, δ) guarantee is accurate and not optimistic. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives convergence of its local-update algorithm (with clipping and additive noise) to a bounded neighborhood of a stationary point under the standard assumptions of L-smoothness, bounded gradients, and an undirected connected network; the neighborhood radius scales explicitly with the noise variance in the usual nonconvex stochastic-gradient analysis. Privacy guarantees are obtained by applying the Gaussian mechanism to clipped gradients followed by standard DP composition over the finite communication rounds. Neither step reduces by construction to fitted parameters, self-definitions, or load-bearing self-citations; both rest on externally verifiable results from distributed optimization and differential privacy. The provided abstract and reader summary contain no equations or claims that equate a prediction to its own input.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard domain assumptions from distributed optimization and differential privacy; no new entities are postulated and the free parameters are the usual privacy-tuning knobs.

free parameters (2)

gradient clipping threshold
Chosen to bound sensitivity for privacy analysis and to stabilize local updates.
noise variance scale
Calibrated to the target privacy budget epsilon and to the number of communication rounds.

axioms (2)

domain assumption The underlying communication graph is undirected and connected.
Invoked to guarantee consensus and information flow in the distributed setting.
domain assumption Each local objective is L-smooth and has bounded gradients.
Required for the nonconvex convergence analysis to a stationary point.

pith-pipeline@v0.9.0 · 5434 in / 1313 out tokens · 32655 ms · 2026-05-13T20:56:56.084876+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We prove that the resulting algorithm converges to a stationary point of the problem within a bounded distance... gradient clipping and additive noise
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Assumption 1: globally L-smooth... Assumption 4: unbiased stochastic gradient with bounded variance

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages

[1]

Separation of learning and control for cyber- physical systems,

A. A. Malikopoulos, “Separation of learning and control for cyber- physical systems,”Automatica, vol. 151, no. 110912, 2023

work page 2023
[2]

Combining learning and control in linear systems,

——, “Combining learning and control in linear systems,”European Journal of Control, vol. 80, no. Part A, p. 101043, 2024

work page 2024
[3]

A survey of distributed optimization and control algorithms for electric power systems,

D. K. Molzahn, F. Dörfler, H. Sandberg, S. H. Low, S. Chakrabarti, R. Baldick, and J. Lavaei, “A survey of distributed optimization and control algorithms for electric power systems,”IEEE Transactions on Smart Grid, vol. 8, no. 6, pp. 2941–2962, Nov. 2017

work page 2017
[4]

On the output redundancy of LTI systems: A geometric approach with application to privacy,

G. Yang, A. J. Gallo, A. Barboni, R. M. G. Ferrari, A. Serrani, and T. Parisini, “On the output redundancy of LTI systems: A geometric approach with application to privacy,”IEEE Transactions on Automatic Control, vol. 70, no. 11, pp. 7509–7522, 2025

work page 2025
[5]

Distributed Optimization for Control,

A. Nedi ´c and J. Liu, “Distributed Optimization for Control,”Annual Review of Control, Robotics, and Autonomous Systems, vol. 1, no. 1, pp. 77–103, May 2018

work page 2018
[6]

Combining graph attention networks and distributed optimization for multi-robot mixed- integer convex programming,

V .-A. Le, P. Kounatidis, and A. A. Malikopoulos, “Combining graph attention networks and distributed optimization for multi-robot mixed- integer convex programming,” in2025 IEEE 64th Conference on Decision and Control (CDC), 2025, pp. 4440–4445

work page 2025
[7]

Multi-robot cooperative navigation in crowds: A game-theoretic learning-based model predictive control approach,

V .-A. Le, V . Tadiparthi, B. Chalaki, H. N. Mahjoub, J. D’sa, E. Moradi- Pari, and A. A. Malikopoulos, “Multi-robot cooperative navigation in crowds: A game-theoretic learning-based model predictive control approach,” in2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 4834–4840

work page 2024
[8]

Qsparse-local- SGD: Distributed SGD with quantization, sparsification, and local computations,

D. Basu, D. Data, C. Karakus, and S. N. Diggavi, “Qsparse-local- SGD: Distributed SGD with quantization, sparsification, and local computations,”IEEE Journal on Selected Areas in Information Theory, vol. 1, no. 1, pp. 217–226, 2020

work page 2020
[9]

Local exact-diffusion for decentralized optimiza- tion and learning,

S. A. Alghunaim, “Local exact-diffusion for decentralized optimiza- tion and learning,”IEEE Transactions on Automatic Control, vol. 69, no. 11, pp. 7371–7386, 2024

work page 2024
[10]

Distributed learning by local training ADMM,

X. Ren, N. Bastianello, K. H. Johansson, and T. Parisini, “Distributed learning by local training ADMM,” in2024 IEEE 63rd Conference on Decision and Control (CDC). IEEE, 2024, pp. 7124–7129

work page 2024
[11]

Communication-efficient stochastic distributed learning,

——, “Communication-efficient stochastic distributed learning,”IEEE Transactions on Automatic Control [early access], 2026

work page 2026
[12]

Innovation compression for communication-efficient distributed optimization with linear conver- gence,

J. Zhang, K. You, and L. Xie, “Innovation compression for communication-efficient distributed optimization with linear conver- gence,”IEEE Transactions on Automatic Control, vol. 68, no. 11, pp. 6899–6906, 2023

work page 2023
[13]

Cedas: A compressed decentralized stochastic gradient method with improved convergence,

K. Huang and S. Pu, “Cedas: A compressed decentralized stochastic gradient method with improved convergence,”IEEE Transactions on Automatic Control, vol. 70, no. 4, pp. 2242–2257, 2024

work page 2024
[14]

Jointly computation- and communication-efficient distributed learning,

X. Ren, N. Bastianello, K. H. Johansson, and T. Parisini, “Jointly computation- and communication-efficient distributed learning,” in 2025 IEEE 64th Conference on Decision and Control (CDC), 2025, pp. 6798–6803

work page 2025
[15]

Deep learning with differential privacy,

M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” inProceedings of the 2016 ACM SIGSAC conference on computer and communications security, 2016, pp. 308–318

work page 2016
[16]

Deep leakage from gradients,

L. Zhu, Z. Liu, and S. Han, “Deep leakage from gradients,”Advances in neural information processing systems, vol. 32, 2019

work page 2019
[17]

Calibrating noise to sensitivity in private data analysis,

C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” inTheory of cryptography conference. Springer, 2006, pp. 265–284

work page 2006
[18]

Differentially private empirical risk minimization

K. Chaudhuri, C. Monteleoni, and A. D. Sarwate, “Differentially private empirical risk minimization.”Journal of Machine Learning Research, vol. 12, no. 3, 2011

work page 2011
[19]

Differentially private feder- ated learning on heterogeneous data,

M. Noble, A. Bellet, and A. Dieuleveut, “Differentially private feder- ated learning on heterogeneous data,” inInternational conference on artificial intelligence and statistics. PMLR, 2022, pp. 10 110–10 145

work page 2022
[20]

A survey on secure decentralized optimization and learning,

C. Liu, N. Bastianello, W. Huo, Y . Shi, and K. H. Johansson, “A survey on secure decentralized optimization and learning,” 2024. [Online]. Available: http://arxiv.org/abs/2408.08628

work page arXiv 2024
[21]

Convergence and privacy of decentralized noncon- vex optimization with gradient clipping and communication compres- sion,

B. Li and Y . Chi, “Convergence and privacy of decentralized noncon- vex optimization with gradient clipping and communication compres- sion,”IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 1, pp. 273–282, 2025

work page 2025
[22]

Differential privacy in distributed learning: Beyond uniformly bounded stochastic gradients,

Y . Huang, J. Zhang, and Q. Ling, “Differential privacy in distributed learning: Beyond uniformly bounded stochastic gradients,” inThe 28th International Conference on Artificial Intelligence and Statistics, 2025

work page 2025
[23]

Differentially private and communication efficient collaborative learning,

J. Ding, G. Liang, J. Bi, and M. Pan, “Differentially private and communication efficient collaborative learning,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 8, 2021, pp. 7219–7227

work page 2021
[24]

The algorithmic foundations of differential privacy,

C. Dwork and A. Roth, “The algorithmic foundations of differential privacy,”Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3-4, pp. 211–487, 2014

work page 2014
[25]

Rényi differential privacy,

I. Mironov, “Rényi differential privacy,” in2017 IEEE 30th Computer Security Foundations Symposium (CSF). IEEE, 2017, pp. 263–275

work page 2017
[26]

Asyn- chronous Distributed Optimization Over Lossy Networks via Relaxed ADMM: Stability and Linear Convergence,

N. Bastianello, R. Carli, L. Schenato, and M. Todescato, “Asyn- chronous Distributed Optimization Over Lossy Networks via Relaxed ADMM: Stability and Linear Convergence,”IEEE Transactions on Automatic Control, vol. 66, no. 6, pp. 2620–2635, Jun. 2021

work page 2021
[27]

Subsampled Rényi differential privacy and analytical moments accountant,

Y .-X. Wang, B. Balle, and S. P. Kasiviswanathan, “Subsampled Rényi differential privacy and analytical moments accountant,” inThe 22nd International Conference on Artificial Intelligence and Statistics. PMLR, 2019, pp. 1226–1235

work page 2019
[28]

Nesterov,Introductory Lectures on Convex Optimization: A Basic Course

Y . Nesterov,Introductory Lectures on Convex Optimization: A Basic Course. Springer Science & Business Media, 2013, vol. 87

work page 2013

[1] [1]

Separation of learning and control for cyber- physical systems,

A. A. Malikopoulos, “Separation of learning and control for cyber- physical systems,”Automatica, vol. 151, no. 110912, 2023

work page 2023

[2] [2]

Combining learning and control in linear systems,

——, “Combining learning and control in linear systems,”European Journal of Control, vol. 80, no. Part A, p. 101043, 2024

work page 2024

[3] [3]

A survey of distributed optimization and control algorithms for electric power systems,

D. K. Molzahn, F. Dörfler, H. Sandberg, S. H. Low, S. Chakrabarti, R. Baldick, and J. Lavaei, “A survey of distributed optimization and control algorithms for electric power systems,”IEEE Transactions on Smart Grid, vol. 8, no. 6, pp. 2941–2962, Nov. 2017

work page 2017

[4] [4]

On the output redundancy of LTI systems: A geometric approach with application to privacy,

G. Yang, A. J. Gallo, A. Barboni, R. M. G. Ferrari, A. Serrani, and T. Parisini, “On the output redundancy of LTI systems: A geometric approach with application to privacy,”IEEE Transactions on Automatic Control, vol. 70, no. 11, pp. 7509–7522, 2025

work page 2025

[5] [5]

Distributed Optimization for Control,

A. Nedi ´c and J. Liu, “Distributed Optimization for Control,”Annual Review of Control, Robotics, and Autonomous Systems, vol. 1, no. 1, pp. 77–103, May 2018

work page 2018

[6] [6]

Combining graph attention networks and distributed optimization for multi-robot mixed- integer convex programming,

V .-A. Le, P. Kounatidis, and A. A. Malikopoulos, “Combining graph attention networks and distributed optimization for multi-robot mixed- integer convex programming,” in2025 IEEE 64th Conference on Decision and Control (CDC), 2025, pp. 4440–4445

work page 2025

[7] [7]

Multi-robot cooperative navigation in crowds: A game-theoretic learning-based model predictive control approach,

V .-A. Le, V . Tadiparthi, B. Chalaki, H. N. Mahjoub, J. D’sa, E. Moradi- Pari, and A. A. Malikopoulos, “Multi-robot cooperative navigation in crowds: A game-theoretic learning-based model predictive control approach,” in2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 4834–4840

work page 2024

[8] [8]

Qsparse-local- SGD: Distributed SGD with quantization, sparsification, and local computations,

D. Basu, D. Data, C. Karakus, and S. N. Diggavi, “Qsparse-local- SGD: Distributed SGD with quantization, sparsification, and local computations,”IEEE Journal on Selected Areas in Information Theory, vol. 1, no. 1, pp. 217–226, 2020

work page 2020

[9] [9]

Local exact-diffusion for decentralized optimiza- tion and learning,

S. A. Alghunaim, “Local exact-diffusion for decentralized optimiza- tion and learning,”IEEE Transactions on Automatic Control, vol. 69, no. 11, pp. 7371–7386, 2024

work page 2024

[10] [10]

Distributed learning by local training ADMM,

X. Ren, N. Bastianello, K. H. Johansson, and T. Parisini, “Distributed learning by local training ADMM,” in2024 IEEE 63rd Conference on Decision and Control (CDC). IEEE, 2024, pp. 7124–7129

work page 2024

[11] [11]

Communication-efficient stochastic distributed learning,

——, “Communication-efficient stochastic distributed learning,”IEEE Transactions on Automatic Control [early access], 2026

work page 2026

[12] [12]

Innovation compression for communication-efficient distributed optimization with linear conver- gence,

J. Zhang, K. You, and L. Xie, “Innovation compression for communication-efficient distributed optimization with linear conver- gence,”IEEE Transactions on Automatic Control, vol. 68, no. 11, pp. 6899–6906, 2023

work page 2023

[13] [13]

Cedas: A compressed decentralized stochastic gradient method with improved convergence,

K. Huang and S. Pu, “Cedas: A compressed decentralized stochastic gradient method with improved convergence,”IEEE Transactions on Automatic Control, vol. 70, no. 4, pp. 2242–2257, 2024

work page 2024

[14] [14]

Jointly computation- and communication-efficient distributed learning,

X. Ren, N. Bastianello, K. H. Johansson, and T. Parisini, “Jointly computation- and communication-efficient distributed learning,” in 2025 IEEE 64th Conference on Decision and Control (CDC), 2025, pp. 6798–6803

work page 2025

[15] [15]

Deep learning with differential privacy,

M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” inProceedings of the 2016 ACM SIGSAC conference on computer and communications security, 2016, pp. 308–318

work page 2016

[16] [16]

Deep leakage from gradients,

L. Zhu, Z. Liu, and S. Han, “Deep leakage from gradients,”Advances in neural information processing systems, vol. 32, 2019

work page 2019

[17] [17]

Calibrating noise to sensitivity in private data analysis,

C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” inTheory of cryptography conference. Springer, 2006, pp. 265–284

work page 2006

[18] [18]

Differentially private empirical risk minimization

K. Chaudhuri, C. Monteleoni, and A. D. Sarwate, “Differentially private empirical risk minimization.”Journal of Machine Learning Research, vol. 12, no. 3, 2011

work page 2011

[19] [19]

Differentially private feder- ated learning on heterogeneous data,

M. Noble, A. Bellet, and A. Dieuleveut, “Differentially private feder- ated learning on heterogeneous data,” inInternational conference on artificial intelligence and statistics. PMLR, 2022, pp. 10 110–10 145

work page 2022

[20] [20]

A survey on secure decentralized optimization and learning,

C. Liu, N. Bastianello, W. Huo, Y . Shi, and K. H. Johansson, “A survey on secure decentralized optimization and learning,” 2024. [Online]. Available: http://arxiv.org/abs/2408.08628

work page arXiv 2024

[21] [21]

Convergence and privacy of decentralized noncon- vex optimization with gradient clipping and communication compres- sion,

B. Li and Y . Chi, “Convergence and privacy of decentralized noncon- vex optimization with gradient clipping and communication compres- sion,”IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 1, pp. 273–282, 2025

work page 2025

[22] [22]

Differential privacy in distributed learning: Beyond uniformly bounded stochastic gradients,

Y . Huang, J. Zhang, and Q. Ling, “Differential privacy in distributed learning: Beyond uniformly bounded stochastic gradients,” inThe 28th International Conference on Artificial Intelligence and Statistics, 2025

work page 2025

[23] [23]

Differentially private and communication efficient collaborative learning,

J. Ding, G. Liang, J. Bi, and M. Pan, “Differentially private and communication efficient collaborative learning,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 8, 2021, pp. 7219–7227

work page 2021

[24] [24]

The algorithmic foundations of differential privacy,

C. Dwork and A. Roth, “The algorithmic foundations of differential privacy,”Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3-4, pp. 211–487, 2014

work page 2014

[25] [25]

Rényi differential privacy,

I. Mironov, “Rényi differential privacy,” in2017 IEEE 30th Computer Security Foundations Symposium (CSF). IEEE, 2017, pp. 263–275

work page 2017

[26] [26]

Asyn- chronous Distributed Optimization Over Lossy Networks via Relaxed ADMM: Stability and Linear Convergence,

N. Bastianello, R. Carli, L. Schenato, and M. Todescato, “Asyn- chronous Distributed Optimization Over Lossy Networks via Relaxed ADMM: Stability and Linear Convergence,”IEEE Transactions on Automatic Control, vol. 66, no. 6, pp. 2620–2635, Jun. 2021

work page 2021

[27] [27]

Subsampled Rényi differential privacy and analytical moments accountant,

Y .-X. Wang, B. Balle, and S. P. Kasiviswanathan, “Subsampled Rényi differential privacy and analytical moments accountant,” inThe 22nd International Conference on Artificial Intelligence and Statistics. PMLR, 2019, pp. 1226–1235

work page 2019

[28] [28]

Nesterov,Introductory Lectures on Convex Optimization: A Basic Course

Y . Nesterov,Introductory Lectures on Convex Optimization: A Basic Course. Springer Science & Business Media, 2013, vol. 87

work page 2013