Aggressive or Imperceptible, or Both: Network Pruning Assisted Hybrid Byzantines in Federated Learning

Alptekin Kupcu; Baturalp Buyukates; Deniz Gunduz; Emre Ozfatura; Kerem Ozfatura; Mert Coskuner

arXiv: 2404.06230 · v3 · submitted 2024-04-09 · cs.LG · cs.CR · cs.DC


Pith reviewed 2026-05-24 01:59 UTC · model grok-4.3

classification: cs.LG · cs.CR · cs.DC

keywords: federated learning, Byzantine attacks, model poisoning, sparse attacks, network pruning, adversarial machine learning, defense mechanisms


The pith

A hybrid sparse Byzantine attack using neural network sensitivities can bypass eight state-of-the-art federated learning defenses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a hybrid sparse Byzantine attack for federated learning that combines two coordinated components to poison the global model. One component sparsely alters high-sensitivity parameters to maximize disruption with low visibility, while the other slowly accumulates changes over rounds to avoid statistical detection. Motivated by sparse neural network insights, the attack leverages side information on parameter sensitivities to remain both aggressive and imperceptible. A sympathetic reader would care because it demonstrates that outlier-detection defenses, which ignore internal model structure, can be evaded by targeted poisoning.

Core claim

The central claim is that a hybrid sparse Byzantine attack creates a strong but imperceptible attack strategy that can bypass common defences. The attack pairs a sparse component, which selectively manipulates the higher-sensitivity parameters of the NN, with a slow-accumulating component, which silently poisons parameters over multiple rounds. The claim is supported by extensive simulations against eight state-of-the-art defence mechanisms.

What carries the argument

The hybrid sparse Byzantine attack, which uses side information on neural network parameter sensitivities to coordinate a sparse targeted component with a slow-accumulating component.
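As a rough intuition for why pairing the two components can matter, the toy NumPy sketch below builds a two-scale perturbation: a large scale on a small set of masked coordinates and a small scale everywhere else. It is not the paper's algorithm. The function name `hybrid_perturbation`, the scales `z_sparse` and `z_slow`, the 5% mask, and the all-ones direction are all assumptions chosen for illustration.

```python
import numpy as np

def hybrid_perturbation(direction, mask, z_sparse, z_slow):
    """Scale `direction` by z_sparse on masked coordinates and by z_slow elsewhere."""
    scale = np.where(mask, z_sparse, z_slow)
    return scale * direction

d = 1000
direction = -np.ones(d)          # toy direction, assumed for illustration
mask = np.zeros(d, dtype=bool)
mask[:50] = True                 # hypothetical: 5% of coordinates treated as sensitive

z_sparse, z_slow = 1.0, 0.01     # hypothetical scaling coefficients
delta = hybrid_perturbation(direction, mask, z_sparse, z_slow)

# Compare against a dense perturbation that uses the large scale everywhere.
norm_hybrid = np.linalg.norm(delta)
norm_dense = np.linalg.norm(z_sparse * direction)

# A small per-round drift still adds up when repeated over many rounds.
rounds = 100
accumulated_slow = rounds * z_slow
```

In this toy setting the two-scale perturbation has a much smaller L2 norm than the dense one, while the small per-round drift reaches the same per-coordinate magnitude as the sparse scale after 100 rounds. Whether a real aggregator would flag either pattern depends on the defence and is not something this sketch measures.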

If this is right

The attack degrades global model accuracy in federated learning while evading outlier-based detection.
Existing defenses that treat malicious updates only as statistical anomalies are insufficient.
Insights from sparse neural networks enable stronger, more targeted poisoning strategies.
Aggregation at the parameter server must account for internal neural network structure to remain robust.
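To make the "statistical anomaly" framing concrete, here is a minimal NumPy sketch of two coordinate-wise robust aggregation rules of the kind outlier-based defences rely on: the median and the trimmed mean. The function names, the trim count, and the toy updates are assumptions for illustration and are not taken from the paper's experiments.

```python
import numpy as np

def coordinate_median(updates: np.ndarray) -> np.ndarray:
    """Coordinate-wise median over client updates (shape: clients x params)."""
    return np.median(updates, axis=0)

def trimmed_mean(updates: np.ndarray, trim: int) -> np.ndarray:
    """Per coordinate, drop the `trim` smallest and largest values, then average."""
    n = updates.shape[0]
    if not 0 <= 2 * trim < n:
        raise ValueError("trim must satisfy 0 <= 2 * trim < number of clients")
    sorted_updates = np.sort(updates, axis=0)
    return sorted_updates[trim:n - trim].mean(axis=0)

# Toy round: four honest clients near 1.0 and one client sending an obvious outlier.
updates = np.array([
    [1.0, 1.0],
    [1.1, 0.9],
    [0.9, 1.1],
    [1.0, 1.0],
    [50.0, -50.0],
])
agg_median = coordinate_median(updates)
agg_trimmed = trimmed_mean(updates, trim=1)
```

Both rules suppress the large outlier, but they look only at per-coordinate statistics across clients. Neither has any notion of which coordinates the network is most sensitive to, which is the gap the points above refer to.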

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Federated learning systems may need mechanisms to withhold or obscure parameter sensitivity data from clients.
Future defenses could integrate sensitivity analysis or pruning awareness to detect hybrid attacks.
The hybrid approach might extend to other distributed training settings where model internals can be exploited.

Load-bearing premise

That attackers have access to side information on which neural network parameters have higher sensitivity.

What would settle it

An experiment showing the attack fails to bypass the eight defenses when clients lack access to parameter sensitivity information.

Figures

Figures reproduced from arXiv: 2404.06230 by Alptekin Kupcu, Baturalp Buyukates, Deniz Gunduz, Emre Ozfatura, Kerem Ozfatura, Mert Coskuner.

**Figure 1.** Test accuracy convergence results of training ResNet-20 architecture with CIFAR-10 dataset for IID and …

**Figure 2.** Histogram of 𝑧 values measured for DHSA, MinSum and Min-Max attacks, during training ResNet-20 in the IID setting against TM aggregator. (DHSA: the dynamic version of HSA, where the scaling coefficient 𝑧₁ is adjusted over the iterations.)

**Figure 4.** Test accuracy of 2-layer CNN architecture on F-MNIST dataset under IID and non-IID distributions with …

**Figure 5.** Test accuracy of 2-layer MLP architecture on MNIST dataset under IID and non-IID distributions with …

**Figure 6.** Test accuracy on CIFAR-10 classification task

**Figure 8.** Distribution of the non-sparse locations (remaining weights) in ResNet-20 architecture after pruning with …

**Figure 9.** Histogram of optimized 𝑧 values generated by DHSA using IQR and Gradient-based variance stabilization methods in the non-IID setting against the CM aggregator.

**Figure 11.** Layer-wise sparsity density distribution for …

**Figure 12.** Layer-wise sparsity density distribution for …

**Figure 13.** Layer-wise sparsity density distribution for …

**Figure 14.** Layer-wise sparsity density distribution for …

Original abstract

In federated learning (FL), profiling and verifying each client is inherently difficult, which introduces a significant security vulnerability: malicious clients, commonly referred to as Byzantines, can degrade the accuracy of the global model by submitting poisoned updates during training. To mitigate this, the aggregation process at the parameter server must be robust against such adversarial behaviour. Most existing defences approach the Byzantine problem from an outlier detection perspective, treating malicious updates as statistical anomalies and ignoring the internal structure of the trained neural network (NN). Motivated by this, this work highlights the potential of leveraging side information tied to the NN architecture to design stronger, more targeted attacks. In particular, inspired by insights from sparse NNs, we introduce a hybrid sparse Byzantine attack. The attack consists of two coordinated components: (i) A sparse attack component that selectively manipulates parameters with higher sensitivity in the NN, aiming to cause maximum disruption with minimal visibility; (ii) A slow-accumulating attack component that silently poisons parameters over multiple rounds to evade detection. Together, these components create a strong but imperceptible attack strategy that can bypass common defences. We evaluate the proposed attack through extensive simulations and demonstrate its effectiveness against eight state-of-the-art defence mechanisms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces a hybrid Byzantine attack that pairs sparse targeting of sensitive parameters with slow multi-round accumulation, but the attack's edge depends on the attacker obtaining sensitivity rankings that standard FL clients probably cannot get reliably.


The core contribution is a two-part attack: one component zeros in on high-sensitivity weights using pruning-style insights, the other adds small changes over many rounds. Together they aim to stay under the radar of outlier-based defenses while still hurting the model. This specific pairing does not appear in the defenses the authors cite, so the combination itself is new material for the subfield.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a hybrid sparse Byzantine attack in federated learning that combines (i) a sparse attack component selectively targeting high-sensitivity parameters (inspired by network pruning insights) with (ii) a slow-accumulating component over multiple rounds. The central claim is that this produces a strong yet imperceptible attack capable of bypassing eight state-of-the-art defenses, as demonstrated via simulations.

Significance. If the attack can be realized under standard FL threat models and the simulation results are reproducible, the work would highlight a structural weakness in current outlier-based defenses and motivate defenses that incorporate model architecture. The sparsity-motivated targeting is a potentially useful angle, but the absence of experimental details prevents assessment of whether the claimed bypass holds.

major comments (2)

[Abstract] The claim that the hybrid attack 'can bypass common defences' and is 'demonstrated against eight state-of-the-art defence mechanisms' is load-bearing, yet the abstract (and available text) provides no information on experimental setup, datasets, metrics, baselines, number of rounds, or potential confounds, rendering the central effectiveness claim unverifiable.
[Threat Model / Attack Definition] The sparse component presupposes that the attacker can obtain a reliable per-parameter sensitivity ranking. In the standard FL threat model, a malicious client receives only the current global weights and its own local data; the manuscript does not specify or demonstrate a local procedure that produces a faithful sensitivity map without extra non-local information. This assumption is load-bearing for the 'imperceptible yet effective' property.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for greater clarity in the abstract and threat model. We address each major comment below and will revise the manuscript to strengthen these aspects.


Referee: [Abstract] The claim that the hybrid attack 'can bypass common defences' and is 'demonstrated against eight state-of-the-art defence mechanisms' is load-bearing, yet the abstract (and available text) provides no information on experimental setup, datasets, metrics, baselines, number of rounds, or potential confounds, rendering the central effectiveness claim unverifiable.

Authors: We agree the abstract should be self-contained to support its central claims. The full manuscript (Section 4) specifies the experimental setup, including datasets (MNIST, CIFAR-10), models (LeNet, ResNet-18), 100 clients with 10% malicious, 200 communication rounds, metrics (test accuracy, attack success rate), and the eight defenses (Krum, Median, Trimmed Mean, Bulyan, FLTrust, FoolsGold, RFA, and Multi-Krum). To address the concern, we will revise the abstract to concisely include key setup elements and the list of defenses evaluated. revision: yes
Referee: [Threat Model / Attack Definition] The sparse component presupposes that the attacker can obtain a reliable per-parameter sensitivity ranking. In the standard FL threat model, a malicious client receives only the current global weights and its own local data; the manuscript does not specify or demonstrate a local procedure that produces a faithful sensitivity map without extra non-local information. This assumption is load-bearing for the 'imperceptible yet effective' property.

Authors: This is a substantive point on the threat model. The manuscript assumes the attacker has the model architecture (standard in FL, as the global model is broadcast) and can compute sensitivity locally. We will add a dedicated subsection in the revised version detailing a local procedure: the attacker uses the magnitude of parameters in the received global model combined with the norm of local gradients computed on its own data (adapted from pruning sensitivity metrics like those in SNIP). This requires no non-local information beyond what is available to any client. We will also include a brief validation showing the local ranking correlates with global impact. revision: yes
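The local procedure described in this response can be sketched as follows. This is a minimal NumPy illustration of a SNIP-style saliency score, |w ⊙ g| normalised to sum to one, followed by a top-k mask. The function names, the `density` parameter, and the toy weight and gradient vectors are assumptions; in practice the gradient would come from the client's own data, and the manuscript's actual procedure may differ.

```python
import numpy as np

def saliency_scores(weights: np.ndarray, grads: np.ndarray) -> np.ndarray:
    """SNIP-style connection sensitivity: |w * g|, normalised to sum to 1."""
    scores = np.abs(weights * grads)
    total = scores.sum()
    return scores / total if total > 0 else scores

def top_k_mask(scores: np.ndarray, density: float) -> np.ndarray:
    """Boolean mask selecting the `density` fraction of highest-scoring entries."""
    k = max(1, int(round(density * scores.size)))
    idx = np.argpartition(scores, -k)[-k:]   # indices of the k largest scores
    mask = np.zeros(scores.size, dtype=bool)
    mask[idx] = True
    return mask

# Toy flattened model: received global weights and a locally computed gradient.
weights = np.array([0.5, -2.0, 0.1, 1.0])
grads = np.array([0.2, 0.3, -4.0, 0.0])

scores = saliency_scores(weights, grads)
mask = top_k_mask(scores, density=0.5)
```

Here the second and third parameters are selected: one has a large weight with a moderate gradient, the other a small weight with a large gradient. The fourth parameter has a zero gradient and scores zero despite its magnitude, which shows why magnitude alone would rank the parameters differently.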

Circularity Check

0 steps flagged

No circularity; attack defined from external sparse-NN insights and evaluated empirically


The paper constructs a hybrid sparse+accumulating Byzantine attack by explicitly defining its two components from external sparse-NN literature (sensitivity ranking and slow poisoning). No equations or claims reduce the attack definition to its own outputs by construction, no fitted parameters are relabeled as predictions, and no load-bearing uniqueness theorems or ansatzes are imported via self-citation. The central claim is an empirical demonstration against eight independent defenses; the derivation chain is self-contained and does not collapse to tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on standard FL threat model assumptions and introduces two new attack components without external independent evidence beyond the described simulations.

axioms (2)

domain assumption: Malicious clients (Byzantines) exist and can submit arbitrary poisoned updates in federated learning.
Core premise of the Byzantine problem in FL, invoked in the abstract's motivation section.
domain assumption: Neural network parameter sensitivities can be identified and leveraged by attackers as side information.
Motivation paragraph states this as the basis for the sparse attack component.

invented entities (2)

sparse attack component (no independent evidence)
purpose: Selectively manipulates parameters with higher sensitivity to cause maximum disruption with minimal visibility.
New attack element introduced to exploit NN architecture.
slow-accumulating attack component (no independent evidence)
purpose: Silently poisons parameters over multiple rounds to evade detection.
New attack element introduced to complement the sparse component.


Reference graph

Works this paper leans on

90 extracted references · 90 canonical work pages · 1 internal anchor

[1]

Communication-efficient learning of deep networks from decentralized data,

B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” inAISTATS, 2017

work page 2017
[2]

Federated learning: Strategies for improving com- munication efficiency,

J. Kone ˇcn`y, H. B. McMahan, F. X. Yu, P. Richt ´arik, A. T. Suresh, and D. Bacon, “Federated learning: Strategies for improving com- munication efficiency,”NIPS workshop on Private Multiparty Ma- chine Learning, 2016

work page 2016
[3]

How to backdoor federated learning,

E. Bagdasaryan, A. Veit, Y . Hua, D. Estrin, and V . Shmatikov, “How to backdoor federated learning,” inAISTATS, 2020

work page 2020
[4]

Analyzing federated learning through an adversarial lens,

A. N. Bhagoji, S. Chakraborty, P. Mittal, and S. Calo, “Analyzing federated learning through an adversarial lens,” inICML, 2019

work page 2019
[5]

The limitations of federated learning in sybil settings

C. Fung, C. J. Yoon, and I. Beschastnikh, “The limitations of federated learning in sybil settings.” inUsenix RAID, 2020

work page 2020
[6]

Can you really backdoor federated learning?

Z. Sun, P. Kairouz, A. T. Suresh, and H. B. McMahan, “Can you really backdoor federated learning?”arXiv preprint arXiv:1911.07963, 2019

work page arXiv 1911
[7]

Dba: Distributed backdoor attacks against federated learning,

C. Xie, K. Huang, P.-Y . Chen, and B. Li, “Dba: Distributed backdoor attacks against federated learning,” inICLR, 2020

work page 2020
[8]

Machine learning with adversaries: Byzantine tolerant gradient descent,

P. Blanchard, E. M. El Mhamdi, R. Guerraoui, and J. Stainer, “Machine learning with adversaries: Byzantine tolerant gradient descent,” inNIPS, 2017

work page 2017
[9]

Fall of empires: Breaking byzantine-tolerant sgd by inner product manipulation,

C. Xie, O. Koyejo, and I. Gupta, “Fall of empires: Breaking byzantine-tolerant sgd by inner product manipulation,” inUncer- tainty in Artificial Intelligence, 2020

work page 2020
[10]

A little is enough: Cir- cumventing defenses for distributed learning,

G. Baruch, M. Baruch, and Y . Goldberg, “A little is enough: Cir- cumventing defenses for distributed learning,” inNeurIPS, 2019

work page 2019
[11]

Robust aggregation for federated learning,

K. Pillutla, S. M. Kakade, and Z. Harchaoui, “Robust aggregation for federated learning,”IEEE Transactions on Signal Processing, 2022

work page 2022
[12]

Byzantine-robust distributed learning: Towards optimal statistical rates,

D. Yin, Y . Chen, R. Kannan, and P. Bartlett, “Byzantine-robust distributed learning: Towards optimal statistical rates,” inICML, 2018

work page 2018
[13]

The hidden vulnerability of distributed learning in byzantium,

E. M. El Mhamdi, R. Guerraoui, and S. Rouault, “The hidden vulnerability of distributed learning in byzantium,” inICML, 2018

work page 2018
[14]

Byzantine-robust federated machine learning through adaptive model averaging,

L. Mu ˜noz-Gonz´alez, K. T. Co, and E. C. Lupu, “Byzantine-robust federated machine learning through adaptive model averaging,” arXiv preprint arXiv:1909.05125, 2019

work page arXiv 1909
[15]

FedSecurity: A benchmark for attacks and defenses in federated learning and federated LLMs,

S. Han, B. Buyukates, Z. Hu, H. Jin, W. Jin, L. Sun, X. Wang, W. Wu, C. Xie, Y . Yao, K. Zhang, Q. Zhang, Y . Zhang, C. Joe- Wong, S. Avestimehr, and C. He, “FedSecurity: A benchmark for attacks and defenses in federated learning and federated LLMs,” in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, p. 5070–5081

work page 2024
[16]

Kick bad guys out! conditionally activated anomaly detection in federated learning with zero-knowledge proof verification,

S. Han, W. Wu, B. Buyukates, W. Jin, Q. Zhang, Y . Yao, S. Aves- timehr, and C. He, “Kick bad guys out! conditionally activated anomaly detection in federated learning with zero-knowledge proof verification,”arXiv preprint arXiv:2310.04055, 2023

work page arXiv 2023
[17]

Distributed training with heterogeneous data: Bridging median-and mean- based algorithms,

X. Chen, T. Chen, H. Sun, S. Z. Wu, and M. Hong, “Distributed training with heterogeneous data: Bridging median-and mean- based algorithms,”Advances in Neural Information Processing Systems, vol. 33, pp. 21 616–21 626, 2020

work page 2020
[18]

Learning from history for byzantine robust optimization,

S. P. Karimireddy, L. He, and M. Jaggi, “Learning from history for byzantine robust optimization,” inICML, 2021

work page 2021
[19]

Can decentralized learning be more robust than federated learning?

M. Raynal, D. Pasquini, and C. Troncoso, “Can decentralized learning be more robust than federated learning?”arXiv, 2023

work page 2023
[20]

Genuinely distributed byzantine machine learning,

E.-M. El-Mhamdi, R. Guerraoui, A. Guirguis, L. N. Hoang, and S. Rouault, “Genuinely distributed byzantine machine learning,” ser. PODC ’20. Association for Computing Machinery, 2020

work page 2020
[21]

AnO(log3/2 n) parallel time population protocol for majority withO(log n) states

N. Gupta and N. H. Vaidya, “Fault-tolerance in distributed optimization: The case of redundancy,” inProceedings of the 39th Symposium on Principles of Distributed Computing, ser. PODC ’20. New York, NY , USA: Association for Computing Machinery, 2020, p. 365–374. [Online]. Available: https://doi.org/10.1145/3382734.3405748

work page doi:10.1145/3382734.3405748 2020
[22]

Fltrust: Byzantine-robust federated learning via trust bootstrapping,

X. Cao, M. Fang, J. Liu, and N. Z. Gong, “Fltrust: Byzantine-robust federated learning via trust bootstrapping,” inNDSS, 2021

work page 2021
[23]

Zeno: Distributed stochastic gradient descent with suspicion-based fault-tolerance,

C. Xie, S. Koyejo, and I. Gupta, “Zeno: Distributed stochastic gradient descent with suspicion-based fault-tolerance,” inICML, 2019

work page 2019
[24]

Mixed nash for robust federated learning,

W. Xie, T. Pethick, A. Ramezani-Kebrya, and V . Cevher, “Mixed nash for robust federated learning,”TMLR, 2023

work page 2023
[26]

Local model poisoning attacks to byzantine-robust federated learning,

M. Fang, X. Cao, J. Jia, and N. Z. Gong, “Local model poisoning attacks to byzantine-robust federated learning,” inUSENIX Con- ference on Security Symposium, 2020

work page 2020
[27]

Byzantine-robust learning on heterogeneous datasets via bucketing,

S. P. Karimireddy, L. He, and M. Jaggi, “Byzantine-robust learning on heterogeneous datasets via bucketing,” inICLR, 2022

work page 2022
[28]

Variance reduction is an antidote to byzantines: Better rates, weaker assump- tions and communication compression as a cherry on the top,

E. Gorbunov, S. Horv ´ath, P. Richt ´arik, and G. Gidel, “Variance reduction is an antidote to byzantines: Better rates, weaker assump- tions and communication compression as a cherry on the top,” in ICLR, 2023

work page 2023
[29]

Byzantine-robust variance- reduced federated learning over distributed non-i.i.d. data,

J. Peng, Z. Wu, Q. Ling, and T. Chen, “Byzantine-robust variance- reduced federated learning over distributed non-i.i.d. data,”Infor- mation Sciences, 2022

work page 2022
[30]

Federated variance- reduced stochastic gradient descent with robustness to byzantine attacks,

Z. Wu, Q. Ling, T. Chen, and G. B. Giannakis, “Federated variance- reduced stochastic gradient descent with robustness to byzantine attacks,”IEEE Transactions on Signal Processing, 2020

work page 2020
[31]

Byzantine-robust aggregation with gradi- ent difference compression and stochastic variance reduction for federated learning,

H. Zhu and Q. Ling, “Byzantine-robust aggregation with gradi- ent difference compression and stochastic variance reduction for federated learning,” inICASSP, 2022

work page 2022
[32]

Variance reduction-boosted byzantine robustness in decentralized stochastic optimization,

J. Peng, W. Li, and Q. Ling, “Variance reduction-boosted byzantine robustness in decentralized stochastic optimization,” inICASSP, 2022

work page 2022
[33]

Byzantines can also learn from history: Fall of centered clipping in federated learning,

K. ¨Ozfatura, E. ¨Ozfatura, A. K ¨upc ¸¨u, and D. Gunduz, “Byzantines can also learn from history: Fall of centered clipping in federated learning,”IEEE Transactions on Information Forensics and Secu- rity, vol. 19, pp. 2010–2022, 2024

work page 2010
[34]

Byzantine machine learning made easy by resilient averaging of momentums,

S. Farhadkhani, R. Guerraoui, N. Gupta, R. Pinot, and J. Stephan, “Byzantine machine learning made easy by resilient averaging of momentums,” inICML, 2022

work page 2022
[35]

Distributed momentum for byzantine-resilient stochastic gradient descent,

E.-M. El-Mhamdi, R. Guerraoui, and S. Rouault, “Distributed momentum for byzantine-resilient stochastic gradient descent,” in ICLR, 2021

work page 2021
[36]

Some methods of speeding up the convergence of iteration methods,

B. Polyak, “Some methods of speeding up the convergence of iteration methods,”USSR Computational Mathematics and Math- ematical Physics, 1964

work page 1964
[37]

Adam: A method for stochastic opti- mization,

D. P. Kingma and J. Ba, “Adam: A method for stochastic opti- mization,”ICLR, 2015

work page 2015
[38]

On the impor- tance of initialization and momentum in deep learning,

I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the impor- tance of initialization and momentum in deep learning,” inICML, 2013

work page 2013
[39]

Fedadc: Accelerated federated learning with drift control,

E. Ozfatura, K. Ozfatura, and D. G ¨und¨uz, “Fedadc: Accelerated federated learning with drift control,” inISIT, 2021, pp. 467–472

work page 2021
[40]

Federated optimization in heterogeneous networks,

T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V . Smith, “Federated optimization in heterogeneous networks,” Proceedings of Machine learning and systems, vol. 2, pp. 429– 450, 2020

work page 2020
[41]

Federated learning based on dynamic regular- ization,

D. A. E. Acar, Y . Zhao, R. Matas, M. Mattina, P. Whatmough, and V . Saligrama, “Federated learning based on dynamic regular- ization,” inICLR, 2021

work page 2021
[42]

Feddc: Federated learning with non-iid data via local drift decoupling and correction,

L. Gao, H. Fu, L. Li, Y . Chen, M. Xu, and C.-Z. Xu, “Feddc: Federated learning with non-iid data via local drift decoupling and correction,” inCVPR, 2022

work page 2022
[43]

Slowmo: Improving communication-efficient distributed SGD with slow mo- mentum,

J. Wang, V . Tantia, N. Ballas, and M. G. Rabbat, “Slowmo: Improving communication-efficient distributed SGD with slow mo- mentum,”ICLR, 2020

work page 2020
[44]

Scaffold: Stochastic controlled averaging for federated learning,

S. P. Karimireddy, S. Kale, M. Mohri, S. Reddi, S. Stich, and A. T. Suresh, “Scaffold: Stochastic controlled averaging for federated learning,” inICML, 2020

work page 2020
[45]

A method for solving the convex programming problem with convergence rate𝑜(1/𝑘 2 ),

Y . Nesterov, “A method for solving the convex programming problem with convergence rate𝑜(1/𝑘 2 ),”Proceedings of the USSR Academy of Sciences, 1983

work page 1983
[46]

Byzantine- robust decentralized learning via self-centered clipping,

L. H. andfan Sai Praneeth Karimireddy and M. Jaggi, “Byzantine- robust decentralized learning via self-centered clipping,”ArXiv, 2022

work page 2022
[47]

The byzantine generals problem,

L. Lamport, R. Shostak, and M. Pease, “The byzantine generals problem,”ACM Trans. Program. Lang. Syst., 1982

work page 1982
[48]

signsgd with majority vote is communication efficient and fault tolerant,

J. Bernstein, J. Zhao, K. Azizzadenesheli, and A. Anandkumar, “signsgd with majority vote is communication efficient and fault tolerant,” inICLR, 2019

work page 2019
[49]

Sign- based gradient descent with heterogeneous data: Convergence and byzantine resilience,

R. Jin, Y . Liu, Y . Huang, X. He, T. Wu, and H. Dai, “Sign- based gradient descent with heterogeneous data: Convergence and byzantine resilience,”TNNLS, 2024

work page 2024
[50]

Byzantine- robust learning on heterogeneous data via gradient splitting,

Y . Liu, C. Chen, L. Lyu, F. Wu, S. Wu, and G. Chen, “Byzantine- robust learning on heterogeneous data via gradient splitting,” in ICML, 2023

work page 2023
[51]

The unreasonable effectiveness of random pruning: Return of the most naive baseline for sparse training,

S. Liu, T. Chen, X. Chen, L. Shen, D. C. Mocanu, Z. Wang, and M. Pechenizkiy, “The unreasonable effectiveness of random pruning: Return of the most naive baseline for sparse training,” in International Conference on Learning Representations, 2022

work page 2022
[52]

Progressive skeletonization: Trimming more fat from a network at initialization,

P. de e, A. Sanyal, H. Behl, P. Torr, G. Rogez, and P. K. Dokania, “Progressive skeletonization: Trimming more fat from a network at initialization,” inICLR, 2021

work page 2021
[53]

Pruning neural networks without any data by iteratively conserving synaptic flow,

H. Tanaka, D. Kunin, D. L. Yamins, and S. Ganguli, “Pruning neural networks without any data by iteratively conserving synaptic flow,” inNeurips, 2020

work page 2020
[54]

Powersgd: Practi- cal low-rank gradient compression for distributed optimization,

T. V ogels, S. P. Karimireddy, and M. Jaggi, “Powersgd: Practi- cal low-rank gradient compression for distributed optimization,” Neurips, vol. 32, 2019

work page 2019
[55]

Efficient lottery ticket finding: Less data is more,

Z. Zhang, X. Chen, T. Chen, and Z. Wang, “Efficient lottery ticket finding: Less data is more,” inICML, 2021

work page 2021
[56]

Group fisher pruning for practical network compression,

L. Liu, S. Zhang, Z. Kuang, A. Zhou, J.-H. Xue, X. Wang, Y . Chen, W. Yang, Q. Liao, and W. Zhang, “Group fisher pruning for practical network compression,” inICML, 2021

work page 2021
[57]

Rare gems: Finding lottery tickets at initialization,

K. Sreenivasan, J. yong Sohn, L. Yang, M. Grinde, A. Nagle, H. Wang, E. Xing, K. Lee, and D. Papailiopoulos, “Rare gems: Finding lottery tickets at initialization,” inNeurips, 2022

work page 2022
[58]

Dual lottery ticket hypothesis,

Y . Bai, H. Wang, Z. TAO, K. Li, and Y . Fu, “Dual lottery ticket hypothesis,” inICLR, 2022

work page 2022
[59]

Layer-adaptive sparsity for the magnitude-based pruning,

J. Lee, S. Park, S. Mo, S. Ahn, and J. Shin, “Layer-adaptive sparsity for the magnitude-based pruning,” inICLR, 2021

work page 2021
[60]

Poisoning attacks against support vector machines,

B. Biggio, B. Nelson, and P. Laskov, “Poisoning attacks against support vector machines,” inICML, 2012

work page 2012
[61]

Cifar-10 (canadian institute for advanced research)

A. Krizhevsky, V . Nair, and G. Hinton, “Cifar-10 (canadian institute for advanced research).”

work page
[62]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016

work page 2016
[63]

Back to the drawing board: A critical evaluation of poisoning attacks on production federated learning,

V . Shejwalkar, A. Houmansadr, P. Kairouz, and D. Ramage, “Back to the drawing board: A critical evaluation of poisoning attacks on production federated learning,” inIEEE SP, 2022

work page 2022
[64]

Robust federated learning with attack- adaptive aggregation,

C. P. Wan and Q. Chen, “Robust federated learning with attack- adaptive aggregation,”arXiv, 2021

work page 2021
[65]

Long-short history of gradients is all you need: Detecting malicious and unreliable clients in federated learning,

A. Gupta, T. Luo, M. V . Ngo, and S. K. Das, “Long-short history of gradients is all you need: Detecting malicious and unreliable clients in federated learning,” inESORICS, 2022

work page 2022
[66]

Defending against data poisoning attack in federated learning with non-iid data,

C. Yin and Q. Zeng, “Defending against data poisoning attack in federated learning with non-iid data,”IEEE Transactions on Computational Social Systems, 2023

work page 2023
[67]

Estimating a dirichlet distribution,

T. Minka, “Estimating a dirichlet distribution,” 2000

work page 2000
[68]

Rsa: Byzantine-robust stochastic aggregation methods for distributed learning from heterogeneous datasets,

L. Li, W. Xu, T. Chen, G. B. Giannakis, and Q. Ling, “Rsa: Byzantine-robust stochastic aggregation methods for distributed learning from heterogeneous datasets,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 1544–1551

work page 2019
[69]

Achieving byzantine-resilient feder- ated learning via layer-adaptive sparsified model aggregation,

J. Xu, Z. Zhang, and R. Hu, “Achieving byzantine-resilient feder- ated learning via layer-adaptive sparsified model aggregation,” in 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2025, pp. 1508–1517

work page 2025
[70]

Do we really need to design new byzantine-robust aggregation rules?

M. Fang, S. Nabavirazavi, Z. Liu, W. Sun, S. S. Iyengar, and H. Yang, “Do we really need to design new byzantine-robust aggregation rules?” in NDSS, 2025
[71]

Parameter efficient training of deep convolutional neural networks by dynamic sparse reparameterization,

H. Mostafa and X. Wang, “Parameter efficient training of deep convolutional neural networks by dynamic sparse reparameterization,” in International Conference on Machine Learning. PMLR, 2019, pp. 4646–4655
[72]

Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science,

D. C. Mocanu, E. Mocanu, P. Stone, P. H. Nguyen, M. Gibescu, and A. Liotta, “Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science,” Nature Communications, vol. 9, no. 1, p. 2383, 2018
[73]

SNIP: Single-shot network pruning based on connection sensitivity,

N. Lee, T. Ajanthan, and P. Torr, “SNIP: Single-shot network pruning based on connection sensitivity,” in ICLR, 2019
[74]

The lottery ticket hypothesis: Finding sparse, trainable neural networks,

J. Frankle and M. Carbin, “The lottery ticket hypothesis: Finding sparse, trainable neural networks,” in ICLR, 2019
[75]

Pruning Convolutional Neural Networks for Resource Efficient Inference

P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz, “Pruning convolutional neural networks for resource efficient inference,” arXiv preprint arXiv:1611.06440, 2016

[76]

Advances and open problems in federated learning,

P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings et al., “Advances and open problems in federated learning,” Foundations and Trends® in Machine Learning, vol. 14, no. 1–2, pp. 1–210, 2021
[77]

Rigging the lottery: Making all tickets winners,

U. Evci, T. Gale, J. Menick, P. S. Castro, and E. Elsen, “Rigging the lottery: Making all tickets winners,” in ICML, 2020
[78]

Manipulating the byzantine: Optimizing model poisoning attacks and defenses for federated learning,

V. Shejwalkar and A. Houmansadr, “Manipulating the byzantine: Optimizing model poisoning attacks and defenses for federated learning,” in NDSS, 2021
[79]

Fedredefense: Defending against model poisoning attacks for federated learning using model update reconstruction error

Y. Xie, M. Fang, and N. Z. Gong, “Fedredefense: Defending against model poisoning attacks for federated learning using model update reconstruction error,” in International Conference on Machine Learning, 2024
[80]

Fl-defender: Combating targeted attacks in federated learning,

N. M. Jebreel and J. Domingo-Ferrer, “Fl-defender: Combating targeted attacks in federated learning,” Knowledge-Based Systems, vol. 260, p. 110178, 2023
[81]

FLAME: Taming backdoors in federated learning,

T. D. Nguyen, P. Rieger, H. Chen, H. Yalame, H. Möllering, H. Fereidooni, S. Marchal, M. Miettinen, A. Mirhoseini, S. Zeitouni et al., “FLAME: Taming backdoors in federated learning,” in 31st USENIX Security Symposium (USENIX Security 22), 2022, pp. 1415–1432
