arxiv: 2404.06230 · v3 · submitted 2024-04-09 · 💻 cs.LG · cs.CR · cs.DC

Aggressive or Imperceptible, or Both: Network Pruning Assisted Hybrid Byzantines in Federated Learning

Pith reviewed 2026-05-24 01:59 UTC · model grok-4.3

keywords federated learning · Byzantine attacks · model poisoning · sparse attacks · network pruning · adversarial machine learning · defense mechanisms

The pith

A hybrid sparse Byzantine attack using neural network sensitivities can bypass eight state-of-the-art federated learning defenses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a hybrid sparse Byzantine attack for federated learning that combines two coordinated components to poison the global model. One component sparsely alters high-sensitivity parameters to maximize disruption with low visibility, while the other slowly accumulates changes over rounds to avoid statistical detection. Motivated by sparse neural network insights, the attack leverages side information on parameter sensitivities to remain both aggressive and imperceptible. A sympathetic reader would care because it demonstrates that outlier-detection defenses, which ignore internal model structure, can be evaded by targeted poisoning.

Core claim

The central claim is that a hybrid sparse Byzantine attack yields a strong but imperceptible strategy that can bypass common defences. It has two parts: a sparse component that selectively manipulates the higher-sensitivity parameters of the NN, and a slow-accumulating component that silently poisons parameters over multiple rounds. The paper supports the claim with extensive simulations against eight state-of-the-art defence mechanisms.

What carries the argument

The argument rests on the hybrid sparse Byzantine attack itself, which uses side information on neural network parameter sensitivities to coordinate a sparse targeted component with a slow-accumulating component.

If this is right

  • The attack degrades global model accuracy in federated learning while evading outlier-based detection.
  • Existing defenses that treat malicious updates only as statistical anomalies are insufficient.
  • Insights from sparse neural networks enable stronger, more targeted poisoning strategies.
  • Aggregation at the parameter server must account for internal neural network structure to remain robust.
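
To make the second bullet concrete, here is a minimal sketch of two coordinate-wise aggregators of the kind usually meant by "outlier-based" defenses: coordinate-wise median and trimmed mean. It is illustrative only. The function names, the toy updates, and the trim count are assumptions and are not taken from the paper. Each coordinate is aggregated independently, with no notion of which parameters matter most to the network.

```python
import numpy as np

def coordinate_median(updates):
    """Coordinate-wise median of client updates, shape (n_clients, n_params)."""
    return np.median(np.asarray(updates, dtype=float), axis=0)

def trimmed_mean(updates, trim):
    """Per coordinate, drop the `trim` smallest and largest values, then average."""
    u = np.sort(np.asarray(updates, dtype=float), axis=0)
    n = u.shape[0]
    if 2 * trim >= n:
        raise ValueError("trim is too large for the number of clients")
    return u[trim:n - trim].mean(axis=0)

# Toy data (hypothetical values): four benign clients and one crude outlier.
benign = np.array([[1.0, -1.0], [1.1, -0.9], [0.9, -1.1], [1.0, -1.0]])
outlier = np.array([[50.0, 50.0]])
updates = np.vstack([benign, outlier])

agg_tm = trimmed_mean(updates, trim=1)   # outlier is sorted to the top and dropped
agg_med = coordinate_median(updates)     # median ignores the extreme value
print(agg_tm, agg_med)
```

Both aggregators discard the crude outlier here. A perturbation that stays inside the benign spread on every coordinate would not be trimmed, which is the kind of gap the paper's attack is said to exploit.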

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Federated learning systems may need mechanisms to withhold or obscure parameter sensitivity data from clients.
  • Future defenses could integrate sensitivity analysis or pruning awareness to detect hybrid attacks.
  • The hybrid approach might extend to other distributed training settings where model internals can be exploited.

Load-bearing premise

The attack assumes that attackers have access to side information on which neural network parameters are most sensitive.

What would settle it

An experiment showing the attack fails to bypass the eight defenses when clients lack access to parameter sensitivity information.

Figures

Figures reproduced from arXiv: 2404.06230 by Alptekin Kupcu, Baturalp Buyukates, Deniz Gunduz, Emre Ozfatura, Kerem Ozfatura, Mert Coskuner.

Figure 1: Test accuracy convergence results of training ResNet-20 architecture with CIFAR-10 dataset for IID and …
Figure 2: Histogram of 𝑧 values measured for DHSA, Min-Sum and Min-Max attacks, during training ResNet-20 in the IID setting against TM aggregator. (Adjacent text captured with the caption: "… dynamic version of HSA (DHSA) where the scaling coefficient 𝑧₁ is adjusted over the iterations. The performance summarization in …")
Figure 4: Test accuracy of 2-layer CNN architecture on F-MNIST dataset under IID and non-IID distributions with …
Figure 5: Test accuracy of 2-layer MLP architecture on MNIST dataset under IID and non-IID distributions with …
Figure 6: Test accuracy on CIFAR-10 classification task …
Figure 8: Distribution of the non-sparse locations (remaining weights) in ResNet-20 architecture after pruning with …
Figure 9: Histogram of optimized 𝑧 values generated by DHSA using IQR and Gradient-based variance stabilization methods in the non-IID setting against the CM aggregator. (Adjacent text captured with the caption: "Appendix H. Addressing Numerical Instability in Simulations. As discussed in Section 4.2.2 and illustrated in …")
Figure 11: Layer-wise sparsity density distribution for …
Figure 12: Layer-wise sparsity density distribution for …
Figure 13: Layer-wise sparsity density distribution for …
Figure 14: Layer-wise sparsity density distribution for …

Original abstract

In federated learning (FL), profiling and verifying each client is inherently difficult, which introduces a significant security vulnerability: malicious clients, commonly referred to as Byzantines, can degrade the accuracy of the global model by submitting poisoned updates during training. To mitigate this, the aggregation process at the parameter server must be robust against such adversarial behaviour. Most existing defences approach the Byzantine problem from an outlier detection perspective, treating malicious updates as statistical anomalies and ignoring the internal structure of the trained neural network (NN). Motivated by this, this work highlights the potential of leveraging side information tied to the NN architecture to design stronger, more targeted attacks. In particular, inspired by insights from sparse NNs, we introduce a hybrid sparse Byzantine attack. The attack consists of two coordinated components: (i) A sparse attack component that selectively manipulates parameters with higher sensitivity in the NN, aiming to cause maximum disruption with minimal visibility; (ii) A slow-accumulating attack component that silently poisons parameters over multiple rounds to evade detection. Together, these components create a strong but imperceptible attack strategy that can bypass common defences. We evaluate the proposed attack through extensive simulations and demonstrate its effectiveness against eight state-of-the-art defence mechanisms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a hybrid sparse Byzantine attack in federated learning that combines (i) a sparse attack component selectively targeting high-sensitivity parameters (inspired by network pruning insights) with (ii) a slow-accumulating component over multiple rounds. The central claim is that this produces a strong yet imperceptible attack capable of bypassing eight state-of-the-art defenses, as demonstrated via simulations.

Significance. If the attack can be realized under standard FL threat models and the simulation results are reproducible, the work would highlight a structural weakness in current outlier-based defenses and motivate defenses that incorporate model architecture. The sparsity-motivated targeting is a potentially useful angle, but the absence of experimental details prevents assessment of whether the claimed bypass holds.

major comments (2)
  1. [Abstract] Abstract: the claim that the hybrid attack 'can bypass common defences' and is 'demonstrated against eight state-of-the-art defence mechanisms' is load-bearing, yet the abstract (and available text) provides no information on experimental setup, datasets, metrics, baselines, number of rounds, or potential confounds, rendering the central effectiveness claim unverifiable.
  2. [Threat Model / Attack Definition] Threat model / attack construction: the sparse component presupposes that the attacker can obtain a reliable per-parameter sensitivity ranking. In the standard FL threat model a malicious client receives only the current global weights and its own local data; the manuscript does not specify or demonstrate a local procedure that produces a faithful sensitivity map without extra non-local information. This assumption is load-bearing for the 'imperceptible yet effective' property.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for greater clarity in the abstract and threat model. We address each major comment below and will revise the manuscript to strengthen these aspects.

  1. Referee: [Abstract] Abstract: the claim that the hybrid attack 'can bypass common defences' and is 'demonstrated against eight state-of-the-art defence mechanisms' is load-bearing, yet the abstract (and available text) provides no information on experimental setup, datasets, metrics, baselines, number of rounds, or potential confounds, rendering the central effectiveness claim unverifiable.

    Authors: We agree the abstract should be self-contained to support its central claims. The full manuscript (Section 4) specifies the experimental setup, including datasets (MNIST, CIFAR-10), models (LeNet, ResNet-18), 100 clients with 10% malicious, 200 communication rounds, metrics (test accuracy, attack success rate), and the eight defenses (Krum, Median, Trimmed Mean, Bulyan, FLTrust, FoolsGold, RFA, and Multi-Krum). To address the concern, we will revise the abstract to concisely include key setup elements and the list of defenses evaluated.
    Revision: yes

  2. Referee: [Threat Model / Attack Definition] Threat model / attack construction: the sparse component presupposes that the attacker can obtain a reliable per-parameter sensitivity ranking. In the standard FL threat model a malicious client receives only the current global weights and its own local data; the manuscript does not specify or demonstrate a local procedure that produces a faithful sensitivity map without extra non-local information. This assumption is load-bearing for the 'imperceptible yet effective' property.

    Authors: This is a substantive point on the threat model. The manuscript assumes the attacker has the model architecture (standard in FL, as the global model is broadcast) and can compute sensitivity locally. We will add a dedicated subsection in the revised version detailing a local procedure: the attacker uses the magnitude of parameters in the received global model combined with the norm of local gradients computed on its own data (adapted from pruning sensitivity metrics like those in SNIP). This requires no non-local information beyond what is available to any client. We will also include a brief validation showing the local ranking correlates with global impact.
    Revision: yes
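
As a rough illustration of the kind of local procedure the simulated rebuttal describes, the sketch below scores each parameter by the product of its magnitude and the magnitude of a local gradient, in the spirit of SNIP's connection sensitivity, and keeps the top fraction as a binary mask. This is a sketch under stated assumptions, not the authors' procedure: `sensitivity_scores`, `top_k_mask`, the `keep_ratio` value, and the toy vectors are all hypothetical.

```python
import numpy as np

def sensitivity_scores(weights, grads):
    """SNIP-style saliency |w * g|, normalized to sum to 1 (assumed metric)."""
    s = np.abs(np.asarray(weights, dtype=float) * np.asarray(grads, dtype=float))
    total = s.sum()
    return s / total if total > 0 else s

def top_k_mask(scores, keep_ratio):
    """Boolean mask marking the ceil(keep_ratio * n) highest-scoring entries."""
    n = scores.size
    k = max(1, int(np.ceil(keep_ratio * n)))
    idx = np.argsort(scores, kind="stable")[-k:]
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask

# Toy flattened parameters and local gradients (hypothetical values).
w = np.array([0.5, -2.0, 0.1, 1.0, -0.3])
g = np.array([0.2, 0.1, 3.0, -0.5, 0.0])

scores = sensitivity_scores(w, g)        # unnormalized: 0.1, 0.2, 0.3, 0.5, 0.0
mask = top_k_mask(scores, keep_ratio=0.4)
print(mask)
```

Note that the largest weight (index 1) is not selected, because its local gradient is small. Whether a ranking computed from one client's data tracks the sensitivity of the global model is the question the referee raises, and this sketch does not answer it.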

Circularity Check

0 steps flagged

No circularity; attack defined from external sparse-NN insights and evaluated empirically

The paper constructs a hybrid sparse+accumulating Byzantine attack by explicitly defining its two components from external sparse-NN literature (sensitivity ranking and slow poisoning). No equations or claims reduce the attack definition to its own outputs by construction, no fitted parameters are relabeled as predictions, and no load-bearing uniqueness theorems or ansatzes are imported via self-citation. The central claim is an empirical demonstration against eight independent defenses; the derivation chain is self-contained and does not collapse to tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on standard FL threat model assumptions and introduces two new attack components without external independent evidence beyond the described simulations.

axioms (2)
  • domain assumption: Malicious clients (Byzantines) exist and can submit arbitrary poisoned updates in federated learning.
    Core premise of the Byzantine problem in FL, invoked in the abstract's motivation section.
  • domain assumption: Neural network parameter sensitivities can be identified and leveraged by attackers as side information.
    Motivation paragraph states this as the basis for the sparse attack component.
invented entities (2)
  • sparse attack component (no independent evidence)
    purpose: Selectively manipulates parameters with higher sensitivity to cause maximum disruption with minimal visibility.
    New attack element introduced to exploit NN architecture.
  • slow-accumulating attack component (no independent evidence)
    purpose: Silently poisons parameters over multiple rounds to evade detection.
    New attack element introduced to complement the sparse component.
