Hammer and Anvil: Toward a Theory of Backdoors in Federated Learning

Florian Kerschbaum; Jacob Yan; Lucas Fenaux; Nathan Chung; Zheng Wang

arxiv: 2509.08089 · v2 · submitted 2025-09-09 · 💻 cs.LG · cs.CR

Lucas Fenaux, Zheng Wang, Jacob Yan, Nathan Chung, Florian Kerschbaum

Pith reviewed 2026-05-18 17:20 UTC · model grok-4.3

classification 💻 cs.LG cs.CR

keywords federated learning · backdoor attacks · adaptive adversaries · outlier detection · robust aggregation · defense frameworks · model poisoning

The pith

A principled combination of outlier detection and removal-based defenses defeats adaptive backdoor attacks in federated learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes that backdoor attacks in federated learning differ in how far their model updates stray from the average of honest updates. This deviation, called δ, determines which existing defenses can catch them. Defenses that are good at spotting large deviations leave room for subtle attacks, while those that target small deviations miss obvious ones. Combining the two types in a principled way closes the gap and, in the paper's experiments, holds up even when the attacker knows the defense details and all other updates. The result is more complete protection for distributed training systems.

Core claim

Backdoors can be categorized by the deviation δ of their updates from the mean update. Type 1 defenses handle large-δ attacks through outlier detection and robust aggregation, while Type 2 defenses address small-δ attacks through removal. Single-type defenses and non-principled combinations leave exploitable gaps for adaptive adversaries, but principled combinations of Type 1 and Type 2, such as HA_Flame^CSFT, HA_Krum^CSFT, and HA_Multi-Metrics^CSFT, remain effective against a full-information adaptive adversary across various datasets and settings.
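To make the quantity behind this claim concrete, here is a minimal sketch of computing δ as the Euclidean distance between one update and the coordinate-wise mean of the benign updates. The rebuttal further down describes the paper's Section 3 definition in these terms. The function names and the flat-list representation of an update are assumptions made for illustration, not the authors' code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_update(updates: Sequence[Vector]) -> List[float]:
    """Coordinate-wise mean of equally sized update vectors."""
    if not updates:
        raise ValueError("need at least one update")
    dim = len(updates[0])
    if any(len(u) != dim for u in updates):
        raise ValueError("all updates must have the same length")
    return [sum(u[i] for u in updates) / len(updates) for i in range(dim)]


def deviation(update: Vector, benign_updates: Sequence[Vector]) -> float:
    """Euclidean distance (delta) from `update` to the mean of `benign_updates`.

    Assumption: updates are flattened parameter deltas of equal length.
    """
    center = mean_update(benign_updates)
    if len(update) != len(center):
        raise ValueError("update has the wrong length")
    return math.dist(update, center)
```

For example, with benign updates `[[0.0, 0.0], [2.0, 0.0]]` the mean is `[1.0, 0.0]`, so the update `[1.0, 3.0]` has δ = 3.0. Real model updates have millions of coordinates, but the computation is the same.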

What carries the argument

The Hammer and Anvil framework, which classifies backdoors according to the scalar deviation δ of malicious updates from the mean and combines Type 1 (Anvil) outlier detection with Type 2 (Hammer) removal defenses.
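The following sketch shows one way such a combination could be wired together: a Type 1 step aggregates the client updates robustly, and a Type 2 step then post-processes the resulting model. Several things here are assumptions rather than the paper's design: the coordinate-wise median is only a stand-in robust aggregator, the `hammer` callable is a placeholder for a removal defense such as CSFT, and the text here does not say whether removal runs every round or once after training.

```python
from statistics import median
from typing import Callable, List, Sequence

Vector = List[float]


def coordinate_median(updates: Sequence[Sequence[float]]) -> Vector:
    """Type 1 stand-in: coordinate-wise median of the client updates."""
    return [median(column) for column in zip(*updates)]


def hammer_and_anvil_round(
    global_model: Sequence[float],
    updates: Sequence[Sequence[float]],
    anvil: Callable[[Sequence[Sequence[float]]], Vector],
    hammer: Callable[[Vector], Vector],
) -> Vector:
    """Apply a Type 1 aggregator, then a Type 2 removal step (hypothetical API).

    `anvil` maps client updates to one aggregated update.
    `hammer` maps a model to a cleaned model, e.g. by fine-tuning.
    """
    aggregated = anvil(updates)
    candidate = [w + d for w, d in zip(global_model, aggregated)]
    return hammer(candidate)
```

Passing both defenses as callables keeps the sketch neutral about which concrete Type 1 and Type 2 methods are paired, which mirrors how the variants HA_Flame^CSFT, HA_Krum^CSFT, and HA_Multi-Metrics^CSFT differ only in their Type 1 component.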

Load-bearing premise

That all backdoor attacks can be effectively classified and blocked using the scalar deviation δ of their update from the mean update.

What would settle it

Demonstration of a backdoor attack that succeeds against HA_Flame^CSFT or a similar combined defense when the attacker knows the benign updates, the aggregation algorithm, and all parameters.

Figures

Figures reproduced from arXiv: 2509.08089 by Florian Kerschbaum, Jacob Yan, Lucas Fenaux, Nathan Chung, Zheng Wang.

**Figure 3.** Simplified example of the adaptive attack on MoM

**Figure 5.** Learning rate scheduling of super-fine-tuning, taken …

**Figure 6.** Federated models backdoored with m = 1 malicious clients, fine-tuned with 2000 samples over different gradient clipping thresholds. Adjacent paper text (§5.6, Clipped-super-fine-tuning): Super-fine-tuning [16], a modified fine-tuning algorithm that periodically adjusts the learning rate used during fine-tuning, has been demonstrated to be an effective method for removing backdoors post-training. While initially designed for backd…

**Figure 7.** Federated models backdoored with m = 8 malicious clients, fine-tuned with 2000 samples over different gradient clipping thresholds. Adjacent paper text: … how adaptive attacks can break them. Following the principle described in Section 4, we present a new defense approach: HA, that combines CSFT with another defense, to produce a defense capable of preventing backdoors against adversaries with exact knowledge of the benign clie…

**Figure 8.** The effect of varying the size of the fine-tuning data

**Figure 10.** CSFT of federated models with 500 (1%) samples. Adjacent paper text: … number of fine-tuning epochs used for CSFT. Throughout our experiments, we set the number of fine-tuning epochs to 100. However, in this section, we investigate how varying the number of fine-tuning epochs, ranging from 0 to 100, affects the performance of Krum+. We present the finetuning trace of federated models with 2000 (4%) samples with m ∈ {1,2,4,8} …

**Figure 11.** Experimental results with m ∈ [1,2,4,8] malicious clients out of n = 20 total benign clients with fine tune sample size of 4% (2000 samples).

**Figure 12.** Experimental results with m ∈ [1,2,4,8] malicious clients out of n = 20 total benign clients with fine tune sample size of 1% (500 samples).

Original abstract

Federated Learning (FL) enables distributed model training but is vulnerable to backdoor attacks, where malicious clients embed attacker-controlled behaviors into the global model. Existing defenses fail against adaptive adversaries. In this paper, we present "Hammer and Anvil", a principled theoretical framework that categorizes backdoors by the deviation, $\delta$, of their updates to the mean of the updates. We identify two fundamental defense types: "Type 1 (The Anvil)", comprising outlier detection and robust aggregation effective against large-deviation attacks, and "Type 2 (The Hammer)", consisting of removal-based defenses effective against small-deviation attacks. We demonstrate that defenses of a single type and non-principled combined defenses inherently leave an exploitable gap for adaptive attackers. To bridge this gap, we propose the principled combination of Type 1 and Type 2 defenses. We evaluate our framework against a new, worst-case, full-information adaptive adversary that knows the benign updates, the aggregation algorithm, and its parameters, and yet this adversary fails against our combined defenses. Our empirical evaluation across various datasets and settings shows that single-typed and non-principled combined defenses are easily broken, often by a single malicious client. In contrast, our best combined defense variants, $HA_{Flame}^{CSFT}$, $HA_{Krum}^{CSFT}$, and $HA_{Multi-Metrics}^{CSFT}$, remain undefeated even in the most adversarial settings. Our results provide a principled approach for research on backdoors in federated learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper gives a clean taxonomy splitting FL backdoors by deviation δ and shows principled Type-1-plus-Type-2 combos resist full-info adaptive attacks, though the boundary case around δ still needs verification.

The paper's main contribution is a taxonomy that divides backdoor attacks according to the deviation δ of their updates from the mean. Type 1 covers large deviations with outlier detection and robust aggregation. Type 2 covers small deviations with removal-based approaches. The central point is that single-type defenses and non-principled combinations leave room for a full-information adaptive adversary, while the principled combination does not.

Referee Report

3 major / 2 minor

Summary. The manuscript presents the 'Hammer and Anvil' framework for backdoor attacks in federated learning. Attacks are categorized by the scalar deviation δ of a malicious update from the mean of benign updates. Type-1 defenses (outlier detection and robust aggregation) target large-δ attacks while Type-2 defenses (removal-based) target small-δ attacks. The central claim is that single-type defenses and non-principled combinations leave exploitable gaps for adaptive adversaries, but principled Type-1-plus-Type-2 combinations close the gap. This is evaluated against a new worst-case full-information adaptive adversary that knows benign updates, the aggregation rule, and all parameters; empirical results across datasets show that variants such as HA_Flame^CSFT, HA_Krum^CSFT, and HA_Multi-Metrics^CSFT remain effective while single-type and non-principled defenses are defeated, often by one malicious client.

Significance. If the central claims hold, the work supplies a useful organizing principle for combining existing FL defenses and demonstrates that principled combinations can resist a strong full-information adversary where isolated defenses fail. The explicit worst-case adversary model and the breadth of empirical settings are positive contributions. The framework may help explain observed defense failures and guide future designs, though its generality hinges on whether the scalar-δ classification is exhaustive.

major comments (3)

[Abstract and §3] Abstract and §3 (framework definition): the claim that every backdoor update is 'usefully described by a single scalar δ' and that the principled combination therefore necessarily covers the entire space is load-bearing for the 'toward a theory' contribution. The manuscript must show, either by proof or exhaustive case analysis, that an adaptive full-information adversary cannot produce an update whose effective deviation lies in the transition region or exploits the particular metric used to compute δ, thereby evading both defense types simultaneously.
[§4 and experimental section] §4 (adversary model) and experimental section: the description of how the full-information adversary constructs its update (knowing the exact benign updates, aggregation parameters, and defense thresholds) is insufficiently detailed. Without an explicit algorithm or pseudocode showing the adversary's optimization over δ and direction, it is impossible to confirm that the reported 'undefeated' status of the combined variants is not an artifact of an incomplete adversary implementation.
[Results table/figure] Table or figure reporting attack success rates (e.g., the table or plot containing HA_Flame^CSFT results): the manuscript should report the precise δ threshold used to separate Type-1 and Type-2 regimes and demonstrate that the adversary was allowed to optimize across that threshold; if the threshold is a free parameter, the claim that the combination is 'principled' and gap-free requires a sensitivity analysis.

minor comments (2)

[Notation and abstract] The superscript notation HA_Flame^CSFT (and analogous variants) is introduced without an immediate expansion; define the components (CSFT, etc.) at first use.
[Figures] Ensure all figures plotting attack success versus number of malicious clients or δ values include error bars or multiple random seeds and clearly distinguish the single-type, non-principled, and principled-combination curves.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, indicating where revisions will be made to strengthen the manuscript.

Referee: [Abstract and §3] Abstract and §3 (framework definition): the claim that every backdoor update is 'usefully described by a single scalar δ' and that the principled combination therefore necessarily covers the entire space is load-bearing for the 'toward a theory' contribution. The manuscript must show, either by proof or exhaustive case analysis, that an adaptive full-information adversary cannot produce an update whose effective deviation lies in the transition region or exploits the particular metric used to compute δ, thereby evading both defense types simultaneously.

Authors: We agree that substantiating the exhaustiveness of the scalar-δ categorization is central to the theoretical framing. Section 3 defines δ as the Euclidean deviation from the benign mean and argues that this scalar distinguishes the regimes targeted by Type-1 versus Type-2 defenses. We do not supply a formal proof that no update can evade by metric exploitation or by landing precisely in a transition region; instead, the framework treats δ as a useful organizing scalar for any update in the space. To respond, we will add a dedicated paragraph in the revised §3 that provides additional justification for why metric-specific evasion is countered by the combined defense and includes a short case analysis of transition-region attempts. This addition will be supported by the existing empirical results in which the full-information adversary, free to select any δ, still fails against the principled combinations. The revision is therefore partial, as a complete formal proof would require further theoretical work beyond the present scope.
Revision: partial
Referee: [§4 and experimental section] §4 (adversary model) and experimental section: the description of how the full-information adversary constructs its update (knowing the exact benign updates, aggregation parameters, and defense thresholds) is insufficiently detailed. Without an explicit algorithm or pseudocode showing the adversary's optimization over δ and direction, it is impossible to confirm that the reported 'undefeated' status of the combined variants is not an artifact of an incomplete adversary implementation.

Authors: We accept that the adversary construction requires greater explicitness. Section 4 states the adversary's knowledge (benign updates, aggregation rule, all defense parameters) and its objective, yet omits step-by-step construction details. In the revision we will insert a new algorithm box that specifies the procedure: (i) compute the benign mean from the known updates, (ii) perform a search over candidate δ magnitudes and perturbation directions, (iii) evaluate each candidate against the known defense thresholds, and (iv) select the update that maximizes backdoor success while attempting to evade the combined defense. This addition will make the optimization transparent and allow independent verification that the reported resilience of HA_Flame^CSFT, HA_Krum^CSFT, and HA_Multi-Metrics^CSFT is not an artifact of an under-powered adversary.
Revision: yes
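The four steps in this response can be sketched as a plain grid search. This is an illustrative reading, not the promised algorithm box: the candidate grid, the `passes_defense` predicate, and the `backdoor_score` function are hypothetical stand-ins that a real adversary would replace with the actual defense checks and a measured attack success rate.

```python
import math
from typing import Callable, Iterable, List, Optional, Sequence

Vector = List[float]


def benign_mean(updates: Sequence[Sequence[float]]) -> Vector:
    """Step (i): coordinate-wise mean of the known benign updates."""
    dim = len(updates[0])
    return [sum(u[i] for u in updates) / len(updates) for i in range(dim)]


def search_malicious_update(
    benign_updates: Sequence[Sequence[float]],
    directions: Iterable[Sequence[float]],
    deltas: Iterable[float],
    passes_defense: Callable[[Vector], bool],   # hypothetical defense check
    backdoor_score: Callable[[Vector], float],  # hypothetical success measure
) -> Optional[Vector]:
    """Return the best-scoring candidate that evades the defense, or None."""
    center = benign_mean(benign_updates)
    delta_grid = list(deltas)  # reused for every direction
    best: Optional[Vector] = None
    best_score = -math.inf
    for direction in directions:               # step (ii): directions ...
        norm = math.hypot(*direction)
        if norm == 0.0:
            continue                           # skip degenerate directions
        unit = [d / norm for d in direction]
        for delta in delta_grid:               # ... and magnitudes
            candidate = [c + delta * u for c, u in zip(center, unit)]
            if not passes_defense(candidate):  # step (iii): check thresholds
                continue
            score = backdoor_score(candidate)
            if score > best_score:             # step (iv): keep the best
                best, best_score = candidate, score
    return best
```

A `None` result means no candidate on the grid evaded the defense, which is the outcome the paper reports for its combined variants. A coarse grid can miss evading updates, so a result of `None` from this sketch says nothing about a stronger optimizer.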
Referee: [Results table/figure] Table or figure reporting attack success rates (e.g., the table or plot containing HA_Flame^CSFT results): the manuscript should report the precise δ threshold used to separate Type-1 and Type-2 regimes and demonstrate that the adversary was allowed to optimize across that threshold; if the threshold is a free parameter, the claim that the combination is 'principled' and gap-free requires a sensitivity analysis.

Authors: We agree that the separation threshold and cross-threshold optimization must be stated explicitly. The threshold is not an arbitrary free parameter; it is derived from the concrete Type-1 defense (e.g., neighbor-selection radius in Krum or clustering cutoff in Flame). We will revise the experimental section to list the exact numerical threshold applied in each reported setting. In addition, we will include a sensitivity plot (new figure or appendix) that varies the threshold while allowing the adversary to optimize δ continuously across the boundary. The results of this analysis will be summarized in the text to confirm that the principled Type-1-plus-Type-2 combinations remain effective even when the adversary is granted freedom to straddle the regimes.
Revision: yes
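For readers unfamiliar with Krum [9], which this response names as one source of the threshold, here is a minimal sketch of the standard selection rule. Each update is scored by the sum of squared distances to its n − f − 2 nearest neighbours, and the update with the lowest score is selected. The function names are illustrative. How a single numerical threshold is read off from this rule is not specified in the text here, so the sketch shows only the selection step.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum_select(updates: Sequence[Sequence[float]], num_byzantine: int) -> int:
    """Return the index of the update chosen by the Krum rule.

    `num_byzantine` is f, the assumed upper bound on malicious clients.
    """
    n = len(updates)
    k = n - num_byzantine - 2  # number of nearest neighbours to score against
    if k < 1:
        raise ValueError("Krum needs n - f - 2 >= 1")
    scores: List[float] = []
    for i, u in enumerate(updates):
        dists = sorted(
            squared_distance(u, v) for j, v in enumerate(updates) if j != i
        )
        scores.append(sum(dists[:k]))
    # The update closest to its nearest neighbours wins.
    return min(range(n), key=scores.__getitem__)
```

Because the score only depends on the closest neighbours, an update that sits far from the benign cluster is never selected, whereas an update placed inside the cluster can be. That is the kind of small-δ room the Type 2 component is meant to cover.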

Circularity Check

0 steps flagged

No significant circularity; conceptual framework is self-contained

The paper introduces a categorization of backdoor attacks by scalar deviation δ from the mean update and defines Type 1 (outlier/robust aggregation for large δ) and Type 2 (removal for small δ) defenses as a new organizational lens. It conceptually argues that single-type defenses and non-principled combinations leave exploitable gaps for adaptive adversaries while principled Type-1-plus-Type-2 combinations close them, supported by empirical evaluation against a full-information adversary. No equations, fitted parameters, or self-citations are shown to reduce the central claims to their own inputs by construction; the derivation is an independent conceptual organization rather than a tautological or self-referential loop.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the new deviation-based categorization and on the empirical claim that the combined defenses survive the stated adversary; both are introduced by the paper rather than taken from prior literature.

free parameters (1)

deviation threshold separating Type-1 and Type-2 regimes
The threshold on the scalar δ that decides whether an attack is treated as large-deviation or small-deviation is a modeling choice that affects which defense is applied.

axioms (1)

domain assumption: Backdoor attacks can be exhaustively categorized by the deviation of their updates from the mean of benign updates.
This is the load-bearing premise that lets the authors split defenses into two types and claim the combination is complete.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

δ represents the magnitude of a given backdoor attack, with weakly inserted backdoors with smaller l2-norm updates having a small δ and larger l2-norm attack updates having a larger δ. ... an attacker can only succeed for δ∈[δ₂,δ₁]
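One way to read the quoted interval, offered as an interpretation rather than the paper's formal statement: take δ₁ to be the largest deviation that still passes the Type 1 defense and δ₂ to be the smallest deviation whose backdoor survives the Type 2 defense. The symbols come from the quote; these definitions are assumed.

```latex
\[
  \text{attack succeeds} \;\Longrightarrow\; \delta_2 \le \delta \le \delta_1,
  \qquad
  [\delta_2, \delta_1] = \varnothing \iff \delta_2 > \delta_1 .
\]
```

Under this reading, a combined defense leaves no gap when the Type 2 component removes every backdoor weak enough to pass the Type 1 component, that is, when δ₂ > δ₁.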

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 1 internal anchor

[1] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–

[2] Tianyu Gu, Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. Badnets: Evaluating backdooring attacks on deep neural networks. IEEE Access, 7:47230–47244, 2019.

[3] Chulin Xie, Keli Huang, Pin-Yu Chen, and Bo Li. Dba: Distributed backdoor attacks against federated learning. In International conference on learning representations, 2019.

[4] Zhengming Zhang, Ashwinee Panda, Linyue Song, Yaoqing Yang, Michael Mahoney, Prateek Mittal, Ramchandran Kannan, and Joseph Gonzalez. Neurotoxin: Durable backdoors in federated learning. In International Conference on Machine Learning, pages 26429–26446. PMLR, 2022.

[5] Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526, 2017.

[6] Tran Ngoc Huynh, Anh Tuan Tran, Khoa D Doan, and Tung Pham. Forget-me-not: Making backdoor hard to be forgotten in fine-tuning

[7] Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, and Vitaly Shmatikov. How to backdoor federated learning. In International conference on artificial intelligence and statistics, pages 2938–2948. PMLR, 2020.

[8] Thuy Dung Nguyen, Tuan Nguyen, Phi Le Nguyen, Hieu H Pham, Khoa D Doan, and Kok-Seng Wong. Backdoor attacks and defenses in federated learning: Survey, challenges and future research directions. Engineering Applications of Artificial Intelligence, 127:107166, 2024.

[9] Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. Machine learning with adversaries: Byzantine tolerant gradient descent. Advances in neural information processing systems, 30, 2017.

[10] Stanislav Minsker. Efficient median of means estimator. In The Thirty Sixth Annual Conference on Learning Theory, pages 5925–5933. PMLR, 2023.

[11] Ziteng Sun, Peter Kairouz, Ananda Theertha Suresh, and H Brendan McMahan. Can you really backdoor federated learning? arXiv preprint arXiv:1911.07963, 2019.

[12] Ning Wang, Yang Xiao, Yimin Chen, Yang Hu, Wenjing Lou, and Y Thomas Hou. Flare: defending federated learning against model poisoning attacks via latent space representations. In Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security, pages 946–958, 2022.

[13] Thien Duc Nguyen, Phillip Rieger, Huili Chen, Hossein Yalame, Helen Möllering, Hossein Fereidooni, Samuel Marchal, Markus Miettinen, Azalia Mirhoseini, Shaza Zeitouni, et al. FLAME: Taming backdoors in federated learning. In 31st USENIX security symposium (USENIX Security 22), pages 1415–1432, 2022.

[14] Chen Wu, Xian Yang, Sencun Zhu, and Prasenjit Mitra. Mitigating backdoor attacks in federated learning. arXiv preprint arXiv:2011.01767, 2020.

[15] Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. Fine-pruning: Defending against backdooring attacks on deep neural networks. In International symposium on research in attacks, intrusions, and defenses, pages 273–294. Springer, 2018.

[16] Zeyang Sha, Xinlei He, Pascal Berrang, Mathias Humbert, and Yang Zhang. Fine-tuning is all you need to mitigate backdoor attacks. arXiv preprint arXiv:2212.09067, 2022.

[17] Xiaoting Lyu, Yufei Han, Wei Wang, Jingkai Liu, Bin Wang, Jiqiang Liu, and Xiangliang Zhang. Poisoning with cerberus: Stealthy and colluded backdoor attack against federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 9020–9028, 2023.

[18] Junhan Wang, Zhangming Wu, Zhuoyue Wang, and Lu Dong. Sba: A swift and stealthy backdoor attack framework for federated learning. In ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2025.

[19] Xiaoting Lyu, Yufei Han, Wei Wang, Jingkai Liu, Yongsheng Zhu, Guangquan Xu, Jiqiang Liu, and Xiangliang Zhang. Lurking in the shadows: Unveiling stealthy backdoor attacks against personalized federated learning. In 33rd USENIX Security Symposium (USENIX Security 24), pages 4157–4174, 2024.

[20] John S Bridle. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In Neurocomputing: Algorithms, architectures and applications, pages 227–

[21] Arjun Nitin Bhagoji, Supriyo Chakraborty, Prateek Mittal, and Seraphin Calo. Analyzing federated learning through an adversarial lens. In International conference on machine learning, pages 634–643. PMLR, 2019.

[22] Xianghua Xie, Chen Hu, Hanchi Ren, and Jingjing Deng. A survey on vulnerability of federated learning: A learning algorithm perspective. Neurocomputing, 573:127225, 2024.

[23] Asadullah Tariq, Mohamed Adel Serhani, Farag M Sallabi, Ezedin S Barka, Tariq Qayyum, Heba M Khater, and Khaled A Shuaib. Trustworthy federated learning: A comprehensive review, architecture, key challenges, and future research prospects. IEEE Open Journal of the Communications Society, 2024.

[24] Yunhao Feng, Yanming Guo, Yinjian Hou, Yulun Wu, Mingrui Lao, Tianyuan Yu, and Gang Liu. A survey of security threats in federated learning. Complex & Intelligent Systems, 11(2):1–26, 2025.

[25] Zhaozheng Li, Jiahe Lan, Zheng Yan, and Erol Gelenbe. Backdoor attacks and defense mechanisms in federated learning: A survey. Information Fusion, page 103248, 2025.

[26] Yichen Wan, Youyang Qu, Wei Ni, Yong Xiang, Longxiang Gao, and Ekram Hossain. Data and model poisoning backdoor attacks on wireless federated learning, and the defense mechanisms: A comprehensive survey. IEEE Communications Surveys & Tutorials, 26(3):1861–1897, 2024.

[27] Jakub Kacper Szeląg, Ji-Jian Chin, and Sook-Chin Yip. Adaptive adversaries in byzantine-robust federated learning: A survey. Cryptology ePrint Archive, 2025.

[28] Qiuxian Chen and Yizheng Tao. An investigation of recent backdoor attacks and defenses in federated learning. In 2023 Eighth International Conference on Fog and Mobile Edge Computing (FMEC), pages 262–269. IEEE, 2023.

[29] Torsten Krauß and Alexandra Dmitrienko. Mesas: Poisoning defense for federated learning resilient against adaptive attackers. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pages 1526–1540, 2023.

[30] Torsten Krauß, Jan König, Alexandra Dmitrienko, and Christian Kanzow. Automatic adversarial adaption for stealthy poisoning attacks in federated learning. In To appear soon at the Network and Distributed System Security Symposium (NDSS), 2024.

[31] Rachid Guerraoui, Sébastien Rouault, et al. The hidden vulnerability of distributed learning in byzantium. In International conference on machine learning, pages 3521–3530. PMLR, 2018.

[32] Dong Yin, Yudong Chen, Ramchandran Kannan, and Peter Bartlett. Byzantine-robust distributed learning: Towards optimal statistical rates. In International conference on machine learning, pages 5650–5659. PMLR, 2018.

[33] Alex Krizhevsky. Learning multiple layers of features from tiny images, 2009. In Technical report.

[34] Hidde Lycklama, Lukas Burkhalter, Alexander Viand, Nicolas Küchler, and Anwar Hithnawi. Rofl: Robustness of secure federated learning. In 2023 IEEE Symposium on Security and Privacy (SP), pages 453–476. IEEE, 2023.

[35] Haomin Zhuang, Mingxian Yu, Hao Wang, Yang Hua, Jian Li, and Xu Yuan. Backdoor federated learning by poisoning backdoor-critical layers. arXiv preprint arXiv:2308.04466, 2023.

[36] Yann LeCun. The mnist database of handwritten digits,

[37] In Technical report. [Appendix plot residue follows in the extraction: clean accuracy and ASR versus epoch (0 to 100) for badnet and blended attacks under Krum, with series m = 1, 2, 4, 8 ...]

[1] [1]

Communication- efficient learning of deep networks from decentralized data

Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication- efficient learning of deep networks from decentralized data. InArtificial intelligence and statistics, pages 1273–

work page

[2] [2]

Badnets: Evaluating backdooring attacks on deep neural networks.IEEE Access, 7:47230–47244, 2019

Tianyu Gu, Kang Liu, Brendan Dolan-Gavitt, and Sid- dharth Garg. Badnets: Evaluating backdooring attacks on deep neural networks.IEEE Access, 7:47230–47244, 2019

work page 2019

[3] [3]

Dba: Distributed backdoor attacks against federated learning

Chulin Xie, Keli Huang, Pin-Yu Chen, and Bo Li. Dba: Distributed backdoor attacks against federated learning. InInternational conference on learning representations, 2019

work page 2019

[4] [4]

Neurotoxin: Durable backdoors in federated learning

Zhengming Zhang, Ashwinee Panda, Linyue Song, Yao- qing Yang, Michael Mahoney, Prateek Mittal, Ram- chandran Kannan, and Joseph Gonzalez. Neurotoxin: Durable backdoors in federated learning. InInterna- tional Conference on Machine Learning, pages 26429– 26446. PMLR, 2022

work page 2022

[5] [5]

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted backdoor attacks on deep learn- ing systems using data poisoning.arXiv preprint arXiv:1712.05526, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[6] Tran Ngoc Huynh, Anh Tuan Tran, Khoa D Doan, and Tung Pham. Forget-me-not: Making backdoor hard to be forgotten in fine-tuning.

[7] Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, and Vitaly Shmatikov. How to backdoor federated learning. In International conference on artificial intelligence and statistics, pages 2938–2948. PMLR, 2020.

[8] Thuy Dung Nguyen, Tuan Nguyen, Phi Le Nguyen, Hieu H Pham, Khoa D Doan, and Kok-Seng Wong. Backdoor attacks and defenses in federated learning: Survey, challenges and future research directions. Engineering Applications of Artificial Intelligence, 127:107166, 2024.

[9] Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. Machine learning with adversaries: Byzantine tolerant gradient descent. Advances in neural information processing systems, 30, 2017.

[10] Stanislav Minsker. Efficient median of means estimator. In The Thirty Sixth Annual Conference on Learning Theory, pages 5925–5933. PMLR, 2023.

[11] Ziteng Sun, Peter Kairouz, Ananda Theertha Suresh, and H Brendan McMahan. Can you really backdoor federated learning? arXiv preprint arXiv:1911.07963, 2019.

[12] Ning Wang, Yang Xiao, Yimin Chen, Yang Hu, Wenjing Lou, and Y Thomas Hou. Flare: defending federated learning against model poisoning attacks via latent space representations. In Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security, pages 946–958, 2022.

[13] Thien Duc Nguyen, Phillip Rieger, Huili Chen, Hossein Yalame, Helen Möllering, Hossein Fereidooni, Samuel Marchal, Markus Miettinen, Azalia Mirhoseini, Shaza Zeitouni, et al. FLAME: Taming backdoors in federated learning. In 31st USENIX security symposium (USENIX Security 22), pages 1415–1432, 2022.

[14] Chen Wu, Xian Yang, Sencun Zhu, and Prasenjit Mitra. Mitigating backdoor attacks in federated learning. arXiv preprint arXiv:2011.01767, 2020.

[15] Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. Fine-pruning: Defending against backdooring attacks on deep neural networks. In International symposium on research in attacks, intrusions, and defenses, pages 273–294. Springer, 2018.

[16] Zeyang Sha, Xinlei He, Pascal Berrang, Mathias Humbert, and Yang Zhang. Fine-tuning is all you need to mitigate backdoor attacks. arXiv preprint arXiv:2212.09067, 2022.

[17] Xiaoting Lyu, Yufei Han, Wei Wang, Jingkai Liu, Bin Wang, Jiqiang Liu, and Xiangliang Zhang. Poisoning with cerberus: Stealthy and colluded backdoor attack against federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 9020–9028, 2023.

[18] Junhan Wang, Zhangming Wu, Zhuoyue Wang, and Lu Dong. Sba: A swift and stealthy backdoor attack framework for federated learning. In ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2025.

[19] Xiaoting Lyu, Yufei Han, Wei Wang, Jingkai Liu, Yongsheng Zhu, Guangquan Xu, Jiqiang Liu, and Xiangliang Zhang. Lurking in the shadows: Unveiling stealthy backdoor attacks against personalized federated learning. In 33rd USENIX Security Symposium (USENIX Security 24), pages 4157–4174, 2024.

[20] John S Bridle. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In Neurocomputing: Algorithms, architectures and applications, pages 227–

[21] Arjun Nitin Bhagoji, Supriyo Chakraborty, Prateek Mittal, and Seraphin Calo. Analyzing federated learning through an adversarial lens. In International conference on machine learning, pages 634–643. PMLR, 2019.

[22] Xianghua Xie, Chen Hu, Hanchi Ren, and Jingjing Deng. A survey on vulnerability of federated learning: A learning algorithm perspective. Neurocomputing, 573:127225, 2024.

[23] Asadullah Tariq, Mohamed Adel Serhani, Farag M Sallabi, Ezedin S Barka, Tariq Qayyum, Heba M Khater, and Khaled A Shuaib. Trustworthy federated learning: A comprehensive review, architecture, key challenges, and future research prospects. IEEE Open Journal of the Communications Society, 2024.

[24] Yunhao Feng, Yanming Guo, Yinjian Hou, Yulun Wu, Mingrui Lao, Tianyuan Yu, and Gang Liu. A survey of security threats in federated learning. Complex & Intelligent Systems, 11(2):1–26, 2025.

[25] Zhaozheng Li, Jiahe Lan, Zheng Yan, and Erol Gelenbe. Backdoor attacks and defense mechanisms in federated learning: A survey. Information Fusion, page 103248, 2025.

[26] Yichen Wan, Youyang Qu, Wei Ni, Yong Xiang, Longxiang Gao, and Ekram Hossain. Data and model poisoning backdoor attacks on wireless federated learning, and the defense mechanisms: A comprehensive survey. IEEE Communications Surveys & Tutorials, 26(3):1861–1897, 2024.

[27] Jakub Kacper Szeląg, Ji-Jian Chin, and Sook-Chin Yip. Adaptive adversaries in byzantine-robust federated learning: A survey. Cryptology ePrint Archive, 2025.

[28] Qiuxian Chen and Yizheng Tao. An investigation of recent backdoor attacks and defenses in federated learning. In 2023 Eighth International Conference on Fog and Mobile Edge Computing (FMEC), pages 262–269. IEEE, 2023.

[29] Torsten Krauß and Alexandra Dmitrienko. Mesas: Poisoning defense for federated learning resilient against adaptive attackers. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pages 1526–1540, 2023.

[30] Torsten Krauß, Jan König, Alexandra Dmitrienko, and Christian Kanzow. Automatic adversarial adaption for stealthy poisoning attacks in federated learning. To appear at the Network and Distributed System Security Symposium (NDSS), 2024.

[31] Rachid Guerraoui, Sébastien Rouault, et al. The hidden vulnerability of distributed learning in byzantium. In International conference on machine learning, pages 3521–3530. PMLR, 2018.

[32] Dong Yin, Yudong Chen, Ramchandran Kannan, and Peter Bartlett. Byzantine-robust distributed learning: Towards optimal statistical rates. In International conference on machine learning, pages 5650–5659. PMLR, 2018.

[33] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.

[34] Hidde Lycklama, Lukas Burkhalter, Alexander Viand, Nicolas Küchler, and Anwar Hithnawi. Rofl: Robustness of secure federated learning. In 2023 IEEE Symposium on Security and Privacy (SP), pages 453–476. IEEE, 2023.

[35] Haomin Zhuang, Mingxian Yu, Hao Wang, Yang Hua, Jian Li, and Xu Yuan. Backdoor federated learning by poisoning backdoor-critical layers. arXiv preprint arXiv:2308.04466, 2023.

[36] Yann LeCun. The mnist database of handwritten digits,

[37] In Technical report.

Appendix

[Figure: clean accuracy and ASR versus epoch (0 to 100) for the badnet and blended attacks under Krum, with curves for m = 1, 2, 4, 8.]