
arxiv: 2508.02115 · v4 · submitted 2025-08-04 · 💻 cs.CR · cs.AI

Coward: Collision-based OOD Watermarking for Practical Proactive Federated Backdoor Detection

Pith reviewed 2026-05-19 01:24 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords backdoor detection · federated learning · OOD bias · proactive defense · watermark injection · collision effect · inverted detection · multi-backdoor

The pith

Coward injects a collided backdoor watermark via dual-mapping on OOD data to enable inverted proactive detection that counters OOD bias in federated learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to make proactive backdoor detection practical in federated learning despite non-i.i.d. client data and irregular participation. It starts from the observation that planting distinct backdoors one after another causes earlier ones to be suppressed, an effect the authors call multi-backdoor collision. Using this effect, the method adds a specially designed watermark to the global model through regulated dual-mapping learning performed on out-of-distribution data. The resulting inverted detection scheme avoids the coexistence bias that misleads earlier proactive methods and keeps training disruption low. A sympathetic reader would care because the approach promises fewer false alarms while preserving the advantages of server-side intervention.

Core claim

The authors report that consecutively planted distinct backdoors significantly suppress earlier ones. They exploit this multi-backdoor collision effect by modifying the federated global model with a backdoor-collided watermark produced through regulated dual-mapping learning on OOD data. The construction yields an inverted detection paradigm that naturally offsets OOD prediction bias, limits the strength of that bias through low-disruptive intervention, and produces fewer misjudgments than prior proactive techniques.
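
As a rough illustration of the inverted paradigm, the sketch below flags a client when its local model retains too little of the server's watermark, rather than flagging models that score highly on an inspection set. The function name, the 0.5 threshold, and the example accuracies are assumptions made for illustration; the paper's actual inspection statistic and threshold are not stated in this summary.

```python
from typing import Dict, List


def flag_low_retention(watermark_acc: Dict[str, float],
                       threshold: float = 0.5) -> List[str]:
    """Return the ids of clients whose watermark accuracy is below `threshold`.

    Inverted rule: a LOW score is treated as suspicious, because planting a
    new backdoor is expected to collide with (suppress) the server's watermark.
    The threshold value is a hypothetical placeholder, not the paper's setting.
    """
    return sorted(cid for cid, acc in watermark_acc.items() if acc < threshold)


# Made-up accuracies, for illustration only (not results from the paper).
example = {"client_a": 0.91, "client_b": 0.78, "client_c": 0.03}
flagged = flag_low_retention(example)
print(flagged)
```

Under this rule, a benign client whose OOD predictions happen to drift toward some class is not penalized for a high score, which is one way to read the claim that the inverted paradigm offsets OOD prediction bias.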

What carries the argument

the backdoor-collided watermark produced by regulated dual-mapping learning on OOD data, which harnesses multi-backdoor collision to support inverted detection

If this is right

  • The inverted detection paradigm naturally counteracts the adverse impact of OOD prediction bias.
  • The low-disruptive training intervention inherently limits the strength of OOD bias.
  • Detection produces significantly fewer misjudgments than earlier proactive methods.
  • The approach reaches state-of-the-art performance on standard benchmark datasets.
  • OOD bias is effectively alleviated while preserving proactive server intervention.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same collision principle could be tested in other distributed training regimes such as decentralized or edge learning.
  • Dynamic re-injection of the watermark might allow the server to refresh the detection signal as new clients join.
  • Combining the watermark with existing passive anomaly checks could produce a hybrid detector with higher overall reliability.
  • Measuring detection accuracy as the fraction of malicious clients increases would reveal the practical operating range of the collision mechanism.

Load-bearing premise

The multi-backdoor collision effect transfers from controlled settings to federated learning and can be turned into a usable watermark without creating new attack surfaces or excessive training disruption.

What would settle it

A controlled federated run in which several distinct backdoors are planted in sequence and the measured suppression of the earliest backdoor is checked to determine whether the resulting watermark produces accurate detection under realistic non-i.i.d. client distributions.
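
One way to quantify the suppression in such a run is sketched below, assuming the attack success rate (ASR) of the first backdoor is measured before and after a second training phase. The helper names are hypothetical, and the comparison against plain fine-tuning mirrors the contrast drawn in the Figure 5 caption rather than any procedure confirmed by the paper.

```python
def suppression_ratio(asr_before: float, asr_after: float) -> float:
    """Relative drop in the first backdoor's ASR after a second training phase."""
    if not 0.0 < asr_before <= 1.0:
        raise ValueError("asr_before must be in (0, 1]")
    if not 0.0 <= asr_after <= 1.0:
        raise ValueError("asr_after must be in [0, 1]")
    return (asr_before - asr_after) / asr_before


def collision_gap(asr_before: float,
                  asr_after_second_backdoor: float,
                  asr_after_finetune: float) -> float:
    """Extra suppression caused by a second backdoor beyond plain fine-tuning.

    A clearly positive gap would be consistent with a collision effect;
    a gap near zero would suggest ordinary forgetting explains the drop.
    """
    return (suppression_ratio(asr_before, asr_after_second_backdoor)
            - suppression_ratio(asr_before, asr_after_finetune))
```

Repeating the measurement across several non-i.i.d. partitions and participation rates would show whether the gap persists outside the centralized setting.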

Figures

Figures reproduced from arXiv: 2508.02115 by Shu-Tao Xia, Shuxin Li, Siying Gu, Tianwei Zhang, Wenjie Li, Yiming Li, Zhili Chen.

Figure 1: Comparison between our collision-based proactive defense and the existing method via co-existent mechanism. By developing a backdoor-collided watermark, our method adopts an inverted detection paradigm with respect to malicious clients that is naturally compatible with OOD prediction bias, thereby enabling more reliable and accurate detection. pattern into a model such that it behaves normally on clean in…
Figure 2: The distraction effect of non-i.i.d. data on passive backdoor detection. Divergent client data distributions (left of Figure (a)) substantially reduce the suspiciousness of malicious clients, as reflected in both gradient norms (middle) and the update directions of benign models (right). In contrast, it increases the perceived suspiciousness of benign clients, particularly those with larger gradient norms…
Figure 3: The misdirection effect of OOD bias against the existing proactive backdoor detection. With the attacker-specified target class set to ‘0’ (red), we show five clients from the same round, presenting their OOD prediction distribution. The gray subfigure shows the local data distribution. The red dashed line marks the detection threshold; classes with inspection accuracy above this line are flagged as malici…
Figure 4: Exacerbation of OOD bias induced by existing proactive methods. The Indicator method intensifies both the magnitude and variation of OOD bias across training rounds. Settings. We analyze the prediction distribution of OOD samples under Indicator [18], both with and without the watermark planting process. The main FL task is CIFAR-10 classification, with EMNIST used for proactive pattern planting. During p…
Figure 5: The collision effect between target-different backdoors. (Left): A subsequently injected backdoor significantly degrades the ASR of the first backdoor. (Right): The ASR degradation of the first backdoor caused by standard fine-tuning is notably smaller than that caused by backdoor. prediction bias in benign clients no longer misleads detection, instead, it becomes beneficial in some cases. Building on the…
Figure 6: The overall pipeline of our Coward method. (Left) A defensive watermark is carefully embedded into the global model via low-cost OOD data training. (Middle) Random participants conduct local training based on the watermarked global model, where attackers tend to remove the watermark while benign clients preserve it. (Right) After local training, the server inspects the strength of the watermark; models wit…
Figure 7: A Holistic Quantification of OOD Bias and Its Impact on FPR. The left panel illustrates the OOD bias distribution, while the right panel shows the relationship between bias severity and the number of misjudged benign clients. Our method significantly reduces the OOD bias level, leading to a lower false positive rate, whereas the BackdoorIndicator increases the bias level and exhibits a high false positive…
Figure 1: OOD watermark collision under dynamic FL scenario. The collision effect remains highly effective in distinguishing malicious behavior under dynamic federated participation. The attacker exhibits a strong collision effect, while benign clients show diverse but generally higher levels of watermark retention.
Figure 3: OOD watermark collision under static FL scenario. Once the attacker (in red) begins injecting the backdoor at round 1021, its watermark accuracy rapidly drops to 0, indicating a strong collision effect and a clear distinction from benign clients. Moreover, the BN switch plays a critical role in accurately reflecting the benign clients’ (in varying shades of blue) ability to retain the watermark. collision …
Figure 2: OOD watermark collision under centralized scenario. Our OOD watermark is planted as the second backdoor. The resulting collision effect is significant, regardless of whether the BN layer is switched. However, switching the BN layer creates a more pronounced performance discrepancy between benign finetuning and backdoor injection behaviors. APPENDIX A COLLISION EFFECTS ON OOD WATERMARK In Section III-C, w…
Figure 4: Visualization of trigger configurations.
Original abstract

Backdoor detection is currently the mainstream defense against backdoor attacks in federated learning (FL), where a small number of malicious clients can upload poisoned updates to compromise the federated global model. Existing backdoor detection techniques fall into two categories, passive and proactive, depending on whether the server proactively intervenes in the training process. However, both of them have practical limitations: passive detection methods are disrupted by common non-i.i.d. data distributions and random participation of FL clients, whereas current proactive detection methods are misled by an inevitable out-of-distribution (OOD) bias because they rely on backdoor coexistence effects. To address these issues, we introduce a novel proactive detection method dubbed Coward, inspired by our discovery of multi-backdoor collision effects, in which consecutively planted, distinct backdoors significantly suppress earlier ones. Correspondingly, we modify the federated global model by injecting a carefully designed backdoor-collided watermark, implemented via regulated dual-mapping learning on OOD data. This design not only enables an inverted detection paradigm compared to existing proactive methods, thereby naturally counteracting the adverse impact of OOD prediction bias, but also introduces a low-disruptive training intervention that inherently limits the strength of OOD bias, leading to significantly fewer misjudgments. Extensive experiments on benchmark datasets show that Coward achieves state-of-the-art performance and effectively alleviates OOD bias.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Coward, a proactive backdoor detection method for federated learning. It is based on the empirical discovery of multi-backdoor collision effects, in which consecutively planted distinct backdoors suppress earlier ones. The server injects a backdoor-collided watermark via regulated dual-mapping learning on OOD data to enable an inverted detection paradigm that counters OOD prediction bias while imposing only low-disruptive intervention. Experiments on benchmark datasets are reported to achieve state-of-the-art performance and to alleviate OOD bias compared to existing passive and proactive baselines.

Significance. If the collision suppression reliably transfers to realistic FL conditions, the inverted paradigm could meaningfully reduce misjudgments arising from OOD bias while preserving utility. The low-disruptive design and explicit focus on practical constraints (random participation, non-i.i.d. data) are constructive. Credit is given for the empirical demonstration of SOTA results on standard benchmarks; however, the absence of machine-checked proofs, parameter-free derivations, or falsifiable predictions limits the long-term impact.

major comments (2)
  1. The central claim that the multi-backdoor collision effect transfers to the federated setting and can be harnessed for inverted detection rests on the unverified assumption that consecutive-planting suppression survives random client sampling and non-i.i.d. local datasets. This premise is load-bearing for the claimed advantage over OOD-biased proactive baselines, yet the manuscript provides only centralized empirical observations without targeted ablation under realistic FL participation schedules.
  2. Section 5 (Experiments): performance tables report SOTA results and reduced OOD bias, but no error bars, client participation schedules, or controls for post-hoc data selection are described. These omissions directly affect confidence in the reproducibility of the alleviation-of-OOD-bias claim under the low-disruptive intervention constraint.
minor comments (2)
  1. The notation for the dual-mapping regulation parameters and watermark injection strength is introduced without an explicit equation or pseudocode block, making it difficult to reproduce the exact training intervention.
  2. Figure captions for the collision-effect illustrations could more clearly distinguish centralized observations from federated transfer results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, providing clarifications on our experimental design while committing to targeted revisions that strengthen reproducibility and evidence for the collision effect under realistic federated conditions.

Point-by-point responses
  1. Referee: The central claim that the multi-backdoor collision effect transfers to the federated setting and can be harnessed for inverted detection rests on the unverified assumption that consecutive-planting suppression survives random client sampling and non-i.i.d. local datasets. This premise is load-bearing for the claimed advantage over OOD-biased proactive baselines, yet the manuscript provides only centralized empirical observations without targeted ablation under realistic FL participation schedules.

    Authors: The initial discovery of collision suppression was indeed shown via centralized experiments. However, all performance results in Section 5 were obtained from full federated simulations that already incorporate random client sampling and non-i.i.d. local datasets on standard benchmarks. These results demonstrate both the inverted detection paradigm and reduced OOD bias under the low-disruptive intervention. To directly verify transfer of the suppression mechanism itself, we will add a dedicated ablation subsection that varies participation rates and data heterogeneity while measuring collision strength. revision: partial

  2. Referee: Section 5 (Experiments): performance tables report SOTA results and reduced OOD bias, but no error bars, client participation schedules, or controls for post-hoc data selection are described. These omissions directly affect confidence in the reproducibility of the alleviation-of-OOD-bias claim under the low-disruptive intervention constraint.

    Authors: We agree that these details are necessary for reproducibility. In the revised manuscript we will (i) report error bars computed over at least five independent runs with different random seeds, (ii) explicitly tabulate or describe the client participation schedules used in each experiment, and (iii) add a control experiment that fixes the data selection procedure in advance to rule out post-hoc bias. revision: yes
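
The error-bar reporting promised in the second response could be computed along the following lines. This is a minimal sketch that assumes one scalar metric per seed (for example, a false positive rate); the function name is hypothetical and the values in the usage line are placeholders, not results from the paper.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize_runs(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of one metric across seeds."""
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return mean(values), stdev(values)


# Placeholder values standing in for five independent seeds.
avg, spread = summarize_runs([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"{avg:.2f} +/- {spread:.2f}")
```

Reporting the sample standard deviation alongside the mean keeps the table comparable across methods that were run with the same number of seeds.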

Circularity Check

0 steps flagged

No significant circularity; empirical discovery supports independent design

full rationale

The paper's central contribution rests on an empirical observation of multi-backdoor collision effects (consecutively planted distinct backdoors suppressing earlier ones) and a subsequent engineering intervention via regulated dual-mapping watermark injection on OOD data. No equations, fitted parameters, or predictions reduce the claimed detection performance directly to inputs by construction. The derivation chain is self-contained against external benchmarks, with performance claims validated experimentally rather than through self-definitional loops, load-bearing self-citations, or ansatz smuggling. This is the expected honest non-finding for an empirical method paper.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim depends on the transferability of the multi-backdoor collision observation to federated training and on design choices for the dual-mapping procedure; these are treated as empirical findings rather than derived quantities.

free parameters (1)
  • watermark injection strength and dual-mapping regulation parameters
    Hand-chosen or tuned values that control how strongly the collided watermark is embedded while limiting disruption to normal training.
axioms (1)
  • domain assumption: Consecutively planted distinct backdoors significantly suppress earlier ones in the federated global model
    Core empirical discovery invoked to justify the watermark design and inverted detection.
invented entities (1)
  • backdoor-collided watermark (no independent evidence)
    purpose: Enables inverted proactive detection that counteracts OOD prediction bias
    New construct introduced via regulated dual-mapping on OOD data; no independent falsifiable handle outside the method itself is described.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 1 internal anchor

  1. [1]

    Communication-efficient learning of deep networks from decentralized data,

    B. McMahan, E. Moore, D. Ramage, S. Hampson et al., “Communication-efficient learning of deep networks from decentralized data,” in AISTATS, 2017

  2. [2]

    Advances and open problems in federated learning,

    P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings et al., “Advances and open problems in federated learning,” Foundations and Trends® in Machine Learning, vol. 14, no. 1–2, pp. 1–210, 2021

  3. [3]

    Federated learning for generalization, robustness, fairness: A survey and benchmark,

    W. Huang, M. Ye, Z. Shi, G. Wan, H. Li, B. Du, and Q. Yang, “Federated learning for generalization, robustness, fairness: A survey and benchmark,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 12, pp. 9387–9406, 2024

  4. [4]

    Federated one-shot learning with data privacy and objective-hiding,

    M. Egger, R. Urbanke, and R. Bitar, “Federated one-shot learning with data privacy and objective-hiding,” IEEE Transactions on Information Forensics and Security, 2025

  5. [5]

    Towards federated foundation models: Scalable dataset pipelines for group-structured learning,

    Z. Charles, N. Mitchell, K. Pillutla, M. Reneer, and Z. Garrett, “Towards federated foundation models: Scalable dataset pipelines for group-structured learning,” in NeurIPS, 2023, pp. 32299–32327

  6. [6]

    Ten challenging problems in federated foundation models,

    T. Fan, H. Gu, X. Cao, C. S. Chan, Q. Chen, Y. Chen, Y. Feng, Y. Gu, J. Geng, B. Luo et al., “Ten challenging problems in federated foundation models,” IEEE Transactions on Knowledge and Data Engineering, 2025

  7. [7]

    A federated learning system for precision oncology in europe: Digione,

    P. Mahon, I. Chatzitheofilou, A. Dekker, X. Fernández, G. Hall, A. Helland, A. Traverso, C. Van Marcke, J. Vehreschild, G. Ciliberto et al., “A federated learning system for precision oncology in europe: Digione,” Nature Medicine, vol. 30, no. 2, pp. 334–337, 2024

  8. [8]

    Enhancing transparency and privacy in financial fraud detection: The integration of explainable ai and federated learning,

    W. Ahmad, A. Vashist, N. Sinha, M. Prasad, V. Shrivastava, and J. H. Muzamal, “Enhancing transparency and privacy in financial fraud detection: The integration of explainable ai and federated learning,” in SEDE, 2024, pp. 139–156

  9. [9]

    Topology-aware federated learning in edge computing: A comprehensive survey,

    J. Wu, F. Dong, H. Leung, Z. Zhu, J. Zhou, and S. Drew, “Topology-aware federated learning in edge computing: A comprehensive survey,” ACM Computing Surveys, vol. 56, no. 10, pp. 1–41, 2024.

  10. [10]

    Backdoor learning: A survey,

    Y. Li, Y. Jiang, Z. Li, and S.-T. Xia, “Backdoor learning: A survey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 1, pp. 5–22, 2022

  11. [11]

    Chameleon: Adapting to peer images for planting durable backdoors in federated learning,

    Y. Dai and S. Li, “Chameleon: Adapting to peer images for planting durable backdoors in federated learning,” in ICML, 2023

  12. [12]

    Not all prompts are secure: A switchable backdoor attack against pre-trained vision transfomers,

    S. Yang, J. Bai, K. Gao, Y. Yang, Y. Li, and S.-T. Xia, “Not all prompts are secure: A switchable backdoor attack against pre-trained vision transfomers,” in CVPR, 2024, pp. 24431–24441

  13. [13]

    How to backdoor federated learning,

    E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov, “How to backdoor federated learning,” in AISTATS, 2020

  14. [14]

    Robust adversarial defenses in federated learning: Exploring the impact of data heterogeneity,

    Q. Li, D. Wu, D. Zhou, C. Lin, S. Liu, C. Wang, and C. Shen, “Robust adversarial defenses in federated learning: Exploring the impact of data heterogeneity,” IEEE Transactions on Information Forensics and Security, 2025

  15. [15]

    Federated learning minimal model replacement attack using optimal transport: an attacker perspective,

    K. N. Kumar, C. K. Mohan, and L. R. Cenkeramaddi, “Federated learning minimal model replacement attack using optimal transport: an attacker perspective,” IEEE Transactions on Information Forensics and Security, 2024

  16. [16]

    Deepsight: Mitigating backdoor attacks in federated learning through deep model inspection,

    P. Rieger, T. D. Nguyen, M. Miettinen, and A.-R. Sadeghi, “Deepsight: Mitigating backdoor attacks in federated learning through deep model inspection,” in NDSS, 2022

  17. [17]

    FLAME: Taming backdoors in federated learning,

    T. D. Nguyen, P. Rieger, H. Chen, H. Yalame, H. Möllering, H. Fereidooni, S. Marchal, M. Miettinen, A. Mirhoseini, S. Zeitouni et al., “FLAME: Taming backdoors in federated learning,” in USENIX Security, 2022, pp. 1415–1432

  18. [18]

    BackdoorIndicator: Leveraging OOD data for proactive backdoor detection in federated learning,

    S. Li and Y. Dai, “BackdoorIndicator: Leveraging OOD data for proactive backdoor detection in federated learning,” in USENIX Security, 2024, pp. 4193–4210

  19. [19]

    Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,

    A. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” in CVPR, 2015, pp. 427–436

  20. [20]

    On calibration of modern neural networks,

    C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” in ICML, 2017, pp. 1321–1330

  21. [21]

    Why relu networks yield high-confidence predictions far away from the training data and how to mitigate the problem,

    M. Hein, M. Andriushchenko, and J. Bitterwolf, “Why relu networks yield high-confidence predictions far away from the training data and how to mitigate the problem,” in CVPR, 2019, pp. 41–50

  22. [22]

    Federated machine learning: Concept and applications,

    Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated machine learning: Concept and applications,” ACM Transactions on Intelligent Systems and Technology, vol. 10, no. 2, pp. 1–19, 2019

  23. [23]

    Badnets: Evaluating backdooring attacks on deep neural networks,

    T. Gu, K. Liu, B. Dolan-Gavitt, and S. Garg, “Badnets: Evaluating backdooring attacks on deep neural networks,” IEEE Access, vol. 7, pp. 47230–47244, 2019

  24. [24]

    Wanet-imperceptible warping-based backdoor attack,

    T. A. Nguyen and A. T. Tran, “Wanet-imperceptible warping-based backdoor attack,” in ICLR, 2021

  25. [25]

    Backdoor attack with sparse and invisible trigger,

    Y. Gao, Y. Li, X. Gong, Z. Li, S.-T. Xia, and Q. Wang, “Backdoor attack with sparse and invisible trigger,” IEEE Transactions on Information Forensics and Security, vol. 19, pp. 6364–6376, 2024

  26. [26]

    Blind backdoors in deep learning models,

    E. Bagdasaryan and V. Shmatikov, “Blind backdoors in deep learning models,” in USENIX Security, 2021, pp. 1505–1521

  27. [27]

    Csba: Covert semantic backdoor attack against intelligent connected vehicles,

    X. Xu, Y. Chen, B. Wang, Z. Bian, S. Han, C. Dong, C. Sun, W. Zhang, L. Xu, and P. Zhang, “Csba: Covert semantic backdoor attack against intelligent connected vehicles,” IEEE Transactions on Vehicular Technology, 2024

  28. [28]

    Witches’ brew: Industrial scale data poisoning via gradient matching,

    J. Geiping, L. Fowl, W. R. Huang, W. Czaja, G. Taylor, M. Moeller, and T. Goldstein, “Witches’ brew: Industrial scale data poisoning via gradient matching,” arXiv preprint arXiv:2009.02276, 2020

  29. [29]

    Backdoor contrastive learning via bi-level trigger optimization,

    W. Sun, X. Zhang, H. Lu, Y. Chen, T. Wang, J. Chen, and L. Lin, “Backdoor contrastive learning via bi-level trigger optimization,” in ICLR, 2024

  30. [30]

    Input-aware dynamic backdoor attack,

    T. A. Nguyen and A. Tran, “Input-aware dynamic backdoor attack,” in NeurIPS, 2020, pp. 3454–3464

  31. [31]

    Toward stealthy backdoor attacks against speech recognition via elements of sound,

    H. Cai, P. Zhang, H. Dong, Y. Xiao, S. Koffas, and Y. Li, “Toward stealthy backdoor attacks against speech recognition via elements of sound,” IEEE Transactions on Information Forensics and Security, vol. 19, pp. 5852–5866, 2024

  32. [32]

    Towards sample-specific backdoor attack with clean labels via attribute trigger,

    M. Zhu, Y. Li, J. Guo, T. Wei, S.-T. Xia, and Z. Qin, “Towards sample-specific backdoor attack with clean labels via attribute trigger,” IEEE Transactions on Dependable and Secure Computing, 2025

  33. [33]

    Can you really backdoor federated learning?

    Z. Sun, P. Kairouz, A. T. Suresh, and H. B. McMahan, “Can you really backdoor federated learning?” arXiv preprint arXiv:1911.07963, 2019

  34. [34]

    Neurotoxin: Durable backdoors in federated learning,

    Z. Zhang, A. Panda, L. Song, Y. Yang, M. Mahoney, P. Mittal, R. Kannan, and J. Gonzalez, “Neurotoxin: Durable backdoors in federated learning,” in ICML, 2022, pp. 26429–26446

  35. [35]

    Edge-case backdoor attacks against federated learning,

    B. Wang, Y. Yao, X. Shan, H. Li, B. Viswanath, H. Zheng, and B. Y. Zhao, “Edge-case backdoor attacks against federated learning,” in NeurIPS Workshop, 2020

  36. [36]

    Dba: Distributed backdoor attacks against federated learning,

    C. Xie, K. Huang, P.-Y. Chen, and B. Li, “Dba: Distributed backdoor attacks against federated learning,” in ICLR, 2019

  37. [37]

    Coordinated backdoor attacks against federated learning with model-dependent triggers,

    X. Gong, Y. Chen, H. Huang, Y. Liao, S. Wang, and Q. Wang, “Coordinated backdoor attacks against federated learning with model-dependent triggers,” IEEE Network, vol. 36, no. 1, pp. 84–90, 2022

  38. [38]

    Non-cooperative backdoor attacks in federated learning: A new threat landscape,

    T. Nguyen, D. T. Nguyen, K. D. Doan, and K.-S. Wong, “Non-cooperative backdoor attacks in federated learning: A new threat landscape,” arXiv preprint arXiv:2407.07917, 2024

  39. [39]

    Nearest is not dearest: Towards practical defense against quantization-conditioned backdoor attacks,

    B. Li, Y. Cai, H. Li, F. Xue, Z. Li, and Y. Li, “Nearest is not dearest: Towards practical defense against quantization-conditioned backdoor attacks,” in CVPR, 2024, pp. 24523–24533

  40. [40]

    Towards reliable and efficient backdoor trigger inversion via decoupling benign features,

    X. Xu, K. Huang, Y. Li, Z. Qin, and K. Ren, “Towards reliable and efficient backdoor trigger inversion via decoupling benign features,” in ICLR, 2024

  41. [41]

    Deepsweep: An evaluation framework for mitigating dnn backdoor attacks using data augmentation,

    H. Qiu, Y. Zeng, S. Guo, T. Zhang, M. Qiu, and B. Thuraisingham, “Deepsweep: An evaluation framework for mitigating dnn backdoor attacks using data augmentation,” in AsiaCCS, 2021, pp. 363–377

  42. [42]

    Probe before you talk: Towards black-box defense against backdoor unalignment for large language models,

    B. Yi, T. Huang, S. Chen, T. Li, Z. Liu, Z. Chu, and Y. Li, “Probe before you talk: Towards black-box defense against backdoor unalignment for large language models,” in ICLR, 2025

  43. [43]

    Robust aggregation for federated learning,

    K. Pillutla, S. M. Kakade, and Z. Harchaoui, “Robust aggregation for federated learning,” IEEE Transactions on Signal Processing, vol. 70, pp. 1142–1154, 2022

  44. [45]

    Defending against backdoors in federated learning with robust learning rate,

    M. S. Ozdayi, M. Kantarcioglu, and Y. R. Gel, “Defending against backdoors in federated learning with robust learning rate,” in AAAI, 2021, pp. 9268–9276

  45. [46]

    Flpurifier: backdoor defense in federated learning via decoupled contrastive training,

    J. Zhang, C. Zhu, X. Sun, C. Ge, B. Chen, W. Susilo, and S. Yu, “Flpurifier: backdoor defense in federated learning via decoupled contrastive training,” IEEE Transactions on Information Forensics and Security, 2024

  46. [47]

    Machine learning with adversaries: Byzantine tolerant gradient descent,

    P. Blanchard, E. M. El Mhamdi, R. Guerraoui, and J. Stainer, “Machine learning with adversaries: Byzantine tolerant gradient descent,” in NeurIPS, 2017

  47. [48]

    The limitations of federated learning in sybil settings,

    C. Fung, C. J. Yoon, and I. Beschastnikh, “The limitations of federated learning in sybil settings,” in RAID, 2020, pp. 301–316

  48. [49]

    Fltrust: Byzantine-robust federated learning via trust bootstrapping,

    X. Cao, M. Fang, J. Liu, and N. Gong, “Fltrust: Byzantine-robust federated learning via trust bootstrapping,” in NDSS, 2021

  49. [50]

    Shieldfl: Mitigating model poisoning attacks in privacy-preserving federated learning,

    Z. Ma, J. Ma, Y. Miao, Y. Li, and R. H. Deng, “Shieldfl: Mitigating model poisoning attacks in privacy-preserving federated learning,” IEEE Transactions on Information Forensics and Security, 2022

  50. [51]

    Rflbat: A robust federated learning algorithm against backdoor attack,

    Y. Wang, D. Zhai, Y. Zhan, and Y. Xia, “Rflbat: A robust federated learning algorithm against backdoor attack,” arXiv preprint arXiv:2201.03772, 2022

  51. [52]

    Flguardian: Defending against model poisoning attacks via fine-grained detection in federated learning,

    X. Zhou, X. Chen, S. Liu, X. Fan, Q. Sun, L. Chen, M. Qiu, and T. Xiang, “Flguardian: Defending against model poisoning attacks via fine-grained detection in federated learning,” IEEE Transactions on Information Forensics and Security, 2025

  52. [53]

    Shortcuts everywhere and nowhere: Exploring multi-trigger backdoor attacks,

    Y. Li, J. He, H. Huang, J. Sun, and X. Ma, “Shortcuts everywhere and nowhere: Exploring multi-trigger backdoor attacks,” arXiv preprint arXiv:2401.15295, 2024

  53. [54]

    Provably robust multi-bit watermarking for ai-generated text,

    W. Qu, W. Zheng, T. Tao, D. Yin, Y. Jiang, Z. Tian, W. Zou, J. Jia, and J. Zhang, “Provably robust multi-bit watermarking for ai-generated text,” in USENIX Security, 2025

  54. [55]

    Rethinking data protection in the (generative) artificial intelligence era,

    Y. Li, S. Shao, Y. He, J. Guo, T. Zhang, Z. Qin, P.-Y. Chen, M. Backes, P. Torr, D. Tao et al., “Rethinking data protection in the (generative) artificial intelligence era,” arXiv preprint arXiv:2507.03034, 2025

  55. [56]

    Explanation as a watermark: Towards harmless and multi-bit model ownership verification via watermarking feature attribution,

    S. Shao, Y. Li, H. Yao, Y. He, Z. Qin, and K. Ren, “Explanation as a watermark: Towards harmless and multi-bit model ownership verification via watermarking feature attribution,” in NDSS, 2025

  56. [57]

    Towards robust model watermark via reducing parametric vulnerability,

    G. Gan, Y. Li, D. Wu, and S.-T. Xia, “Towards robust model watermark via reducing parametric vulnerability,” in ICCV, 2023, pp. 4751–4761

  57. [58]

    Emnist: Extending mnist to handwritten letters,

    G. Cohen, S. Afshar, J. Tapson, and A. Van Schaik, “Emnist: Extending mnist to handwritten letters,” in IJCNN, 2017, pp. 2921–2926

  58. [59]

    Learning multiple layers of features from tiny images,

    A. Krizhevsky, “Learning multiple layers of features from tiny images,” University of Toronto, Tech. Rep. TR-2009, 2009

  59. [60]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016, pp. 770–778

  60. [61]

    Federated optimization in heterogeneous networks,

    T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated optimization in heterogeneous networks,” in MLSys, 2020

  61. [62]

    Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

    X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted backdoor attacks on deep learning systems using data poisoning,” arXiv preprint arXiv:1712.05526, 2017

  62. [63]

    How to backdoor federated learning,

    E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov, “How to backdoor federated learning,” in AISTATS, 2020, pp. 2938–2948.