Coward: Collision-based OOD Watermarking for Practical Proactive Federated Backdoor Detection

Shu-Tao Xia; Shuxin Li; Siying Gu; Tianwei Zhang; Wenjie Li; Yiming Li; Zhili Chen

arxiv: 2508.02115 · v4 · submitted 2025-08-04 · 💻 cs.CR · cs.AI

Wenjie Li, Siying Gu, Yiming Li, Shuxin Li, Zhili Chen, Tianwei Zhang, Shu-Tao Xia

Pith reviewed 2026-05-19 01:24 UTC · model grok-4.3

keywords backdoor detection · federated learning · OOD bias · proactive defense · watermark injection · collision effect · inverted detection · multi-backdoor

The pith

Coward injects a collided backdoor watermark via dual-mapping on OOD data to enable inverted proactive detection that counters OOD bias in federated learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to make proactive backdoor detection practical in federated learning despite non-i.i.d. client data and irregular participation. It starts from the observation that planting distinct backdoors one after another causes earlier ones to be suppressed, an effect the authors call multi-backdoor collision. Using this effect, the method adds a specially designed watermark to the global model through regulated dual-mapping learning performed on out-of-distribution data. The resulting inverted detection scheme avoids the coexistence bias that misleads earlier proactive methods and keeps training disruption low. A sympathetic reader would care because the approach promises fewer false alarms while preserving the advantages of server-side intervention.

Core claim

The authors report that consecutively planted distinct backdoors significantly suppress earlier ones. They exploit this multi-backdoor collision effect by modifying the federated global model with a backdoor-collided watermark produced through regulated dual-mapping learning on OOD data. The construction yields an inverted detection paradigm that naturally offsets OOD prediction bias, limits the strength of that bias through low-disruptive intervention, and produces fewer misjudgments than prior proactive techniques.
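
The difference between the two decision rules can be made concrete with a minimal sketch. It assumes the server has already measured, for each uploaded client model, an accuracy on its OOD probe set. The function names, client IDs, accuracy values, and the 0.5 threshold are all hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict, List


def flag_coexistence(indicator_acc: Dict[str, float], threshold: float) -> List[str]:
    """Coexistence-style rule: flag clients whose probe accuracy is ABOVE the threshold."""
    return sorted(c for c, acc in indicator_acc.items() if acc > threshold)


def flag_inverted(watermark_acc: Dict[str, float], threshold: float) -> List[str]:
    """Inverted rule: flag clients whose watermark accuracy fell BELOW the threshold."""
    return sorted(c for c, acc in watermark_acc.items() if acc < threshold)


# Hypothetical per-client measurements (illustrative numbers only).
# benign_b stands for a benign client whose OOD predictions lean toward the probed class.
indicator_acc = {"benign_a": 0.10, "benign_b": 0.62, "attacker": 0.90}
watermark_acc = {"benign_a": 0.85, "benign_b": 0.97, "attacker": 0.03}

coexistence_flags = flag_coexistence(indicator_acc, threshold=0.5)
inverted_flags = flag_inverted(watermark_acc, threshold=0.5)
print(coexistence_flags)
print(inverted_flags)
```

In this toy data the coexistence rule flags `benign_b` alongside the attacker, while the inverted rule flags only the attacker. Whether real benign clients retain the watermark this reliably is what the paper's experiments have to establish.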

What carries the argument

The backdoor-collided watermark produced by regulated dual-mapping learning on OOD data, which harnesses multi-backdoor collision to support inverted detection.

If this is right

The inverted detection paradigm naturally counteracts the adverse impact of OOD prediction bias.
The low-disruptive training intervention inherently limits the strength of OOD bias.
Detection produces significantly fewer misjudgments than earlier proactive methods.
The approach reaches state-of-the-art performance on standard benchmark datasets.
OOD bias is effectively alleviated while preserving proactive server intervention.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same collision principle could be tested in other distributed training regimes such as decentralized or edge learning.
Dynamic re-injection of the watermark might allow the server to refresh the detection signal as new clients join.
Combining the watermark with existing passive anomaly checks could produce a hybrid detector with higher overall reliability.
Measuring detection accuracy as the fraction of malicious clients increases would reveal the practical operating range of the collision mechanism.
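
The last extension, tracking detection quality as the malicious fraction grows, needs only a per-round true positive rate and false positive rate. The sketch below is one way to compute them; the helper name `detection_rates` and the client IDs are hypothetical, and the numbers are not results from the paper.

```python
from typing import Iterable, Tuple


def detection_rates(flagged: Iterable[str], malicious: Iterable[str],
                    all_clients: Iterable[str]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for one detection round."""
    flagged_set, malicious_set, client_set = set(flagged), set(malicious), set(all_clients)
    benign_set = client_set - malicious_set
    # Guard against empty groups so a round without attackers does not divide by zero.
    tpr = len(flagged_set & malicious_set) / len(malicious_set) if malicious_set else 0.0
    fpr = len(flagged_set & benign_set) / len(benign_set) if benign_set else 0.0
    return tpr, fpr


# Hypothetical round: 10 participants, 2 attackers, 3 flagged models.
clients = [f"c{i}" for i in range(10)]
malicious = ["c0", "c1"]
flagged = ["c0", "c1", "c7"]
tpr, fpr = detection_rates(flagged, malicious, clients)
print(tpr, fpr)
```

Repeating this for increasing numbers of attackers per round would trace out the operating range mentioned above.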

Load-bearing premise

The multi-backdoor collision effect transfers from controlled settings to federated learning and can be turned into a usable watermark without creating new attack surfaces or excessive training disruption.

What would settle it

A controlled federated run in which several distinct backdoors are planted in sequence and the measured suppression of the earliest backdoor is checked to determine whether the resulting watermark produces accurate detection under realistic non-i.i.d. client distributions.
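
The suppression measurement in such a run reduces to comparing the attack success rate (ASR) of the first backdoor before and after the second one is planted. The following is a minimal sketch under that assumption; the function names and the prediction lists are hypothetical and do not reproduce any measurement from the paper.

```python
from typing import Sequence


def attack_success_rate(predictions: Sequence[int], target_class: int) -> float:
    """Fraction of triggered inputs classified as the attacker's target class."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    return sum(p == target_class for p in predictions) / len(predictions)


def suppression(asr_before: float, asr_after: float) -> float:
    """Absolute drop in ASR of the first backdoor after a second one is planted."""
    return asr_before - asr_after


# Hypothetical predictions on 8 inputs carrying the FIRST trigger (target class 0).
preds_before_second = [0, 0, 0, 0, 0, 0, 0, 1]
preds_after_second = [0, 3, 5, 3, 1, 3, 0, 3]

asr_before = attack_success_rate(preds_before_second, target_class=0)
asr_after = attack_success_rate(preds_after_second, target_class=0)
drop = suppression(asr_before, asr_after)
print(asr_before, asr_after, drop)
```

A control arm that replaces the second backdoor with ordinary fine-tuning, as in the paper's Figure 5, would show whether the drop is specific to backdoor collision.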

Figures

Figures reproduced from arXiv: 2508.02115 by Shu-Tao Xia, Shuxin Li, Siying Gu, Tianwei Zhang, Wenjie Li, Yiming Li, Zhili Chen.

**Figure 1.** Comparison between our collision-based proactive defense and the existing method via co-existent mechanism. By developing a backdoor-collided watermark, our method adopts an inverted detection paradigm with respect to malicious clients that is naturally compatible with OOD prediction bias, thereby enabling more reliable and accurate detection. … pattern into a model such that it behaves normally on clean in…

**Figure 2.** The distraction effect of non-i.i.d. data on passive backdoor detection. Divergent client data distributions (left of Figure (a)) substantially reduce the suspiciousness of malicious clients, as reflected in both gradient norms (middle) and the update directions of benign models (right). In contrast, it increases the perceived suspiciousness of benign clients, particularly those with larger gradient norms…

**Figure 3.** The misdirection effect of OOD bias against the existing proactive backdoor detection. With the attacker-specified target class set to ‘0’ (red), we show five clients from the same round, presenting their OOD prediction distribution. The gray subfigure shows the local data distribution. The red dashed line marks the detection threshold; classes with inspection accuracy above this line are flagged as malici…

**Figure 4.** Exacerbation of OOD bias induced by existing proactive methods. The Indicator method intensifies both the magnitude and variation of OOD bias across training rounds. Settings: We analyze the prediction distribution of OOD samples under Indicator [18], both with and without the watermark planting process. The main FL task is CIFAR-10 classification, with EMNIST used for proactive pattern planting. During p…

**Figure 5.** The collision effect between target-different backdoors. (Left): A subsequently injected backdoor significantly degrades the ASR of the first backdoor. (Right): The ASR degradation of the first backdoor caused by standard finetuning is notably smaller than that caused by backdoor. … prediction bias in benign clients no longer misleads detection; instead, it becomes beneficial in some cases. Building on the…

**Figure 6.** The overall pipeline of our Coward method. (Left) A defensive watermark is carefully embedded into the global model via low-cost OOD data training. (Middle) Random participants conduct local training based on the watermarked global model, where attackers tend to remove the watermark while benign clients preserve it. (Right) After local training, the server inspects the strength of the watermark; models wit…

**Figure 7.** A Holistic Quantification of OOD Bias and Its Impact on FPR. The left panel illustrates the OOD bias distribution, while the right panel shows the relationship between bias severity and the number of misjudged benign clients. Our method significantly reduces the OOD bias level, leading to a lower false positive rate, whereas the BackdoorIndicator increases the bias level and exhibits a high false positive…

**Figure 1.** OOD watermark collision under dynamic FL scenario. The collision effect remains highly effective in distinguishing malicious behavior under dynamic federated participation. The attacker exhibits a strong collision effect, while benign clients show diverse but generally higher levels of watermark retention. [Plot residue: x-axis “Local Steps”, y-axis “Backdoor ASR”; legend “1st BA”, “2nd BA”, “collision”.]

**Figure 3.** OOD watermark collision under static FL scenario. Once the attacker (in red) begins injecting the backdoor at round 1021, its watermark accuracy rapidly drops to 0, indicating a strong collision effect and a clear distinction from benign clients. Moreover, the BN switch plays a critical role in accurately reflecting the benign clients’ (in varying shades of blue) ability to retain the watermark.

**Figure 2.** OOD watermark collision under centralized scenario. Our OOD watermark is planted as the second backdoor. The resulting collision effect is significant, regardless of whether the BN layer is switched. However, switching the BN layer creates a more pronounced performance discrepancy between benign finetuning and backdoor injection behaviors. (Appendix A, “Collision Effects on OOD Watermark”: In Section III-C, w…)

**Figure 4.** Visualization of trigger configurations.

Original abstract

Backdoor detection is currently the mainstream defense against backdoor attacks in federated learning (FL), where a small number of malicious clients can upload poisoned updates to compromise the federated global model. Existing backdoor detection techniques fall into two categories, passive and proactive, depending on whether the server proactively intervenes in the training process. However, both of them have practical limitations: passive detection methods are disrupted by common non-i.i.d. data distributions and random participation of FL clients, whereas current proactive detection methods are misled by an inevitable out-of-distribution (OOD) bias because they rely on backdoor coexistence effects. To address these issues, we introduce a novel proactive detection method dubbed Coward, inspired by our discovery of multi-backdoor collision effects, in which consecutively planted, distinct backdoors significantly suppress earlier ones. Correspondingly, we modify the federated global model by injecting a carefully designed backdoor-collided watermark, implemented via regulated dual-mapping learning on OOD data. This design not only enables an inverted detection paradigm compared to existing proactive methods, thereby naturally counteracting the adverse impact of OOD prediction bias, but also introduces a low-disruptive training intervention that inherently limits the strength of OOD bias, leading to significantly fewer misjudgments. Extensive experiments on benchmark datasets show that Coward achieves state-of-the-art performance and effectively alleviates OOD bias.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Coward's collision-based watermark offers a promising inversion of proactive detection in federated learning but rests on shaky transfer of the central effect to non-i.i.d. settings.

The main point for you is that Coward tries to solve a real pain point in federated backdoor detection by exploiting a collision between backdoors instead of relying on their coexistence. This inverted approach could sidestep the OOD bias that trips up current proactive methods. The novelty comes from discovering that planting distinct backdoors one after another causes the later ones to suppress the earlier ones significantly. They turn this into a practical tool by having the server inject a carefully regulated watermark backdoor using dual-mapping on OOD data. This setup allows detection by looking for the inversion signal rather than the usual backdoor presence, which naturally counters the bias from out-of-distribution predictions. It also keeps the training disruption low, which is important for FL deployments.

The paper handles the motivation well. It clearly explains why passive methods break with non-i.i.d. distributions and client dropout, and why proactive ones get misled by OOD bias. The design choices around the watermark seem thoughtful for limiting that bias.

Where it gets soft is in the validation of the core assumption. The collision effect is presented as an empirical finding, but the experiments need to show it survives random client participation and non-i.i.d. local data. From what I can see, there aren't enough specifics on experimental controls, error bars, or how they simulated realistic FL schedules. That makes the SOTA claims harder to evaluate. There's also the question of whether this watermark injection opens up new ways for attackers to exploit the system.

Overall, this is for people focused on security in federated and distributed machine learning. Anyone dealing with backdoor threats in real-world FL setups could pick up useful ideas here, especially if they want to move beyond standard proactive techniques. The work shows clear thinking on the problem and engages with the limitations of prior art, so it deserves a serious referee even if the results need more scrutiny. I'd push for it to go to peer review, but flag the need for detailed ablations on the collision transfer and potential side effects of the watermark.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Coward, a proactive backdoor detection method for federated learning. It is based on the empirical discovery of multi-backdoor collision effects, in which consecutively planted distinct backdoors suppress earlier ones. The server injects a backdoor-collided watermark via regulated dual-mapping learning on OOD data to enable an inverted detection paradigm that counters OOD prediction bias while imposing only low-disruptive intervention. Experiments on benchmark datasets are reported to achieve state-of-the-art performance and to alleviate OOD bias compared to existing passive and proactive baselines.

Significance. If the collision suppression reliably transfers to realistic FL conditions, the inverted paradigm could meaningfully reduce misjudgments arising from OOD bias while preserving utility. The low-disruptive design and explicit focus on practical constraints (random participation, non-i.i.d. data) are constructive. Credit is given for the empirical demonstration of SOTA results on standard benchmarks; however, the absence of machine-checked proofs, parameter-free derivations, or falsifiable predictions limits the long-term impact.

major comments (2)

The central claim that the multi-backdoor collision effect transfers to the federated setting and can be harnessed for inverted detection rests on the unverified assumption that consecutive-planting suppression survives random client sampling and non-i.i.d. local datasets. This premise is load-bearing for the claimed advantage over OOD-biased proactive baselines, yet the manuscript provides only centralized empirical observations without targeted ablation under realistic FL participation schedules.
Section 5 (Experiments): performance tables report SOTA results and reduced OOD bias, but no error bars, client participation schedules, or controls for post-hoc data selection are described. These omissions directly affect confidence in the reproducibility of the alleviation-of-OOD-bias claim under the low-disruptive intervention constraint.

minor comments (2)

The notation for the dual-mapping regulation parameters and watermark injection strength is introduced without an explicit equation or pseudocode block, making it difficult to reproduce the exact training intervention.
Figure captions for the collision-effect illustrations could more clearly distinguish centralized observations from federated transfer results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, providing clarifications on our experimental design while committing to targeted revisions that strengthen reproducibility and evidence for the collision effect under realistic federated conditions.

Referee: The central claim that the multi-backdoor collision effect transfers to the federated setting and can be harnessed for inverted detection rests on the unverified assumption that consecutive-planting suppression survives random client sampling and non-i.i.d. local datasets. This premise is load-bearing for the claimed advantage over OOD-biased proactive baselines, yet the manuscript provides only centralized empirical observations without targeted ablation under realistic FL participation schedules.

Authors: The initial discovery of collision suppression was indeed shown via centralized experiments. However, all performance results in Section 5 were obtained from full federated simulations that already incorporate random client sampling and non-i.i.d. local datasets on standard benchmarks. These results demonstrate both the inverted detection paradigm and reduced OOD bias under the low-disruptive intervention. To directly verify transfer of the suppression mechanism itself, we will add a dedicated ablation subsection that varies participation rates and data heterogeneity while measuring collision strength.
revision: partial

Referee: Section 5 (Experiments): performance tables report SOTA results and reduced OOD bias, but no error bars, client participation schedules, or controls for post-hoc data selection are described. These omissions directly affect confidence in the reproducibility of the alleviation-of-OOD-bias claim under the low-disruptive intervention constraint.

Authors: We agree that these details are necessary for reproducibility. In the revised manuscript we will (i) report error bars computed over at least five independent runs with different random seeds, (ii) explicitly tabulate or describe the client participation schedules used in each experiment, and (iii) add a control experiment that fixes the data selection procedure in advance to rule out post-hoc bias.
revision: yes
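
The error bars promised in item (i) could be reported as a mean and sample standard deviation across seeds. The sketch below shows that computation with Python's standard `statistics` module; the helper name `mean_and_std` and the five false positive rates are hypothetical placeholders, not results from the paper.

```python
import statistics
from typing import Sequence, Tuple


def mean_and_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator) across seeds."""
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical false positive rates (%) from five seeds; illustrative only.
fpr_per_seed = [2.0, 4.0, 4.0, 4.0, 6.0]
mean_fpr, std_fpr = mean_and_std(fpr_per_seed)
print(f"FPR = {mean_fpr:.2f} ± {std_fpr:.2f} %")
```

With only five runs the spread estimate is itself noisy, so reporting the individual per-seed values next to the summary would make the comparison easier to audit.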

Circularity Check

0 steps flagged

No significant circularity; empirical discovery supports independent design

full rationale

The paper's central contribution rests on an empirical observation of multi-backdoor collision effects (consecutively planted distinct backdoors suppressing earlier ones) and a subsequent engineering intervention via regulated dual-mapping watermark injection on OOD data. No equations, fitted parameters, or predictions reduce the claimed detection performance directly to inputs by construction. The derivation chain is self-contained against external benchmarks, with performance claims validated experimentally rather than through self-definitional loops, load-bearing self-citations, or ansatz smuggling. This is the expected honest non-finding for an empirical method paper.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim depends on the transferability of the multi-backdoor collision observation to federated training and on design choices for the dual-mapping procedure; these are treated as empirical findings rather than derived quantities.

free parameters (1)

watermark injection strength and dual-mapping regulation parameters
Hand-chosen or tuned values that control how strongly the collided watermark is embedded while limiting disruption to normal training.

axioms (1)

domain assumption: Consecutively planted distinct backdoors significantly suppress earlier ones in the federated global model
Core empirical discovery invoked to justify the watermark design and inverted detection.

invented entities (1)

backdoor-collided watermark (no independent evidence)
purpose: Enables inverted proactive detection that counteracts OOD prediction bias
New construct introduced via regulated dual-mapping on OOD data; no independent falsifiable handle outside the method itself is described.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "inspired by our discovery of multi-backdoor collision effects, in which consecutively planted, distinct backdoors significantly suppress earlier ones... regulated dual-mapping learning on OOD data"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "inverted detection paradigm... watermark accuracy falls below a specified threshold"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 1 internal anchor

[1] B. McMahan, E. Moore, D. Ramage, S. Hampson et al., “Communication-efficient learning of deep networks from decentralized data,” in AISTATS, 2017.
[2] P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings et al., “Advances and open problems in federated learning,” Foundations and Trends® in Machine Learning, vol. 14, no. 1–2, pp. 1–210, 2021.
[3] W. Huang, M. Ye, Z. Shi, G. Wan, H. Li, B. Du, and Q. Yang, “Federated learning for generalization, robustness, fairness: A survey and benchmark,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 12, pp. 9387–9406, 2024.
[4] M. Egger, R. Urbanke, and R. Bitar, “Federated one-shot learning with data privacy and objective-hiding,” IEEE Transactions on Information Forensics and Security, 2025.
[5] Z. Charles, N. Mitchell, K. Pillutla, M. Reneer, and Z. Garrett, “Towards federated foundation models: Scalable dataset pipelines for group-structured learning,” in NeurIPS, 2023, pp. 32299–32327.
[6] T. Fan, H. Gu, X. Cao, C. S. Chan, Q. Chen, Y. Chen, Y. Feng, Y. Gu, J. Geng, B. Luo et al., “Ten challenging problems in federated foundation models,” IEEE Transactions on Knowledge and Data Engineering, 2025.
[7] P. Mahon, I. Chatzitheofilou, A. Dekker, X. Fernández, G. Hall, A. Helland, A. Traverso, C. Van Marcke, J. Vehreschild, G. Ciliberto et al., “A federated learning system for precision oncology in Europe: DigiONE,” Nature Medicine, vol. 30, no. 2, pp. 334–337, 2024.
[8] W. Ahmad, A. Vashist, N. Sinha, M. Prasad, V. Shrivastava, and J. H. Muzamal, “Enhancing transparency and privacy in financial fraud detection: The integration of explainable AI and federated learning,” in SEDE, 2024, pp. 139–156.
[9] J. Wu, F. Dong, H. Leung, Z. Zhu, J. Zhou, and S. Drew, “Topology-aware federated learning in edge computing: A comprehensive survey,” ACM Computing Surveys, vol. 56, no. 10, pp. 1–41, 2024.
[10] Y. Li, Y. Jiang, Z. Li, and S.-T. Xia, “Backdoor learning: A survey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 1, pp. 5–22, 2022.
[11] Y. Dai and S. Li, “Chameleon: Adapting to peer images for planting durable backdoors in federated learning,” in ICML, 2023.
[12] S. Yang, J. Bai, K. Gao, Y. Yang, Y. Li, and S.-T. Xia, “Not all prompts are secure: A switchable backdoor attack against pre-trained vision transfomers,” in CVPR, 2024, pp. 24431–24441.
[13] E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov, “How to backdoor federated learning,” in AISTATS, 2020.
[14] Q. Li, D. Wu, D. Zhou, C. Lin, S. Liu, C. Wang, and C. Shen, “Robust adversarial defenses in federated learning: Exploring the impact of data heterogeneity,” IEEE Transactions on Information Forensics and Security, 2025.
[15] K. N. Kumar, C. K. Mohan, and L. R. Cenkeramaddi, “Federated learning minimal model replacement attack using optimal transport: an attacker perspective,” IEEE Transactions on Information Forensics and Security, 2024.
[16] P. Rieger, T. D. Nguyen, M. Miettinen, and A.-R. Sadeghi, “Deepsight: Mitigating backdoor attacks in federated learning through deep model inspection,” in NDSS, 2022.
[17] T. D. Nguyen, P. Rieger, H. Chen, H. Yalame, H. Möllering, H. Fereidooni, S. Marchal, M. Miettinen, A. Mirhoseini, S. Zeitouni et al., “FLAME: Taming backdoors in federated learning,” in USENIX Security, 2022, pp. 1415–1432.
[18] S. Li and Y. Dai, “BackdoorIndicator: Leveraging OOD data for proactive backdoor detection in federated learning,” in USENIX Security, 2024, pp. 4193–4210.
[19] A. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” in CVPR, 2015, pp. 427–436.
[20] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” in ICML, 2017, pp. 1321–1330.
[21] M. Hein, M. Andriushchenko, and J. Bitterwolf, “Why relu networks yield high-confidence predictions far away from the training data and how to mitigate the problem,” in CVPR, 2019, pp. 41–50.
[22] Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated machine learning: Concept and applications,” ACM Transactions on Intelligent Systems and Technology, vol. 10, no. 2, pp. 1–19, 2019.
[23] T. Gu, K. Liu, B. Dolan-Gavitt, and S. Garg, “Badnets: Evaluating backdooring attacks on deep neural networks,” IEEE Access, vol. 7, pp. 47230–47244, 2019.
[24] T. A. Nguyen and A. T. Tran, “Wanet-imperceptible warping-based backdoor attack,” in ICLR, 2021.
[25] Y. Gao, Y. Li, X. Gong, Z. Li, S.-T. Xia, and Q. Wang, “Backdoor attack with sparse and invisible trigger,” IEEE Transactions on Information Forensics and Security, vol. 19, pp. 6364–6376, 2024.
[26] E. Bagdasaryan and V. Shmatikov, “Blind backdoors in deep learning models,” in USENIX Security, 2021, pp. 1505–1521.
[27] X. Xu, Y. Chen, B. Wang, Z. Bian, S. Han, C. Dong, C. Sun, W. Zhang, L. Xu, and P. Zhang, “Csba: Covert semantic backdoor attack against intelligent connected vehicles,” IEEE Transactions on Vehicular Technology, 2024.
[28] J. Geiping, L. Fowl, W. R. Huang, W. Czaja, G. Taylor, M. Moeller, and T. Goldstein, “Witches’ brew: Industrial scale data poisoning via gradient matching,” arXiv preprint arXiv:2009.02276, 2020.
[29] W. Sun, X. Zhang, H. Lu, Y. Chen, T. Wang, J. Chen, and L. Lin, “Backdoor contrastive learning via bi-level trigger optimization,” in ICLR, 2024.
[30] T. A. Nguyen and A. Tran, “Input-aware dynamic backdoor attack,” in NeurIPS, 2020, pp. 3454–3464.
[31] H. Cai, P. Zhang, H. Dong, Y. Xiao, S. Koffas, and Y. Li, “Toward stealthy backdoor attacks against speech recognition via elements of sound,” IEEE Transactions on Information Forensics and Security, vol. 19, pp. 5852–5866, 2024.
[32] M. Zhu, Y. Li, J. Guo, T. Wei, S.-T. Xia, and Z. Qin, “Towards sample-specific backdoor attack with clean labels via attribute trigger,” IEEE Transactions on Dependable and Secure Computing, 2025.
[33] Z. Sun, P. Kairouz, A. T. Suresh, and H. B. McMahan, “Can you really backdoor federated learning?” arXiv preprint arXiv:1911.07963, 2019.
[34] Z. Zhang, A. Panda, L. Song, Y. Yang, M. Mahoney, P. Mittal, R. Kannan, and J. Gonzalez, “Neurotoxin: Durable backdoors in federated learning,” in ICML, 2022, pp. 26429–26446.
[35] B. Wang, Y. Yao, X. Shan, H. Li, B. Viswanath, H. Zheng, and B. Y. Zhao, “Edge-case backdoor attacks against federated learning,” in NeurIPS Workshop, 2020.
[36] C. Xie, K. Huang, P.-Y. Chen, and B. Li, “Dba: Distributed backdoor attacks against federated learning,” in ICLR, 2019.
[37] X. Gong, Y. Chen, H. Huang, Y. Liao, S. Wang, and Q. Wang, “Coordinated backdoor attacks against federated learning with model-dependent triggers,” IEEE Network, vol. 36, no. 1, pp. 84–90, 2022.
[38] T. Nguyen, D. T. Nguyen, K. D. Doan, and K.-S. Wong, “Non-cooperative backdoor attacks in federated learning: A new threat landscape,” arXiv preprint arXiv:2407.07917, 2024.
[39] B. Li, Y. Cai, H. Li, F. Xue, Z. Li, and Y. Li, “Nearest is not dearest: Towards practical defense against quantization-conditioned backdoor attacks,” in CVPR, 2024, pp. 24523–24533.
[40] X. Xu, K. Huang, Y. Li, Z. Qin, and K. Ren, “Towards reliable and efficient backdoor trigger inversion via decoupling benign features,” in ICLR, 2024.
[41] H. Qiu, Y. Zeng, S. Guo, T. Zhang, M. Qiu, and B. Thuraisingham, “Deepsweep: An evaluation framework for mitigating dnn backdoor attacks using data augmentation,” in AsiaCCS, 2021, pp. 363–377.
[42] B. Yi, T. Huang, S. Chen, T. Li, Z. Liu, Z. Chu, and Y. Li, “Probe before you talk: Towards black-box defense against backdoor unalignment for large language models,” in ICLR, 2025.
[43] K. Pillutla, S. M. Kakade, and Z. Harchaoui, “Robust aggregation for federated learning,” IEEE Transactions on Signal Processing, vol. 70, pp. 1142–1154, 2022.
[45] M. S. Ozdayi, M. Kantarcioglu, and Y. R. Gel, “Defending against backdoors in federated learning with robust learning rate,” in AAAI, 2021, pp. 9268–9276.
[46] J. Zhang, C. Zhu, X. Sun, C. Ge, B. Chen, W. Susilo, and S. Yu, “Flpurifier: backdoor defense in federated learning via decoupled contrastive training,” IEEE Transactions on Information Forensics and Security, 2024.
[47] P. Blanchard, E. M. El Mhamdi, R. Guerraoui, and J. Stainer, “Machine learning with adversaries: Byzantine tolerant gradient descent,” in NeurIPS, 2017.
[48]

The limitations of federated learning in sybil settings,

C. Fung, C. J. Yoon, and I. Beschastnikh, “The limitations of federated learning in sybil settings,” inRAID, 2020, pp. 301–316

work page 2020
[49]

Fltrust: Byzantine-robust federated learning via trust bootstrapping,

X. Cao, M. Fang, J. Liu, and N. Gong, “Fltrust: Byzantine-robust federated learning via trust bootstrapping,” inNDSS, 2021

work page 2021
[50]

Shieldfl: Mitigating model poisoning attacks in privacy-preserving federated learning,

Z. Ma, J. Ma, Y . Miao, Y . Li, and R. H. Deng, “Shieldfl: Mitigating model poisoning attacks in privacy-preserving federated learning,”IEEE Transactions on Information Forensics and Security, 2022

work page 2022
[51]

Rflbat: A robust fed- erated learning algorithm against backdoor attack,

Y . Wang, D. Zhai, Y . Zhan, and Y . Xia, “Rflbat: A robust fed- erated learning algorithm against backdoor attack,”arXiv preprint arXiv:2201.03772, 2022

work page arXiv 2022
[52]

Flguardian: Defending against model poisoning attacks via fine-grained detection in federated learning,

X. Zhou, X. Chen, S. Liu, X. Fan, Q. Sun, L. Chen, M. Qiu, and T. Xiang, “Flguardian: Defending against model poisoning attacks via fine-grained detection in federated learning,”IEEE Transactions on Information Forensics and Security, 2025

work page 2025
[53]

Shortcuts everywhere and nowhere: Exploring multi-trigger backdoor attacks,

Y . Li, J. He, H. Huang, J. Sun, and X. Ma, “Shortcuts everywhere and nowhere: Exploring multi-trigger backdoor attacks,”arXiv preprint arXiv:2401.15295, 2024

work page arXiv 2024
[54]

Provably robust multi-bit watermarking for ai-generated text,

W. Qu, W. Zheng, T. Tao, D. Yin, Y . Jiang, Z. Tian, W. Zou, J. Jia, and J. Zhang, “Provably robust multi-bit watermarking for ai-generated text,” inUSENIX Security, 2025

work page 2025
[55]

Rethinking data protection in the (generative) artificial intelligence era,

Y . Li, S. Shao, Y . He, J. Guo, T. Zhang, Z. Qin, P.-Y . Chen, M. Backes, P. Torr, D. Taoet al., “Rethinking data protection in the (generative) artificial intelligence era,”arXiv preprint arXiv:2507.03034, 2025

work page arXiv 2025
[56]

Explanation as a watermark: Towards harmless and multi-bit model ownership verification via watermarking feature attribution,

S. Shao, Y . Li, H. Yao, Y . He, Z. Qin, and K. Ren, “Explanation as a watermark: Towards harmless and multi-bit model ownership verification via watermarking feature attribution,” inNDSS, 2025

work page 2025
[57]

Towards robust model watermark via reducing parametric vulnerability,

G. Gan, Y . Li, D. Wu, and S.-T. Xia, “Towards robust model watermark via reducing parametric vulnerability,” inICCV, 2023, pp. 4751–4761

work page 2023
[58]

Emnist: Extending mnist to handwritten letters,

G. Cohen, S. Afshar, J. Tapson, and A. Van Schaik, “Emnist: Extending mnist to handwritten letters,” inIJCNN, 2017, pp. 2921–2926

work page 2017
[59]

Learning multiple layers of features from tiny images,

A. Krizhevsky, “Learning multiple layers of features from tiny images,” University of Toronto, Tech. Rep. TR-2009, 2009

work page 2009
[60]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inCVPR, 2016, pp. 770–778

work page 2016
[61]

Federated optimization in heterogeneous networks,

T. Li, A. K. Sahu, A. Talwalkar, and V . Smith, “Federated optimization in heterogeneous networks,” inMLSys, 2020

work page 2020
[62]

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted backdoor attacks on deep learning systems using data poisoning,”arXiv preprint arXiv:1712.05526, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[63]

How to backdoor federated learning,

E. Bagdasaryan, A. Veit, Y . Hua, D. Estrin, and V . Shmatikov, “How to backdoor federated learning,” inAISTATS, 2020, pp. 2938–2948. PREPRINT 1 1000 1100 1200 0.0 0.5 1.0 Attacker 0 1000 1100 1200 0.0 0.5 1.0 Benign 3 1000 1100 1200 0.0 0.5 1.0 Benign 6 1000 1100 1200 0.0 0.5 1.0 Benign 9 1000 1100 1200 0.0 0.5 1.0 Benign 12 1000 1100 1200 0.0 0.5 1.0 Be...

work page 2020
