Recognition: unknown
CSC: Turning the Adversary's Poison against Itself
Pith reviewed 2026-05-09 21:46 UTC · model grok-4.3
The pith
Poisoned training samples form isolated clusters in latent space early on, enabling a defense that identifies them via density clustering and neutralizes the backdoor by relabeling to a virtual class.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that poisoned samples from backdoor attacks form isolated clusters in latent space from the earliest training stages, with triggers acting as dominant features. Cluster Segregation Concealment exploits this by applying DBSCAN to early-epoch features to segregate the anomalous clusters, then relabeling those samples to a virtual class and fine-tuning the classifier with cross-entropy loss so that the malicious backdoor association is replaced by a benign virtual linkage.
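A minimal sketch of what the segregation step could look like in practice, assuming a PyTorch model with an accessible feature extractor and illustrative epoch and DBSCAN settings (none of these values are reported in the abstract):

```python
# Hypothetical sketch of the segregation stage: collect latent features after a few
# training epochs, then cluster them with DBSCAN. All parameter values are assumptions.
import numpy as np
import torch
from sklearn.cluster import DBSCAN

def extract_early_features(model, loader, device="cpu"):
    """Collect penultimate-layer features and labels for every training sample."""
    model.eval()
    feats, labels = [], []
    with torch.no_grad():
        for x, y in loader:
            z = model.backbone(x.to(device))          # assumed feature-extractor attribute
            feats.append(z.flatten(1).cpu().numpy())
            labels.append(y.numpy())
    return np.concatenate(feats), np.concatenate(labels)

# After e.g. 5 epochs of standard supervised training (epoch count is illustrative):
# features, labels = extract_early_features(model, train_loader)
# cluster_ids = DBSCAN(eps=0.5, min_samples=10).fit_predict(features)  # assumed values
```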
What carries the argument
Cluster Segregation Concealment (CSC), which performs feature extraction in early epochs, applies DBSCAN clustering with class-diversity and density metrics to identify poisoned clusters, and then conceals them through virtual-class relabeling followed by classifier fine-tuning.
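The class-diversity and density heuristics are not specified numerically here; one plausible reading, sketched below with assumed thresholds and building on the clustering sketch above, is to flag clusters whose members are dominated by a single label and whose density is unusually high.

```python
# Hypothetical scoring of DBSCAN clusters by class diversity (label entropy) and density.
# The exact metrics and threshold values are assumptions made for illustration.
import numpy as np

def flag_anomalous_clusters(features, labels, cluster_ids,
                            entropy_thresh=0.2, density_factor=2.0):
    """Flag clusters that are both label-pure (low entropy) and unusually dense."""
    stats = {}
    for cid in set(cluster_ids) - {-1}:                  # -1 marks DBSCAN noise points
        mask = cluster_ids == cid
        counts = np.bincount(labels[mask])
        p = counts[counts > 0] / mask.sum()
        entropy = float(-(p * np.log(p)).sum())          # class-diversity proxy
        centroid = features[mask].mean(axis=0)
        spread = np.linalg.norm(features[mask] - centroid, axis=1).mean()
        density = mask.sum() / (spread + 1e-12)          # density proxy
        stats[cid] = (entropy, density)
    if not stats:
        return []
    median_density = np.median([d for _, d in stats.values()])
    return [cid for cid, (e, d) in stats.items()
            if e < entropy_thresh and d > density_factor * median_density]
```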
If this is right
- Training pipelines can continue to use standard supervised learning on potentially poisoned data and still achieve strong backdoor resistance without separate unlearning stages.
- The same early-epoch clustering step can be repeated periodically during longer training runs to catch any new poisons introduced mid-training.
- Because the concealment stage only modifies the final classifier, the method preserves the feature extractor learned on the full (mostly clean) data.
- The defense does not require knowledge of the specific trigger pattern or target label, allowing it to address unknown attack variants.
Where Pith is reading between the lines
- Standard training workflows could be modified to checkpoint features after a small number of epochs and run the clustering step automatically, turning backdoor defense into a routine monitoring task (a minimal sketch follows this list).
- If the early-cluster separation holds for attacks that insert multiple triggers or use dynamic triggers, the same pipeline could extend to those harder cases without redesign.
- The virtual-class concealment idea might combine with existing certified defenses to provide both empirical suppression and formal guarantees on the remaining clean data.
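A minimal sketch of the monitoring idea from the first bullet, reusing the `extract_early_features` and `flag_anomalous_clusters` helpers sketched earlier; the scan interval and DBSCAN values are illustrative assumptions, not part of the paper's reported setup.

```python
# Hypothetical training loop with a periodic poison-scan hook.
from sklearn.cluster import DBSCAN

SCAN_EVERY = 5  # assumed interval; the paper only reports scanning in early epochs

def train_with_monitoring(model, train_loader, optimizer, criterion, epochs, device="cpu"):
    flagged_history = []
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x.to(device)), y.to(device))
            loss.backward()
            optimizer.step()
        if epoch % SCAN_EVERY == 0:
            # re-run the clustering-based detector on current latent features
            feats, labels = extract_early_features(model, train_loader, device)
            cids = DBSCAN(eps=0.5, min_samples=10).fit_predict(feats)
            flagged_history.append(flag_anomalous_clusters(feats, labels, cids))
    return flagged_history
```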
Load-bearing premise
Poisoned samples will form isolated clusters in latent space early in training, with the trigger patterns acting as dominant features clearly distinct from those of benign samples.
What would settle it
A test on any dataset and attack variant in which DBSCAN applied to early-epoch latent features either misses most poisoned samples or incorrectly flags a large number of clean samples as anomalous, causing the concealment step to fail.
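Such a test reduces to measuring detection recall and false-positive rate against known poison indices. A minimal sketch, assuming ground-truth poison indices are available in a controlled experiment:

```python
# Hypothetical detection-quality check for the falsification test described above.
# `flagged_idx` would come from the clustering step; `poison_idx` is ground truth
# known only in a controlled experiment.
import numpy as np

def detection_quality(flagged_idx, poison_idx, n_samples):
    flagged = np.zeros(n_samples, dtype=bool)
    flagged[list(flagged_idx)] = True
    poisoned = np.zeros(n_samples, dtype=bool)
    poisoned[list(poison_idx)] = True
    tp = (flagged & poisoned).sum()
    recall = tp / max(poisoned.sum(), 1)        # fraction of poisons caught
    precision = tp / max(flagged.sum(), 1)      # purity of the flagged set
    fpr = (flagged & ~poisoned).sum() / max((~poisoned).sum(), 1)
    return {"precision": precision, "recall": recall, "false_positive_rate": fpr}

# The premise fails if recall is low (most poisons missed) or the
# false-positive rate is high (many clean samples relabeled needlessly).
```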
read the original abstract
Poisoning-based backdoor attacks pose significant threats to deep neural networks by embedding triggers in training data, causing models to misclassify triggered inputs as adversary-specified labels while maintaining performance on clean data. Existing poison restraint-based defenses often suffer from inadequate detection against specific attack variants and compromise model utility through unlearning methods that lead to accuracy degradation. This paper conducts a comprehensive analysis of backdoor attack dynamics during model training, revealing that poisoned samples form isolated clusters in latent space early on, with triggers acting as dominant features distinct from benign ones. Leveraging these insights, we propose Cluster Segregation Concealment (CSC), a novel poison suppression defense. CSC first trains a deep neural network via standard supervised learning while segregating poisoned samples through feature extraction from early epochs, DBSCAN clustering, and identification of anomalous clusters based on class diversity and density metrics. In the concealment stage, identified poisoned samples are relabeled to a virtual class, and the model's classifier is fine-tuned using cross-entropy loss to replace the backdoor association with a benign virtual linkage, preserving overall accuracy. Evaluated on four benchmark datasets against twelve poisoning-based attacks, CSC outperforms nine state-of-the-art defenses by reducing average attack success rates to near zero with minimal clean accuracy loss. Contributions include robust backdoor pattern identification, an effective concealment mechanism, and superior empirical validation, advancing trustworthy artificial intelligence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes backdoor attack dynamics during DNN training and concludes that poisoned samples form isolated clusters in latent space after only a few epochs, with triggers acting as dominant features. It proposes Cluster Segregation Concealment (CSC): standard supervised training for early epochs, feature extraction, DBSCAN clustering, identification of anomalous clusters via class-diversity and density heuristics, relabeling of detected poisons to a virtual class, and fine-tuning of the classifier with cross-entropy loss to replace the backdoor mapping. The paper claims that CSC, evaluated on four benchmark datasets against twelve poisoning attacks, outperforms nine state-of-the-art defenses by driving average attack success rates to near zero while incurring only minimal clean-accuracy degradation.
Significance. If the empirical claims hold, CSC would constitute a useful advance in poison-restraint defenses by converting the adversary's early-epoch separability signal into a practical suppression mechanism that avoids the utility loss typical of unlearning-based methods. The breadth of the evaluation (twelve attacks, four datasets) is a positive feature that, once supported by detailed per-attack metrics and detection-quality statistics, could strengthen the case for deploying such dynamics-aware defenses in trustworthy AI pipelines.
major comments (2)
- [Abstract / CSC method] Abstract and CSC method description: the central claim that poisoned samples form isolated, high-density clusters with low class diversity after only a few epochs (enabling reliable DBSCAN identification) is load-bearing for the entire defense, yet no cluster-purity, precision/recall, or false-positive rates for the identification step are reported across the twelve attacks. Without these numbers it is impossible to confirm that the subsequent concealment stage actually drives ASR to near zero rather than missing poisons or relabeling clean samples.
- [Experimental evaluation] Experimental evaluation: although the abstract asserts that CSC reduces average ASR to near zero and outperforms nine SOTA defenses with minimal clean-accuracy loss, the manuscript supplies no per-attack ASR tables, clean-accuracy deltas, ablation results on DBSCAN eps/min_samples or the class-diversity/density thresholds, or failure-case analysis. These omissions make the superiority claim unverifiable and leave open the possibility that the reported performance does not survive detection errors or adaptive attacks.
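For reference, attack success rate and clean-accuracy delta are conventionally computed as below; this is a generic sketch of the standard metrics, not the paper's evaluation code.

```python
# Generic ASR / clean-accuracy computation for a backdoor evaluation (illustrative).
import torch

@torch.no_grad()
def attack_success_rate(model, triggered_loader, target_label, device="cpu"):
    """Fraction of triggered, non-target-class inputs classified as the target label."""
    hits, total = 0, 0
    model.eval()
    for x, y in triggered_loader:
        pred = model(x.to(device)).argmax(dim=1).cpu()
        keep = y != target_label             # exclude samples already of the target class
        hits += (pred[keep] == target_label).sum().item()
        total += keep.sum().item()
    return hits / max(total, 1)

@torch.no_grad()
def clean_accuracy(model, clean_loader, device="cpu"):
    model.eval()
    correct = total = 0
    for x, y in clean_loader:
        pred = model(x.to(device)).argmax(dim=1).cpu()
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / max(total, 1)

# clean-accuracy delta = clean_accuracy(defended_model, loader) - clean_accuracy(baseline_model, loader)
```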
minor comments (2)
- [CSC concealment stage] The definition and handling of the invented 'virtual class' during fine-tuning and at inference time should be formalized with an equation or pseudocode to clarify its effect on the output layer and on clean-data predictions (one possible formalization is sketched after this list).
- [Method] Notation for the early-epoch feature extractor and the exact DBSCAN parameters used in the reported experiments should be stated explicitly rather than left as free parameters.
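One plausible formalization of the virtual class, offered as an editorial sketch rather than the authors' definition: extend the classifier from K to K+1 outputs, relabel flagged samples to the new class K, fine-tune only the classifier head with cross-entropy on top of the frozen feature extractor, and drop the virtual logit at inference so clean predictions remain in the original label space.

```python
# Hypothetical formalization of the virtual-class concealment step (editorial sketch,
# not the authors' reference implementation). Assumes precomputed feature tensors
# from a frozen feature extractor and integer label tensors.
import torch
import torch.nn as nn

def conceal_with_virtual_class(features, labels, flagged_idx,
                               num_classes, epochs=10, lr=1e-3):
    virtual_class = num_classes                            # new index K for the virtual class
    new_labels = labels.clone()
    new_labels[flagged_idx] = virtual_class                # relabel detected poisons

    head = nn.Linear(features.shape[1], num_classes + 1)   # classifier head over K+1 classes
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):                                # fine-tune the head only
        opt.zero_grad()
        loss = loss_fn(head(features), new_labels)
        loss.backward()
        opt.step()
    return head

def predict_clean(head, feats):
    """At inference the virtual logit is ignored, so predictions stay in 0..K-1."""
    return head(feats)[:, :-1].argmax(dim=1)
```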
Simulated Author's Rebuttal
We thank the referee for the thorough and constructive review. The comments correctly identify areas where additional empirical detail would strengthen the manuscript. We have revised the paper to incorporate the requested metrics, tables, and analyses, which we believe fully address the concerns while preserving the core contributions.
read point-by-point responses
- Referee: [Abstract / CSC method] Abstract and CSC method description: the central claim that poisoned samples form isolated, high-density clusters with low class diversity after only a few epochs (enabling reliable DBSCAN identification) is load-bearing for the entire defense, yet no cluster-purity, precision/recall, or false-positive rates for the identification step are reported across the twelve attacks. Without these numbers it is impossible to confirm that the subsequent concealment stage actually drives ASR to near zero rather than missing poisons or relabeling clean samples.
Authors: We agree that reporting cluster-purity, precision, recall, and false-positive rates for the identification step is essential to substantiate the defense pipeline. The original manuscript prioritized end-to-end ASR and accuracy results, but we acknowledge this omission limits verification of the clustering stage. In the revised manuscript we have added a new subsection in the Experimental Evaluation section that tabulates these detection-quality statistics for every attack across all four datasets. The added results show consistently high precision and recall with low false-positive rates on clean samples, confirming that the concealment mechanism operates on reliably identified poisons rather than artifacts of mislabeling. revision: yes
- Referee: [Experimental evaluation] Experimental evaluation: although the abstract asserts that CSC reduces average ASR to near zero and outperforms nine SOTA defenses with minimal clean-accuracy loss, the manuscript supplies no per-attack ASR tables, clean-accuracy deltas, ablation results on DBSCAN eps/min_samples or the class-diversity/density thresholds, or failure-case analysis. These omissions make the superiority claim unverifiable and leave open the possibility that the reported performance does not survive detection errors or adaptive attacks.
Authors: We concur that per-attack granularity, ablation studies, and failure-case analysis are necessary for full verifiability. The original manuscript presented aggregate averages to emphasize overall performance; we have now expanded the evaluation section with complete per-attack, per-dataset tables for both ASR and clean-accuracy deltas. New appendix material includes ablations on DBSCAN eps and min_samples as well as the class-diversity and density thresholds, demonstrating stable performance across reasonable parameter ranges. A dedicated failure-case subsection analyzes instances of imperfect detection and shows that residual ASR remains near zero thanks to the concealment step. While we did not run new adaptive-attack experiments, the revised discussion addresses how the early-epoch clustering signal may limit certain adaptive strategies; we believe the breadth of the twelve-attack evaluation still supports the claims. revision: yes
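As context for the kind of ablation the authors describe, a minimal grid-search sketch over DBSCAN parameters, reusing the `flag_anomalous_clusters` and `detection_quality` helpers sketched earlier; the parameter ranges are illustrative assumptions, not the values used in the paper.

```python
# Hypothetical ablation over DBSCAN eps / min_samples, scoring detection F1 against
# known poison indices in a controlled experiment.
from itertools import product
from sklearn.cluster import DBSCAN

def ablate_dbscan(features, labels, poison_idx,
                  eps_grid=(0.3, 0.5, 0.7, 1.0), min_samples_grid=(5, 10, 20)):
    results = {}
    for eps, ms in product(eps_grid, min_samples_grid):
        cids = DBSCAN(eps=eps, min_samples=ms).fit_predict(features)
        suspicious = set(flag_anomalous_clusters(features, labels, cids))
        flagged = [i for i, c in enumerate(cids) if c in suspicious]
        q = detection_quality(flagged, poison_idx, len(features))
        f1 = 2 * q["precision"] * q["recall"] / max(q["precision"] + q["recall"], 1e-12)
        results[(eps, ms)] = f1
    return results
```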
Circularity Check
No circularity: empirical defense with independent experimental validation
full rationale
The paper's core contribution is an empirical defense (CSC) that first observes clustering of poisoned samples in early-epoch latent features, then applies DBSCAN plus heuristics to identify and conceal them via relabeling and fine-tuning. No equations, uniqueness theorems, or first-principles derivations are presented that reduce by construction to fitted parameters, self-citations, or ansatzes imported from prior work. The reported results consist of attack-success-rate measurements on 12 attacks across 4 datasets, which are falsifiable external benchmarks rather than tautological outputs of the method itself. Minor self-reference exists only in the sense that the clustering observation is used to motivate the algorithm, but this does not force the empirical outcomes.
Axiom & Free-Parameter Ledger
free parameters (2)
- DBSCAN eps and min_samples
- class-diversity and density thresholds
axioms (1)
- domain assumption: Poisoned samples form isolated clusters in latent space early in training, with triggers as dominant features.
invented entities (1)
- virtual class (no independent evidence)