pith. sign in

arxiv: 2605.18908 · v1 · pith:IB3HZG7Bnew · submitted 2026-05-17 · 💻 cs.CR · cs.AI· cs.LG

Fast and Lightweight Backdoor Detection via Head Random Probing

Pith reviewed 2026-05-20 12:35 UTC · model grok-4.3

classification 💻 cs.CR cs.AIcs.LG
keywords backdoor detectiondata-free detectionprediction head probingneural network auditingDNN securitypost-training detectionrandom latent probes
0
0 comments X

The pith

Backdoored neural networks concentrate responses on the target class when random latent probes are sent directly into the prediction head.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents HTell, a detector that identifies backdoors in already-trained deep neural networks without any clean data, surrogate data, gradients, or trigger reconstruction. It creates architecture-aware random latent probes and routes them straight into the prediction head, then measures whether responses concentrate abnormally on one class. Backdoored models reliably display this concentration while clean models do not. The method reports 99.03 percent true-positive rate and 2.11 percent false-positive rate on a benchmark of more than 6,000 backdoored models and over 700 clean ones spanning four datasets, 14 architectures, and 21 attack types. It runs in 12.69 milliseconds per model, more than 30,000 times faster than representative gradient-based detectors.

Core claim

HTell generates architecture-aware random latent probes, feeds them directly into the model head, and detects backdoors by analyzing class-wise response statistics; backdoored models exhibit abnormal response concentration on the target class under these probes.

What carries the argument

Head random probing: random latent inputs fed only to the prediction head followed by class-wise response concentration analysis.

Load-bearing premise

Backdoored models exhibit abnormal response concentration on the target class under random latent probes to the prediction head.

What would settle it

A backdoored model that produces evenly distributed class responses instead of target-class concentration when the prediction head receives random latent probes would invalidate the detection rule.

Figures

Figures reproduced from arXiv: 2605.18908 by Chunwei Tian, Daoqiang Zhang, Jiajia Liu, Jing Fang, Qi Zhu, Xueyu Yin, Yinbo Yu.

Figure 1
Figure 1. Figure 1: Method comparisons. Existing post-training methods [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: T-SNE visualization of clean and poisoned latent features extracted by backdoored models and random backdoor probes. All backdoored [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Badnets with different patch triggers. attacks (TBT, HPT). In patch-based attacks, besides white patches, we also introduce 8 new patches with various patching locations, colors, and textures (see [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Detection accuracies under different σ of backdoor probes. The x-axis coordinate value represents multiples of |Snoise| max . D. Ablation and Sensitivity Analysis We further analyze key design choices of HTell as follows: 1) Probe Distribution Analysis: HTell employs either uni￾form or Gaussian probes according to the coarse latent ac￾tivation range. To validate this design, we compare different probe dist… view at source ↗
Figure 5
Figure 5. Figure 5: Applying HTell to object detection and sequential decision [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
read the original abstract

Deep neural networks (DNNs) remain critically vulnerable to backdoor attacks. Existing post-training detectors often require clean or surrogate data, gradients, or iterative trigger reconstruction, leading to high computational costs and limited robustness under practical model-auditing scenarios. In this paper, we propose HTell, a fast and lightweight data-free backdoor detector based on head random probing. Instead of reconstructing diverse trigger patterns, HTell inspects their unified manifestation in the prediction head: backdoored models tend to exhibit abnormal response concentration on the target class under random latent probes. HTell generates architecture-aware random latent probes, feeds them directly into the model head, and detects backdoors by analyzing class-wise response statistics, without accessing real or surrogate data, model gradients, or parameter optimization. We evaluate HTell on a large-scale benchmark containing more than 6,000 backdoored models and over 700 clean models, covering 4 datasets, 14 architectures, and 21 types of backdoor attacks. HTell achieves 99.03% true positive rate and 2.11% false positive rate with only 12.69 ms/model detection latency, reducing the time cost by over 30,000$\times$ compared with representative gradient-based detectors. These results demonstrate that head random probing provides an accurate, robust, and efficient solution for large-scale data-free backdoor model auditing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes HTell, a data-free backdoor detector that generates architecture-aware random latent probes, feeds them directly into the prediction head, and detects backdoors via class-wise response concentration statistics on the target label. It reports 99.03% true positive rate and 2.11% false positive rate on a benchmark of more than 6,000 backdoored models and over 700 clean models spanning 4 datasets, 14 architectures, and 21 attack types, with 12.69 ms/model latency (over 30,000× faster than gradient-based detectors).

Significance. If the central empirical observation holds—that backdoored models reliably exhibit detectable response concentration on the target class under random head probes across the evaluated attacks and architectures—the approach would represent a substantial practical advance for scalable, data-free model auditing in security-critical settings. The scale of the benchmark and the extreme efficiency are clear strengths that could enable large-scale deployment where existing methods are prohibitive.

major comments (3)
  1. [§3] §3 (Head Random Probing): The claim that all 21 attack types produce a unified, detectable head-level bias (abnormal concentration on the target class for random non-trigger probes) is load-bearing for the general applicability, yet the manuscript provides no per-attack analysis or mechanistic explanation of why attacks primarily modifying earlier layers must induce this specific head statistic; without it, the 99.03% aggregate TPR may not generalize beyond the benchmark.
  2. [§4.2] §4.2 (Evaluation): The reported TPR/FPR figures are aggregates only; absent a breakdown table by attack type or architecture showing uniform separation, it remains possible that a subset of the 21 attacks evades the concentration signal, undermining the cross-attack robustness asserted in the abstract.
  3. [§3.2] §3.2 (Probe Generation and Threshold): The concentration metric and decision threshold are presented as fixed, but no sensitivity study to probe distribution parameters or threshold choice is reported; this leaves open whether the separation is an intrinsic property or partly an artifact of benchmark-specific tuning.
minor comments (2)
  1. [Figure 3] Figure 3 (response distribution plots): axis labels and legend entries for the clean vs. backdoored histograms could be enlarged for readability.
  2. The manuscript cites prior detectors but could add a short related-work paragraph explicitly contrasting HTell with other recent data-free or head-only methods.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback and for highlighting the strengths of our large-scale benchmark and the practical efficiency of HTell. We address each major comment below, proposing targeted revisions to improve clarity and robustness where appropriate.

read point-by-point responses
  1. Referee: [§3] §3 (Head Random Probing): The claim that all 21 attack types produce a unified, detectable head-level bias (abnormal concentration on the target class for random non-trigger probes) is load-bearing for the general applicability, yet the manuscript provides no per-attack analysis or mechanistic explanation of why attacks primarily modifying earlier layers must induce this specific head statistic; without it, the 99.03% aggregate TPR may not generalize beyond the benchmark.

    Authors: We agree that a per-attack breakdown would strengthen the presentation of cross-attack robustness. The manuscript's core contribution is the empirical demonstration that backdoored models exhibit this head-level concentration bias across the 21 evaluated attack types, supported by the aggregate results on over 6,000 models. A comprehensive mechanistic account of how every attack variant (including those primarily affecting earlier layers) propagates to produce this specific head statistic lies beyond the empirical scope of the current work. In the revision we will add a table reporting TPR/FPR per attack type to confirm consistency of the signal. revision: partial

  2. Referee: [§4.2] §4.2 (Evaluation): The reported TPR/FPR figures are aggregates only; absent a breakdown table by attack type or architecture showing uniform separation, it remains possible that a subset of the 21 attacks evades the concentration signal, undermining the cross-attack robustness asserted in the abstract.

    Authors: We accept this observation. While the aggregate metrics reflect strong overall performance, disaggregated results will better address potential concerns about non-uniform behavior. We will include a breakdown table by attack type and architecture in the revised manuscript. revision: yes

  3. Referee: [§3.2] §3.2 (Probe Generation and Threshold): The concentration metric and decision threshold are presented as fixed, but no sensitivity study to probe distribution parameters or threshold choice is reported; this leaves open whether the separation is an intrinsic property or partly an artifact of benchmark-specific tuning.

    Authors: The concentration statistic is computed directly from class-wise response distributions under architecture-aware random probes, and the threshold is calibrated on clean-model statistics to control FPR. We will add a sensitivity study in the revision examining variations in probe distribution parameters and threshold values to demonstrate that the separation is robust rather than benchmark-specific. revision: yes

standing simulated objections not resolved
  • Mechanistic explanation of why attacks that primarily modify earlier layers reliably induce the specific head-level response concentration on the target class

Circularity Check

0 steps flagged

No significant circularity; detection rests on observable empirical property.

full rationale

The paper's core claim is that backdoored models exhibit abnormal response concentration on the target class when random latent probes are fed to the prediction head. This property is presented as a unified manifestation observed across attacks, not derived by fitting parameters to the target detection result or by self-referential definition. HTell simply measures class-wise statistics on architecture-aware random probes without data, gradients, or optimization. The large-scale evaluation (6000+ backdoored models, 700+ clean models across 21 attacks and 14 architectures) serves as independent validation rather than a closed loop. No self-citation chains, uniqueness theorems, or ansatz smuggling appear in the derivation; the method is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the empirical premise that backdoor triggers produce a detectable concentration pattern at the head; no explicit free parameters, new physical entities, or additional axioms beyond standard neural-network assumptions are stated in the abstract.

axioms (1)
  • domain assumption Backdoored models exhibit abnormal response concentration on the target class under random latent probes
    This premise is invoked to justify data-free detection without trigger reconstruction.

pith-pipeline@v0.9.0 · 5795 in / 1296 out tokens · 31990 ms · 2026-05-20T12:35:07.620785+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 3 internal anchors

  1. [1]

    Backdoor learning: A survey,

    Y . Li, Y . Jiang, Z. Li, and S.-T. Xia, “Backdoor learning: A survey,” IEEE TNNLS, vol. 35, no. 1, pp. 5–22, 2022

  2. [2]

    Neural cleanse: Identifying and mitigating backdoor attacks in neural networks,

    B. Wang, Y . Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, and B. Y . Zhao, “Neural cleanse: Identifying and mitigating backdoor attacks in neural networks,” inIEEE S&P, 2019, pp. 707–723

  3. [3]

    Mm-bd: Post-training detection of backdoor attacks with arbitrary backdoor pattern types using a maximum margin statistic,

    H. Wang, Z. Xiang, D. J. Miller, and G. Kesidis, “Mm-bd: Post-training detection of backdoor attacks with arbitrary backdoor pattern types using a maximum margin statistic,” inIEEE S&P, 2024, pp. 1994–2012

  4. [4]

    Rethinking the reverse- engineering of trojan triggers,

    Z. Wang, K. Mei, H. Ding, J. Zhai, and S. Ma, “Rethinking the reverse- engineering of trojan triggers,” vol. 35, pp. 9738–9753, 2022

  5. [5]

    Freeea- gle: Detecting complex neural trojans in data-free cases,

    C. Fu, X. Zhang, S. Ji, T. Wang, P. Lin, Y . Feng, and J. Yin, “Freeea- gle: Detecting complex neural trojans in data-free cases,” inUSENIX Security, 2023, pp. 6399–6416

  6. [6]

    Barbie: Robust backdoor detection based on latent separability,

    H. Zhang, Y . Bai, Y . Chen, Z. Ma, and W. Xu, “Barbie: Robust backdoor detection based on latent separability,” inNDSS, 2025

  7. [7]

    Detecting backdoor attacks on deep neural networks by activation clustering,

    B. Chen, W. Carvalho, N. Baracaldo, H. Ludwig, B. Edwards, T. Lee, I. Molloy, and B. Srivastava, “Detecting backdoor attacks on deep neural networks by activation clustering,” inAAAI Workshop, 2019

  8. [8]

    Spectral signatures in backdoor attacks,

    B. Tran, J. Li, and A. Madry, “Spectral signatures in backdoor attacks,” NeurIPS, vol. 31, 2018

  9. [9]

    Adversarial-inspired backdoor defense via bridging backdoor and adversarial attacks,

    J.-L. Yin, W. Wang, W. Lin, X. Liuet al., “Adversarial-inspired backdoor defense via bridging backdoor and adversarial attacks,” inAAAI, vol. 39, no. 9, 2025, pp. 9508–9516

  10. [10]

    Need for speed: Taming backdoor attacks with speed and precision,

    Z. Ma, Y . Yang, Y . Liu, T. Yang, X. Liu, T. Li, and Z. Qin, “Need for speed: Taming backdoor attacks with speed and precision,” inIEEE S&P, 2024, pp. 1217–1235

  11. [11]

    Test-time backdoor detection for object detection models,

    H. Zhang, Y . Wang, S. Yan, C. Zhu, Z. Zhou, L. Hou, S. Hu, M. Li, Y . Zhang, and L. Y . Zhang, “Test-time backdoor detection for object detection models,” inCVPR, 2025, pp. 24 377–24 386. 12

  12. [12]

    Trojan signatures in dnn weights,

    G. Fields, M. Samragh, M. Javaheripi, F. Koushanfar, and T. Javidi, “Trojan signatures in dnn weights,” inICCV, 2021, pp. 12–20

  13. [13]

    Data-free backdoor model inspection: Masking and reverse engineering loops for feature counting,

    Q. Zhou, W. Luo, Z. Ye, and Y . Tang, “Data-free backdoor model inspection: Masking and reverse engineering loops for feature counting,” inIJCNN. IEEE, 2024, pp. 1–9

  14. [14]

    Data free backdoor attacks,

    B. Cao, J. Jia, C. Hu, W. Guo, Z. Xiang, J. Chen, B. Li, and D. Song, “Data free backdoor attacks,”NeurIPS, vol. 37, pp. 23 881–23 911, 2024

  15. [15]

    Practical detection of trojan neural networks: Data-limited and data- free cases,

    R. Wang, G. Zhang, S. Liu, P.-Y . Chen, J. Xiong, and M. Wang, “Practical detection of trojan neural networks: Data-limited and data- free cases,” inECCV, 2020, pp. 222–238

  16. [16]

    Tbt: Targeted neural network attack with bit trojan,

    A. S. Rakin, Z. He, and D. Fan, “Tbt: Targeted neural network attack with bit trojan,” inCPVR, 2020, pp. 13 198–13 207

  17. [17]

    Model x- ray: Detecting backdoored models via decision boundary,

    Y . Su, J. Zhang, T. Xu, T. Zhang, W. Zhang, and N. Yu, “Model x- ray: Detecting backdoored models via decision boundary,” inACM MM, 2024, pp. 10 296–10 305

  18. [18]

    Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brah- man, Lester James V

    H. Karimi, T. Derr, and J. Tang, “Characterizing the decision boundary of deep neural networks,”arXiv preprint arXiv:1912.11460, 2019

  19. [19]

    Can neural nets learn the same model twice? investigating reproducibility and double descent from the decision boundary perspective,

    G. Somepalli, L. Fowl, A. Bansal, P. Yeh-Chiang, Y . Dar, R. Baraniuk, M. Goldblum, and T. Goldstein, “Can neural nets learn the same model twice? investigating reproducibility and double descent from the decision boundary perspective,” inCVPR, 2022, pp. 13 699–13 708

  20. [20]

    BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

    T. Gu, B. Dolan-Gavitt, and S. Garg, “Badnets: Identifying vulnera- bilities in the machine learning model supply chain,”arXiv preprint arXiv:1708.06733, 2017

  21. [21]

    Revisiting the assumption of latent separability for backdoor defenses,

    X. Qi, T. Xie, Y . Li, S. Mahloujifar, and P. Mittal, “Revisiting the assumption of latent separability for backdoor defenses,” inICLR, 2023

  22. [22]

    Lotus: Evasive and resilient backdoor attacks through sub-partitioning,

    S. Cheng, G. Tao, Y . Liu, G. Shen, S. An, S. Feng, X. Xu, K. Zhang, S. Ma, and X. Zhang, “Lotus: Evasive and resilient backdoor attacks through sub-partitioning,” inCVPR, 2024, pp. 24 798–24 809

  23. [23]

    Input-aware dynamic backdoor attack,

    T. A. Nguyen and A. Tran, “Input-aware dynamic backdoor attack,” NeurIPS, vol. 33, pp. 3454–3464, 2020

  24. [24]

    Invisible backdoor attack with sample-specific triggers,

    Y . Li, Y . Li, B. Wu, L. Li, R. He, and S. Lyu, “Invisible backdoor attack with sample-specific triggers,” inICCV, 2021, pp. 16 463–16 472

  25. [25]

    Bppattack: Stealthy and efficient trojan attacks against deep neural networks via image quantization and con- trastive adversarial learning,

    Z. Wang, J. Zhai, and S. Ma, “Bppattack: Stealthy and efficient trojan attacks against deep neural networks via image quantization and con- trastive adversarial learning,” inCVPR, 2022, pp. 15 074–15 084

  26. [26]

    Blind backdoors in deep learning models,

    E. Bagdasaryan and V . Shmatikov, “Blind backdoors in deep learning models,” inUSENIX Security, 2021, pp. 1505–1521

  27. [27]

    Hardly perceptible trojan attack against neural networks with bit flips,

    J. Bai, K. Gao, D. Gong, S.-T. Xia, Z. Li, and W. Liu, “Hardly perceptible trojan attack against neural networks with bit flips,” in ECCV. Springer, 2022, pp. 104–121

  28. [28]

    Badencoder: Backdoor attacks to pre- trained encoders in self-supervised learning,

    J. Jia, Y . Liu, and N. Z. Gong, “Badencoder: Backdoor attacks to pre- trained encoders in self-supervised learning,” inIEEE S&P, 2022, pp. 2043–2059

  29. [29]

    Distribution preserving backdoor attack in self-supervised learning,

    G. Tao, Z. Wang, S. Feng, G. Shen, S. Ma, and X. Zhang, “Distribution preserving backdoor attack in self-supervised learning,” inIEEE S&P, 2024, pp. 2029–2047

  30. [30]

    SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

    F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “Squeezenet: Alexnet-level accuracy with 50x fewer parameters and<0.5 mb model size,”arXiv preprint arXiv:1602.07360, 2016

  31. [31]

    Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

    X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted backdoor attacks on deep learning systems using data poisoning,”arXiv preprint arXiv:1712.05526, 2017

  32. [32]

    Wanet-imperceptible warping-based backdoor attack,

    T. A. Nguyen and A. T. Tran, “Wanet-imperceptible warping-based backdoor attack,” inICLR, 2020

  33. [33]

    Lira: Learnable, imperceptible and robust backdoor attacks,

    K. Doan, Y . Lao, W. Zhao, and P. Li, “Lira: Learnable, imperceptible and robust backdoor attacks,” inICCV, 2021, pp. 11 966–11 976

  34. [34]

    Trojaning attack on neural networks,

    Y . Liu, S. Ma, Y . Aafer, W.-C. Lee, J. Zhai, W. Wang, and X. Zhang, “Trojaning attack on neural networks,” inNDSS, 2018

  35. [35]

    Rethinking the backdoor attacks’ triggers: A frequency perspective,

    Y . Zeng, W. Park, Z. M. Mao, and R. Jia, “Rethinking the backdoor attacks’ triggers: A frequency perspective,” inICCV, 2021, pp. 16 473– 16 481

  36. [36]

    A data-free backdoor injection approach in neural networks,

    P. Lv, C. Yue, R. Liang, Y . Yang, S. Zhang, H. Ma, and K. Chen, “A data-free backdoor injection approach in neural networks,” inUSENIX Security, 2023, pp. 2671–2688

  37. [37]

    A spatiotemporal backdoor attack against behavior-oriented decision makers in metaverse: From perspective of autonomous driving,

    Y . Yu, J. Liu, H. Guo, B. Mao, and N. Kato, “A spatiotemporal backdoor attack against behavior-oriented decision makers in metaverse: From perspective of autonomous driving,”IEEE JSAC, vol. 42, no. 4, pp. 948–962, 2024

  38. [38]

    Live trojan attacks on deep neural networks,

    R. Costales, C. Mao, R. Norwitz, B. Kim, and J. Yang, “Live trojan attacks on deep neural networks,” inCVPR, 2020, pp. 796–797

  39. [39]

    Bit-flip attack: Crushing neural network with progressive bit search,

    A. S. Rakin, Z. He, and D. Fan, “Bit-flip attack: Crushing neural network with progressive bit search,” inICCV, 2019, pp. 1211–1220

  40. [40]

    A new backdoor attack in cnns by training set corruption without label poisoning,

    M. Barni, K. Kallas, and B. Tondi, “A new backdoor attack in cnns by training set corruption without label poisoning,” inIEEE ICIP, 2019

  41. [41]

    Label-Consistent Backdoor Attacks, December 2019

    A. Turner, D. Tsipras, and A. Madry, “Label-consistent backdoor at- tacks,”arXiv preprint arXiv:1912.02771, 2019

  42. [42]

    Narcissus: A practical clean-label backdoor attack with limited information,

    Y . Zeng, M. Pan, H. A. Just, L. Lyu, M. Qiu, and R. Jia, “Narcissus: A practical clean-label backdoor attack with limited information,” inCCS, 2023, pp. 771–785

  43. [43]

    Backdoor defense via decoupling the training process,

    K. Huang, Y . Li, B. Wu, Z. Qin, and K. Ren, “Backdoor defense via decoupling the training process,”arXiv preprint arXiv:2202.03423, 2022

  44. [44]

    Strip: A defence against trojan attacks on deep neural networks,

    Y . Gao, C. Xu, D. Wang, S. Chen, D. C. Ranasinghe, and S. Nepal, “Strip: A defence against trojan attacks on deep neural networks,” in ACSAC, 2019, pp. 113–125

  45. [45]

    Februus: Input purification defense against trojan attacks on deep neural network systems,

    B. G. Doan, E. Abbasnejad, and D. C. Ranasinghe, “Februus: Input purification defense against trojan attacks on deep neural network systems,” inACSAC, 2020, pp. 897–912

  46. [46]

    Scale-up: An efficient black-box input-level backdoor detection via analyzing scaled prediction consistency,

    J. Guo, Y . Li, X. Chen, H. Guo, L. Sun, and C. Liu, “Scale-up: An efficient black-box input-level backdoor detection via analyzing scaled prediction consistency,” inICLR, 2023

  47. [47]

    Detecting backdoors during the inference stage based on corruption robustness consistency,

    X. Liu, M. Li, H. Wang, S. Hu, D. Ye, H. Jin, L. Wu, and C. Xiao, “Detecting backdoors during the inference stage based on corruption robustness consistency,” inCVPR, 2023, pp. 16 363–16 372

  48. [48]

    Detection of backdoors in trained classifiers without access to the training set,

    Z. Xiang, D. J. Miller, and G. Kesidis, “Detection of backdoors in trained classifiers without access to the training set,”IEEE TNNLS, vol. 33, no. 3, pp. 1177–1191, 2020

  49. [49]

    Debackdoor: A deductive framework for detecting backdoor attacks on deep models with limited data,

    D. Popovic, A. Sadeghi, T. Yu, S. Chawla, and I. Khalil, “Debackdoor: A deductive framework for detecting backdoor attacks on deep models with limited data,” inUSENIX Security, 2025

  50. [50]

    Abs: Scanning neural networks for back-doors by artificial brain stimulation,

    Y . Liu, W.-C. Lee, G. Tao, S. Ma, Y . Aafer, and X. Zhang, “Abs: Scanning neural networks for back-doors by artificial brain stimulation,” inACM CCS, 2019, pp. 1265–1282

  51. [51]

    Deepinspect: A black-box trojan detection and mitigation framework for deep neural networks

    H. Chen, C. Fu, J. Zhao, and F. Koushanfar, “Deepinspect: A black-box trojan detection and mitigation framework for deep neural networks.” in IJCAI, vol. 2, no. 5, 2019, p. 8

  52. [52]

    Detecting ai trojans using meta neural analysis,

    X. Xu, Q. Wang, H. Li, N. Borisov, C. A. Gunter, and B. Li, “Detecting ai trojans using meta neural analysis,” inIEEE S&P, 2021, pp. 103–120

  53. [53]

    Demon in the variant: Statistical analysis of dnns for robust backdoor contamination detection,

    D. Tang, X. Wang, H. Tang, and K. Zhang, “Demon in the variant: Statistical analysis of dnns for robust backdoor contamination detection,” inUSENIX Security, 2021, pp. 1541–1558

  54. [54]

    Randomized channel shuffling: minimal-overhead backdoor attack detection without clean datasets,

    R. Cai, Z. Zhang, T. Chen, X. Chen, and Z. Wang, “Randomized channel shuffling: minimal-overhead backdoor attack detection without clean datasets,” inNeurIPS, 2022, pp. 33 876–33 889

  55. [55]

    Universal litmus patterns: Revealing backdoor attacks in cnns,

    S. Kolouri, A. Saha, H. Pirsiavash, and H. Hoffmann, “Universal litmus patterns: Revealing backdoor attacks in cnns,” inCVPR, 2020, pp. 301– 310

  56. [56]

    Data-free backdoor removal based on channel lipschitzness,

    R. Zheng, R. Tang, J. Li, and L. Liu, “Data-free backdoor removal based on channel lipschitzness,” inECCV, 2022, pp. 175–191

  57. [57]

    Exploring the orthogonality and linearity of backdoor attacks,

    K. Zhang, S. Cheng, G. Shen, G. Tao, S. An, A. Makur, S. Ma, and X. Zhang, “Exploring the orthogonality and linearity of backdoor attacks,” inIEEE S&P, 2024, pp. 2105–2123

  58. [58]

    Robust backdoor detection for deep learning via topological evolution dynamics,

    X. Mo, Y . Zhang, L. Y . Zhang, W. Luo, N. Sun, S. Hu, S. Gao, and Y . Xiang, “Robust backdoor detection for deep learning via topological evolution dynamics,” inIEEE S&P. IEEE, 2024, pp. 2048–2066

  59. [59]

    Backdoorbench: A comprehensive benchmark of backdoor learning,

    B. Wu, H. Chen, M. Zhang, Z. Zhu, S. Wei, D. Yuan, and C. Shen, “Backdoorbench: A comprehensive benchmark of backdoor learning,” NeurIPS, vol. 35, pp. 10 546–10 559, 2022

  60. [60]

    A simple framework for contrastive learning of visual representations,

    T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” inICML, 2020, pp. 1597–1607

  61. [61]

    Reading digits in natural images with unsupervised feature learning,

    Y . Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, A. Y . Nget al., “Reading digits in natural images with unsupervised feature learning,” inNIPS workshop on deep learning and unsupervised feature learning, vol. 2011, no. 5, 2011, p. 7

  62. [62]

    Odscan: Backdoor scanning for object detection models,

    S. Cheng, G. Shen, G. Tao, K. Zhang, Z. Zhang, S. An, X. Xu, Y . Li, S. Ma, and X. Zhang, “Odscan: Backdoor scanning for object detection models,” inIEEE S&P, 2024, pp. 1703–1721

  63. [63]

    A temporal-pattern backdoor attack to deep reinforcement learning,

    Y . Yu, J. Liu, S. Li, K. Huang, and X. Feng, “A temporal-pattern backdoor attack to deep reinforcement learning,” inIEEE GLOBECOM, 2022, pp. 2710–2715

  64. [64]

    Marnet: Backdoor attacks against cooperative multi-agent reinforcement learning,

    Y . Chen, Z. Zheng, and X. Gong, “Marnet: Backdoor attacks against cooperative multi-agent reinforcement learning,”IEEE TDSC, vol. 20, no. 5, pp. 4188–4198, 2022