
arxiv: 2605.18907 · v1 · pith:RACEEXFF · submitted 2026-05-17 · 💻 cs.CR · cs.AI

Lightweight and Fast Backdoor Model Detection

Pith reviewed 2026-05-20 12:40 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords backdoor detection · neural network security · parameter anomaly · trojan clue · model inspection · deep learning defense · static analysis

The pith

DFBScanner detects backdoors by scoring anomalous parameter updates in the final classification layer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that backdoor attacks leave a consistent signature of anomalous updates in the parameters of a network's final classification layer. Instead of hunting for particular triggers or requiring clean reference data, DFBScanner builds several statistical indicators of these anomalies and merges them into a single Trojan clue; maximum anomaly scoring on this clue then flags backdoored models. A reader would care because the method takes about 1 ms per model on average while reporting a 97.17% true-positive rate and a 0.95% false-positive rate on a benchmark of more than 5,000 models spanning 12 architectures, 4 datasets, and many attack variants.

Core claim

Backdoor-induced feature perturbations produce distinctive and anomalous parameter updates in the final classification layer. By constructing multiple anomaly indicators from these parameters and combining them strategically into a Trojan clue, DFBScanner detects backdoors through maximum anomaly scoring. This yields an attack-agnostic detector that works without clean samples or trigger knowledge.

What carries the argument

The Trojan clue, a composite of multiple anomaly indicators computed on the final-layer parameters, which produces a maximum anomaly score used for backdoor classification.
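To make the idea concrete, here is a minimal Python sketch of a final-layer anomaly score. It is not DFBScanner's method: the four per-class indicators (row L2 norm, row mean, row maximum, and bias), the median/MAD standardization, and the equal-weight averaging are assumptions chosen for illustration, and the names `robust_z` and `trojan_clue_score` are hypothetical. The final layer is passed as a K × d weight matrix (one row per class) plus a length-K bias vector.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def robust_z(values: Sequence[float]) -> List[float]:
    """Median/MAD standardization, so a single outlier class cannot hide itself."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad if mad > 0 else 1e-12  # guard against a zero MAD
    return [(v - med) / scale for v in values]


def trojan_clue_score(
    weights: Sequence[Sequence[float]], bias: Sequence[float]
) -> Tuple[float, int]:
    """Return (max anomaly score, most anomalous class) for a K x d final layer.

    Hypothetical indicators; DFBScanner's own indicators are not reproduced here.
    """
    if len(weights) != len(bias):
        raise ValueError("need one weight row and one bias term per class")
    indicators = [
        [math.sqrt(sum(w * w for w in row)) for row in weights],  # per-class L2 norm
        [statistics.fmean(row) for row in weights],  # per-class mean weight
        [max(row) for row in weights],  # per-class largest weight
        list(bias),  # per-class bias
    ]
    z_scores = [robust_z(ind) for ind in indicators]
    # Equal-weight combination of absolute deviations: one "clue" value per class.
    clue = [statistics.fmean(abs(z) for z in per_class) for per_class in zip(*z_scores)]
    suspect = max(range(len(clue)), key=clue.__getitem__)
    return clue[suspect], suspect


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    W = [[rng.gauss(0.0, 0.1) for _ in range(64)] for _ in range(10)]
    b = [rng.gauss(0.0, 0.01) for _ in range(10)]
    print("clean:", trojan_clue_score(W, b))
    # Synthetic tampering (not a real attack): push class 4's row and bias upward.
    W[4] = [w + 0.3 for w in W[4]]
    b[4] += 0.2
    print("tampered:", trojan_clue_score(W, b))
```

Only the final layer is read, so the cost of a score like this scales with K × d rather than with the size of the backbone.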

If this is right

  • Average detection time drops to about one millisecond per model, enabling real-time scanning of large model repositories.
  • The same indicators work across twenty trigger types, three injection methods, and both all-to-one and all-to-all attack strategies.
  • No clean reference samples or prior trigger knowledge are required for operation.
  • Performance holds over twelve network architectures and four datasets in a benchmark of more than five thousand models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Model marketplaces could run this check automatically on every uploaded checkpoint before distribution.
  • If final-layer anomalies prove reliable, similar lightweight inspections might extend to detecting other forms of model tampering such as weight poisoning or adversarial fine-tuning.
  • The observation suggests that many attacks converge on the same last-layer vulnerability, which could guide future defense design toward protecting or monitoring that layer specifically.

Load-bearing premise

Backdoor attacks of different types and injection methods all produce parameter anomalies in the final layer that remain separable from the parameter distributions of clean models.

What would settle it

A backdoor attack that succeeds while leaving the final-layer parameters statistically indistinguishable from those of clean models, or a set of clean models that consistently receive high anomaly scores under the proposed indicators.

Figures

Figures reproduced from arXiv: 2605.18907 by Chunwei Tian, Daoqiang Zhang, Jiajia Liu, Jing Fang, Qi Zhu, Xuewen Zhang, Yinbo Yu.

Figure 1. t-SNE visualization of backdoor (red dots) and clean (blue dots) latent features, and violin plots of final-layer weights for different classes (the poison class and the other, clean classes) under different attacks. The violin plots show the probability density of the weight distribution via kernel density estimation. All models are trained on CIFAR-10, and the poison label is 4. R18=Re…

Figure 2. Bias values of the final layer in clean and backdoor models.

Figure 3. Different trigger patches.

… and WaNet. In total, we construct 382 clean models and 4,320 all-to-one backdoor models as the full benchmark. In addition, we train 120 all-to-all backdoor models on CIFAR-10 and GTSRB using 6 architectures and 10 attacks. For each attack, we follow a loop permutation to generate target and source class pairs, i.e., each class k ∈ K is a poison target class with a source clas…

Figure 4. Backdoor detection accuracy of parameter anomaly indicators using maximum anomaly indices.

Figure 5. Backdoor detection F1-score curve with different numbers

Figure 6. Cosine similarity of the anomaly score between benign and backdoor models using all indicators and selected indicators. CNN6-
Original abstract

Deep neural networks (DNNs), despite their remarkable performance, are highly vulnerable to backdoor attacks. Existing defenses mainly rely on activation anomaly analysis or trigger reverse engineering and often require clean samples or prior knowledge of trigger patterns, resulting in limited efficacy, practicability, and generalizability. More critically, while advanced attacks can implement backdoor implantation in milliseconds, current detection approaches typically demand minutes or even hours. To this end, we propose DFBScanner, a lightweight static parameter inspection framework for fast backdoor scanning. DFBScanner leverages our key observation that backdoor-induced feature perturbations can lead to distinctive and anomalous parameter updates in the final classification layer. Hence, we shift our detection focus from recognizing diverse and attack-specific trigger patterns targeted by prior work, to identifying the unified backdoor manifestation within the final layer, thereby enabling efficient and attack-agnostic detection. Specifically, by constructing and strategically combining multiple anomaly indicators of the final-layer parameters into a Trojan clue, DFBScanner detects backdoors through maximum anomaly scoring. DFBScanner is evaluated on a large-scale backdoor benchmark, including over 5,000 backdoor models trained on 4 datasets, 12 network architectures, 20 types of backdoor triggers, 2 attack strategies (all-to-one and all-to-all), and 3 backdoor injection methods (data poisoning, training pipeline manipulation, and bit-flips). Numerical results show that DFBScanner achieves a 97.17% true-positive rate, 0.95% false-positive rate, and an average detection time of only 1 ms per model, significantly outperforming prior methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes DFBScanner, a lightweight static parameter inspection framework for detecting backdoored DNN models. It is based on the observation that backdoor-induced feature perturbations lead to distinctive anomalous parameter updates in the final classification layer. The method constructs and combines multiple anomaly indicators into a 'Trojan clue' and detects backdoors via maximum anomaly scoring. Evaluation is performed on a large-scale benchmark of over 5,000 models across 4 datasets, 12 network architectures, 20 trigger types, 2 attack strategies, and 3 injection methods (data poisoning, training pipeline manipulation, and bit-flips), reporting 97.17% true-positive rate, 0.95% false-positive rate, and 1 ms average detection time per model.

Significance. If the central observation and detection performance hold under scrutiny, this would represent a meaningful contribution to backdoor defense by enabling fast, attack-agnostic, and sample-free detection that scales to large model repositories, addressing the speed and generality limitations of activation-analysis or trigger-reversal approaches.

major comments (3)
  1. [§3.1] The manuscript refers to 'constructing and strategically combining multiple anomaly indicators' of the final-layer parameters but provides no explicit mathematical definitions, normalization steps, or formulas for these indicators (e.g., no equations for statistical measures or weight perturbation quantification). This detail is load-bearing for reproducing the claimed 0.95% FPR and verifying that the indicators do not overlap with clean-model variation across the 12 architectures.
  2. [§4.2, Table 3] While the evaluation includes bit-flip attacks and reports high TPR, there is no per-injection-method breakdown or ablation showing that anomalous updates are concentrated in the final classification layer rather than distributed or sparse across earlier layers. Bit-flip attacks can target arbitrary weights, so the 'unified backdoor manifestation within the final layer' claim requires explicit evidence that the indicators capture these cases without significant clean-model overlap.
  3. [§3.3] The threshold selection and weighting for the maximum anomaly scoring are not described (e.g., whether thresholds are fixed, cross-validated, or architecture-specific). Without this, it is difficult to assess the robustness of the reported metrics on the diverse benchmark of 5,000+ models.
minor comments (2)
  1. [§3.2] The notation for the 'Trojan clue' combination step could be formalized with a short equation or pseudocode for clarity.
  2. Figure 2 (or equivalent) showing example parameter distributions would benefit from explicit comparison between clean and backdoored models for each injection method.
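Minor comment 1 asks for an equation; one generic form such a combination step could take is written out below as an illustration only. Everything in it is an assumption rather than the manuscript's definition: I_m(k) is the m-th of M hypothetical indicators computed from the final-layer weight row w_k and bias b_k of class k ∈ {1, …, K}, the standardization is median/MAD, the M indicators are averaged with equal weight, and τ is a fixed threshold.

```latex
\begin{aligned}
z_m(k) &= \frac{I_m(k) - \operatorname{med}_{j} I_m(j)}
              {1.4826 \cdot \operatorname{med}_{j} \bigl| I_m(j) - \operatorname{med}_{l} I_m(l) \bigr|},
  \qquad k = 1, \dots, K,\ m = 1, \dots, M, \\
C(k) &= \frac{1}{M} \sum_{m=1}^{M} \bigl| z_m(k) \bigr|, \\
S &= \max_{k} C(k), \qquad k^{\ast} = \arg\max_{k} C(k), \qquad
\hat{y} = \mathbb{1}\bigl[ S > \tau \bigr].
\end{aligned}
```

Here k* is the class flagged as the suspected poison target. Since only w_k and b_k enter the computation, no clean input samples or trigger knowledge are needed to evaluate S; clean models would be needed only to choose τ.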

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. These have highlighted important areas where additional clarity and evidence will strengthen the manuscript. We address each major comment below and will revise the paper accordingly.

  1. Referee: [§3.1] The manuscript refers to 'constructing and strategically combining multiple anomaly indicators' of the final-layer parameters but provides no explicit mathematical definitions, normalization steps, or formulas for these indicators (e.g., no equations for statistical measures or weight perturbation quantification). This detail is load-bearing for reproducing the claimed 0.95% FPR and verifying that the indicators do not overlap with clean-model variation across the 12 architectures.

    Authors: We agree that explicit mathematical definitions are required for reproducibility. In the revised manuscript we will expand §3.1 with precise formulas for each anomaly indicator, including the statistical measures (mean, standard deviation, and deviation from clean-model baselines), the normalization procedure applied to each, and the exact combination rule used to form the Trojan clue score. These additions will allow direct verification that the indicators exhibit limited overlap with clean-model parameter variation across the 12 architectures. revision: yes

  2. Referee: [§4.2, Table 3] While the evaluation includes bit-flip attacks and reports high TPR, there is no per-injection-method breakdown or ablation showing that anomalous updates are concentrated in the final classification layer rather than distributed or sparse across earlier layers. Bit-flip attacks can target arbitrary weights, so the 'unified backdoor manifestation within the final layer' claim requires explicit evidence that the indicators capture these cases without significant clean-model overlap.

    Authors: We acknowledge that a per-injection-method breakdown and layer-specific ablation would provide stronger support for the final-layer claim, especially for bit-flip attacks. In the revision we will add a new table in §4.2 reporting TPR/FPR separately for data poisoning, training-pipeline manipulation, and bit-flips. We will also include an ablation that quantifies the concentration of anomalous updates in the final layer versus earlier layers for the bit-flip subset, together with direct comparison against clean-model score distributions to confirm limited overlap. revision: yes

  3. Referee: [§3.3] The threshold selection and weighting for the maximum anomaly scoring are not described (e.g., whether thresholds are fixed, cross-validated, or architecture-specific). Without this, it is difficult to assess the robustness of the reported metrics on the diverse benchmark of 5,000+ models.

    Authors: We thank the referee for noting this omission. The thresholds are fixed values derived from the 99th-percentile anomaly scores of clean models and are applied uniformly across all architectures; the indicators receive equal weight in the maximum anomaly score. In the revised §3.3 we will explicitly state this selection procedure, report the exact percentile used, and add a short sensitivity analysis demonstrating that the reported metrics remain stable across the 5,000-model benchmark. revision: yes
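The layer-wise ablation promised in response 2 could take roughly the following form, assuming the benchmark builder keeps the pre-attack checkpoint for each bit-flip model. The helper names `flip_bit_float32` and `layer_change_profile` and the layer names `conv1` and `fc` are hypothetical. For each layer, the sketch reports its share of the total absolute weight change and the fraction of weights that changed. The demo flips the most significant exponent bit (bit 30 of an IEEE-754 binary32 value) of one weight in an early layer, which illustrates why referee comment 2 asks for this evidence.

```python
import struct
from typing import Dict, Sequence


def flip_bit_float32(x: float, bit: int) -> float:
    """Flip one bit of x's IEEE-754 float32 encoding (bit 31 = sign, 30..23 = exponent)."""
    (u,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", u ^ (1 << bit)))
    return y


def layer_change_profile(
    before: Dict[str, Sequence[float]],
    after: Dict[str, Sequence[float]],
    atol: float = 0.0,
) -> Dict[str, Dict[str, float]]:
    """Per layer: share of total |delta w| and fraction of weights changed by more than atol."""
    if before.keys() != after.keys():
        raise ValueError("checkpoints must contain the same layers")
    total_change: Dict[str, float] = {}
    frac_modified: Dict[str, float] = {}
    for name, old in before.items():
        new = after[name]
        if len(old) != len(new):
            raise ValueError(f"size mismatch in layer {name!r}")
        diffs = [abs(a - b) for a, b in zip(old, new)]
        total_change[name] = sum(diffs)
        frac_modified[name] = sum(d > atol for d in diffs) / max(len(diffs), 1)
    grand_total = sum(total_change.values()) or 1.0  # avoid 0/0 when nothing changed
    return {
        name: {
            "change_share": total_change[name] / grand_total,
            "frac_modified": frac_modified[name],
        }
        for name in before
    }


if __name__ == "__main__":
    # Hypothetical two-layer model: an early "conv1" layer and a final "fc" layer.
    clean = {"conv1": [0.05] * 100, "fc": [0.05] * 20}
    attacked = {name: list(ws) for name, ws in clean.items()}
    attacked["conv1"][7] = flip_bit_float32(attacked["conv1"][7], 30)  # top exponent bit
    for layer, stats in layer_change_profile(clean, attacked).items():
        print(layer, stats)
```

In this toy case, essentially all of the change lands in `conv1` and none in `fc`. A final-layer detector would need some indirect signal to catch such a model, which is what the proposed ablation would have to quantify on real bit-flip attacks.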

Circularity Check

0 steps flagged

No significant circularity; detection method is empirically validated on independent benchmark

full rationale

The paper defines DFBScanner by constructing anomaly indicators on final-layer parameters and combining them into a Trojan clue for maximum anomaly scoring. This is presented as a shift to unified backdoor manifestation, then evaluated directly on a large-scale benchmark of over 5,000 models across multiple datasets, architectures, triggers, and injection methods. No equations, self-citations, or steps are shown that reduce the anomaly score or detection output to a fitted parameter or input defined by the method itself. The performance metrics (TPR, FPR, speed) are reported as empirical results rather than tautological consequences of the construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests primarily on the domain assumption that backdoors produce detectable final-layer anomalies and on any scoring thresholds used to combine indicators.

free parameters (1)
  • anomaly scoring thresholds or weights
    Parameters likely needed to combine multiple indicators into the Trojan clue and set the maximum anomaly decision boundary, though not detailed in the abstract.
axioms (1)
  • domain assumption Backdoor-induced feature perturbations lead to distinctive and anomalous parameter updates in the final classification layer.
    This is the central observation stated in the abstract upon which the entire detection approach is built.
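The threshold rule described in the simulated rebuttal (a fixed cut at the 99th percentile of clean-model anomaly scores, applied uniformly across architectures) can be sketched as follows. The function names are hypothetical, the scores would come from whatever indicator combination the paper actually uses, and the demo numbers are toy values. Calibrating on held-out clean models matters here. If FPR were measured on the same clean models used to set the threshold, it would come out near 1% by construction.

```python
import statistics
from typing import Sequence, Tuple


def calibrate_threshold(clean_scores: Sequence[float], pct: int = 99) -> float:
    """Fixed threshold at the pct-th percentile of clean-model anomaly scores."""
    if not 1 <= pct <= 99:
        raise ValueError("pct must be between 1 and 99")
    # n=100 yields the 1st..99th percentile cut points (linear interpolation).
    cuts = statistics.quantiles(clean_scores, n=100, method="inclusive")
    return cuts[pct - 1]


def detection_rates(
    clean_scores: Sequence[float], backdoor_scores: Sequence[float], tau: float
) -> Tuple[float, float]:
    """Return (TPR, FPR) when a model is flagged as backdoored iff its score > tau."""
    tpr = sum(s > tau for s in backdoor_scores) / len(backdoor_scores)
    fpr = sum(s > tau for s in clean_scores) / len(clean_scores)
    return tpr, fpr


if __name__ == "__main__":
    calibration_clean = [float(i) for i in range(1, 101)]  # toy clean-model scores
    tau = calibrate_threshold(calibration_clean)
    held_out_clean = [50.0, 99.5, 20.0]  # toy held-out clean models
    backdoored = [150.0, 98.0, 300.0, 400.0]  # toy backdoored models
    print(tau, detection_rates(held_out_clean, backdoored, tau))
```

The percentile is the only free parameter in this rule, which is why a sensitivity sweep over it, as offered in response 3, is the natural robustness check.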

pith-pipeline@v0.9.0 · 5832 in / 1443 out tokens · 53069 ms · 2026-05-20T12:40:29.137701+00:00 · methodology


Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 4 internal anchors

  1. J. Zhang, J. Chi, Z. Li, K. Cai, Y. Zhang, and Y. Tian, “Badmerging: Backdoor attacks against model merging,” in ACM CCS, 2024, pp. 4450–4464

  2. T. Gu, B. Dolan-Gavitt, and S. Garg, “Badnets: Identifying vulnerabilities in the machine learning model supply chain,” arXiv preprint arXiv:1708.06733, 2017

  3. B. Tran, J. Li, and A. Madry, “Spectral signatures in backdoor attacks,” NeurIPS, vol. 31, 2018

  4. B. Chen, W. Carvalho, N. Baracaldo, H. Ludwig, B. Edwards, T. Lee, I. Molloy, and B. Srivastava, “Detecting backdoor attacks on deep neural networks by activation clustering,” in AAAI Workshop, 2019

  5. A. Chan and Y.-S. Ong, “Poison as a cure: Detecting & neutralizing variable-sized backdoor attacks in deep neural networks,” arXiv preprint arXiv:1911.08040, 2019

  6. X. Xu, Q. Wang, H. Li, N. Borisov, C. A. Gunter, and B. Li, “Detecting ai trojans using meta neural analysis,” in IEEE S&P, 2021, pp. 103–120

  7. H. Chen, C. Fu, J. Zhao, and F. Koushanfar, “Deepinspect: A black-box trojan detection and mitigation framework for deep neural networks,” in IJCAI, vol. 2, no. 5, 2019, p. 8

  8. B. Wang, Y. Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, and B. Y. Zhao, “Neural cleanse: Identifying and mitigating backdoor attacks in neural networks,” in IEEE S&P, 2019, pp. 707–723

  9. X. Qiao, Y. Yang, and H. Li, “Defending neural backdoors via generative distribution modeling,” in NeurIPS, vol. 32, 2019

  10. Z. Wang, K. Mei, H. Ding, J. Zhai, and S. Ma, “Rethinking the reverse-engineering of trojan triggers,” vol. 35, pp. 9738–9753, 2022

  11. Z. Ma, Y. Yang, Y. Liu, T. Yang, X. Liu, T. Li, and Z. Qin, “Need for speed: Taming backdoor attacks with speed and precision,” in IEEE S&P, 2024, pp. 1217–1235

  12. R. Wang, G. Zhang, S. Liu, P.-Y. Chen, J. Xiong, and M. Wang, “Practical detection of trojan neural networks: Data-limited and data-free cases,” in ECCV, 2020, pp. 222–238

  13. C. Fu, X. Zhang, S. Ji, T. Wang, P. Lin, Y. Feng, and J. Yin, “Freeeagle: Detecting complex neural trojans in data-free cases,” in USENIX Security, 2023, pp. 6399–6416

  14. Q. Zhou, W. Luo, Z. Ye, and Y. Tang, “Data-free backdoor model inspection: Masking and reverse engineering loops for feature counting,” in IJCNN. IEEE, 2024, pp. 1–9

  15. H. Wang, Z. Xiang, D. J. Miller, and G. Kesidis, “Mm-bd: Post-training detection of backdoor attacks with arbitrary backdoor pattern types using a maximum margin statistic,” in IEEE S&P, 2024, pp. 1994–2012

  16. H. Zhang, Y. Bai, Y. Chen, Z. Ma, and W. Xu, “Barbie: Robust backdoor detection based on latent separability,” in NDSS, 2025

  17. Z. Sun, T. Cong, Y. Liu, C. Lin, X. He, R. Chen, X. Han, and X. Huang, “Peftguard: detecting backdoor attacks against parameter-efficient fine-tuning,” in IEEE S&P, 2025, pp. 1713–1731

  18. B. Cao, J. Jia, C. Hu, W. Guo, Z. Xiang, J. Chen, B. Li, and D. Song, “Data free backdoor attacks,” NeurIPS, vol. 37, pp. 23881–23911, 2024

  19. A. S. Rakin, Z. He, and D. Fan, “Tbt: Targeted neural network attack with bit trojan,” in CVPR, 2020, pp. 13198–13207

  20. R. Costales, C. Mao, R. Norwitz, B. Kim, and J. Yang, “Live trojan attacks on deep neural networks,” in CVPR, 2020, pp. 796–797

  21. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020

  22. E. Bagdasaryan and V. Shmatikov, “Blind backdoors in deep learning models,” in USENIX Security, 2021, pp. 1505–1521

  23. J. Bai, K. Gao, D. Gong, S.-T. Xia, Z. Li, and W. Liu, “Hardly perceptible trojan attack against neural networks with bit flips,” in ECCV. Springer, 2022, pp. 104–121

  24. Y. Liu, S. Ma, Y. Aafer, W.-C. Lee, J. Zhai, W. Wang, and X. Zhang, “Trojaning attack on neural networks,” in NDSS, 2018

  25. X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted backdoor attacks on deep learning systems using data poisoning,” arXiv preprint arXiv:1712.05526, 2017

  26. K. Doan, Y. Lao, W. Zhao, and P. Li, “Lira: Learnable, imperceptible and robust backdoor attacks,” in ICCV, 2021, pp. 11966–11976

  27. Y. Li, Y. Li, B. Wu, L. Li, R. He, and S. Lyu, “Invisible backdoor attack with sample-specific triggers,” in ICCV, 2021, pp. 16463–16472

  28. Y. Zeng, W. Park, Z. M. Mao, and R. Jia, “Rethinking the backdoor attacks’ triggers: A frequency perspective,” in ICCV, 2021, pp. 16473–16481

  29. X. Qi, T. Xie, Y. Li, S. Mahloujifar, and P. Mittal, “Revisiting the assumption of latent separability for backdoor defenses,” in ICLR, 2023

  30. G. Tao, Z. Wang, S. Feng, G. Shen, S. Ma, and X. Zhang, “Distribution preserving backdoor attack in self-supervised learning,” in IEEE S&P, 2024, pp. 2029–2047

  31. T. A. Nguyen and A. T. Tran, “Wanet-imperceptible warping-based backdoor attack,” in ICLR, 2020

  32. X. Hu, Y. Zhao, L. Deng, L. Liang, P. Zuo, J. Ye, Y. Lin, and Y. Xie, “Practical attacks on deep neural networks by memory trojaning,” IEEE TCAD, vol. 40, no. 6, pp. 1230–1243, 2020

  33. A. S. Rakin, Z. He, and D. Fan, “Bit-flip attack: Crushing neural network with progressive bit search,” in ICCV, 2019, pp. 1211–1220

  34. T. A. Nguyen and A. Tran, “Input-aware dynamic backdoor attack,” NeurIPS, vol. 33, pp. 3454–3464, 2020

  35. Z. Wang, J. Zhai, and S. Ma, “Bppattack: Stealthy and efficient trojan attacks against deep neural networks via image quantization and contrastive adversarial learning,” in CVPR, 2022, pp. 15074–15084

  36. J. Zhang, D. Chen, Q. Huang, J. Liao, W. Zhang, H. Feng, G. Hua, and N. Yu, “Poison ink: Robust and invisible backdoor attack,” IEEE TIP, vol. 31, pp. 5691–5705, 2022

  37. K. Huang, Y. Li, B. Wu, Z. Qin, and K. Ren, “Backdoor defense via decoupling the training process,” arXiv preprint arXiv:2202.03423, 2022

  38. Y. Gao, C. Xu, D. Wang, S. Chen, D. C. Ranasinghe, and S. Nepal, “Strip: A defence against trojan attacks on deep neural networks,” in ACSAC, 2019, pp. 113–125

  39. B. G. Doan, E. Abbasnejad, and D. C. Ranasinghe, “Februus: Input purification defense against trojan attacks on deep neural network systems,” in ACSAC, 2020, pp. 897–912

  40. J. Guo, Y. Li, X. Chen, H. Guo, L. Sun, and C. Liu, “Scale-up: An efficient black-box input-level backdoor detection via analyzing scaled prediction consistency,” in ICLR, 2023

  41. Z. Xiang, D. J. Miller, and G. Kesidis, “Detection of backdoors in trained classifiers without access to the training set,” IEEE TNNLS, vol. 33, no. 3, pp. 1177–1191, 2020

  42. Y. Liu, W.-C. Lee, G. Tao, S. Ma, Y. Aafer, and X. Zhang, “Abs: Scanning neural networks for back-doors by artificial brain stimulation,” in ACM CCS, 2019, pp. 1265–1282

  43. D. Tang, X. Wang, H. Tang, and K. Zhang, “Demon in the variant: Statistical analysis of dnns for robust backdoor contamination detection,” in USENIX Security, 2021, pp. 1541–1558

  44. R. Cai, Z. Zhang, T. Chen, X. Chen, and Z. Wang, “Randomized channel shuffling: minimal-overhead backdoor attack detection without clean datasets,” in NeurIPS, 2022, pp. 33876–33889

  45. S. Kolouri, A. Saha, H. Pirsiavash, and H. Hoffmann, “Universal litmus patterns: Revealing backdoor attacks in cnns,” in CVPR, 2020, pp. 301–310

  46. G. Fields, M. Samragh, M. Javaheripi, F. Koushanfar, and T. Javidi, “Trojan signatures in dnn weights,” in ICCV, 2021, pp. 12–20

  47. F. Yao, A. S. Rakin, and D. Fan, “Deephammer: Depleting the intelligence of deep neural networks through targeted chain of bit flips,” in USENIX Security, 2020, pp. 1463–1480

  48. Hugging Face, “Hugging face – the ai community building the future,” https://huggingface.co

  49. H. Chen, C. Fu, J. Zhao, and F. Koushanfar, “Proflip: Targeted trojan attack with progressive bit flips,” in ICCV, 2021, pp. 7718–7727

  50. Y. Feng, B. Ma, D. Liu, Y. Zhang, W. Cai, and Y. Xia, “Contrastive neuron pruning for backdoor defense,” IEEE TIP, vol. 34, pp. 1234–1245, 2025

  51. M. Sensoy, L. Kaplan, and M. Kandemir, “Evidential deep learning to quantify classification uncertainty,” NeurIPS, vol. 31, 2018

  52. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998

  53. A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009

  54. S. Houben, J. Stallkamp, J. Salmen, M. Schlipsing, and C. Igel, “Detection of traffic signs in real-world images: The german traffic sign detection benchmark,” in IJCNN. IEEE, 2013, pp. 1–8

  55. Y. Le and X. Yang, “Tiny imagenet visual recognition challenge,” CS 231N, vol. 7, no. 7, p. 3, 2015

  56. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in CVPR, 2009, pp. 248–255

  57. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016, pp. 770–778

  58. C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in AAAI, vol. 31, no. 1, 2017

  59. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014

  60. M. Tan and Q. Le, “Efficientnet: Rethinking model scaling for convolutional neural networks,” in ICML. PMLR, 2019, pp. 6105–6114

  61. A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan et al., “Searching for mobilenetv3,” in ICCV, 2019, pp. 1314–1324

  62. B. Wu, H. Chen, M. Zhang, Z. Zhu, S. Wei, D. Yuan, and C. Shen, “Backdoorbench: A comprehensive benchmark of backdoor learning,” NeurIPS, vol. 35, pp. 10546–10559, 2022

  63. Y. Zhao, Z. Nasrullah, and Z. Li, “Pyod: A python toolbox for scalable outlier detection,” Journal of Machine Learning Research, vol. 20, no. 96, pp. 1–7, 2019

  64. G. Somepalli, L. Fowl, A. Bansal, P. Yeh-Chiang, Y. Dar, R. Baraniuk, M. Goldblum, and T. Goldstein, “Can neural nets learn the same model twice? investigating reproducibility and double descent from the decision boundary perspective,” in CVPR, 2022, pp. 13699–13708

  65. K. Zhang, S. Cheng, G. Shen, G. Tao, S. An, A. Makur, S. Ma, and X. Zhang, “Exploring the orthogonality and linearity of backdoor attacks,” in IEEE S&P, 2024, pp. 2105–2123

  66. H. Phan, J. Xiao, Y. Sui, T. Zhang, Z. Tang, C. Shi, Y. Wang, Y. Chen, and B. Yuan, “Clean & compact: Efficient data-free backdoor defense with model compactness,” in ECCV, 2024