Fast and Lightweight Backdoor Detection via Head Random Probing

Chunwei Tian; Daoqiang Zhang; Jiajia Liu; Jing Fang; Qi Zhu; Xueyu Yin; Yinbo Yu

arXiv: 2605.18908 · v1 · pith:IB3HZG7Bnew · submitted 2026-05-17 · 💻 cs.CR · cs.AI · cs.LG

Yinbo Yu, Xueyu Yin, Jing Fang, Chunwei Tian, Qi Zhu, Jiajia Liu, Daoqiang Zhang

Pith reviewed 2026-05-20 12:35 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.LG

keywords backdoor detection · data-free detection · prediction head probing · neural network auditing · DNN security · post-training detection · random latent probes

The pith

Backdoored neural networks concentrate responses on the target class when random latent probes are sent directly into the prediction head.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents HTell, a detector that identifies backdoors in already-trained deep neural networks without any clean data, surrogate data, gradients, or trigger reconstruction. It creates architecture-aware random latent probes and routes them straight into the prediction head, then measures whether responses concentrate abnormally on one class. Backdoored models reliably display this concentration while clean models do not. The method reports 99.03 percent true-positive rate and 2.11 percent false-positive rate on a benchmark of more than 6,000 backdoored models and over 700 clean ones spanning four datasets, 14 architectures, and 21 attack types. It runs in 12.69 milliseconds per model, more than 30,000 times faster than representative gradient-based detectors.
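The latency figures above can be turned into a rough sense of scale. The sketch below is back-of-the-envelope arithmetic only: the model counts are the rounded lower bounds quoted in the summary ("more than 6,000" and "over 700"), the speedup is the quoted lower bound, and the variable names are illustrative.

```python
# Rough arithmetic from the reported figures; counts are rounded lower bounds.
LATENCY_MS = 12.69          # reported HTell latency per model
SPEEDUP = 30_000            # reported lower bound on the speedup factor
N_MODELS = 6_000 + 700      # "more than 6,000" backdoored + "over 700" clean

# Time for HTell to scan the whole benchmark, in seconds.
htell_total_s = N_MODELS * LATENCY_MS / 1000

# Implied per-model cost of a detector that is 30,000x slower, in seconds.
baseline_per_model_s = LATENCY_MS * SPEEDUP / 1000

# Implied time for that baseline to scan the same benchmark, in days.
baseline_total_days = baseline_per_model_s * N_MODELS / 86_400

print(f"HTell, full benchmark:    {htell_total_s:.1f} s")
print(f"Baseline, per model:      {baseline_per_model_s:.1f} s")
print(f"Baseline, full benchmark: {baseline_total_days:.1f} days")
```

Under these assumptions the whole benchmark takes on the order of a minute and a half with HTell, while a detector 30,000 times slower would need several minutes per model and roughly a month for the full set.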

Core claim

HTell generates architecture-aware random latent probes, feeds them directly into the model head, and detects backdoors by analyzing class-wise response statistics; backdoored models exhibit abnormal response concentration on the target class under these probes.
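To make the probing idea concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It uses a toy linear head, plain Gaussian probes, and the fraction of probes won by each class as the response statistic. The "suspect" head is simulated by inflating one class's bias, which is only a stand-in for a concentrated response and not a model of any real attack. The names `make_linear_head` and `class_response_fractions` are hypothetical.

```python
import random
from collections import Counter

def make_linear_head(num_classes, dim, rng, target=None, shift=0.0):
    """Toy linear head; `shift` inflates one class's logit (illustrative only)."""
    weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_classes)]
    bias = [0.0] * num_classes
    if target is not None:
        bias[target] += shift

    def head(z):
        return [sum(w * x for w, x in zip(row, z)) + b
                for row, b in zip(weights, bias)]
    return head

def class_response_fractions(head, num_classes, dim, n_probes, rng):
    """Feed Gaussian latent probes to the head; return the share of argmax wins per class."""
    wins = Counter()
    for _ in range(n_probes):
        z = [rng.gauss(0, 1) for _ in range(dim)]
        logits = head(z)
        wins[max(range(num_classes), key=logits.__getitem__)] += 1
    return [wins[c] / n_probes for c in range(num_classes)]

rng = random.Random(0)
clean = make_linear_head(10, 32, rng)
suspect = make_linear_head(10, 32, rng, target=3, shift=40.0)

clean_frac = class_response_fractions(clean, 10, 32, 1000, rng)
suspect_frac = class_response_fractions(suspect, 10, 32, 1000, rng)
print("clean max fraction:  ", max(clean_frac))
print("suspect max fraction:", max(suspect_frac),
      "on class", suspect_frac.index(max(suspect_frac)))
```

The point of the sketch is the shape of the procedure: no input data, no gradients, only forward passes through the head followed by a per-class count. The paper's probes are described as architecture-aware and its statistic may differ from the simple maximum fraction used here.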

What carries the argument

Head random probing: random latent inputs fed only to the prediction head followed by class-wise response concentration analysis.

Load-bearing premise

Backdoored models exhibit abnormal response concentration on the target class under random latent probes to the prediction head.

What would settle it

A backdoored model that produces evenly distributed class responses instead of target-class concentration when the prediction head receives random latent probes would invalidate the detection rule.
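One way to quantify "evenly distributed" is the normalized entropy of the class-response histogram. This is an illustrative choice, not a statistic taken from the paper: a value near 1 means probes spread evenly across classes, and a value well below 1 means they pile onto a few classes. The function name and the two example histograms are hypothetical.

```python
import math

def normalized_entropy(fractions):
    """Shannon entropy of a class histogram, scaled to [0, 1] by log(num_classes)."""
    h = -sum(p * math.log(p) for p in fractions if p > 0)
    return h / math.log(len(fractions))

even = [0.1] * 10                     # perfectly even responses over 10 classes
concentrated = [0.91] + [0.01] * 9    # 91% of probes land on one class

print(normalized_entropy(even))          # close to 1.0
print(normalized_entropy(concentrated))  # well below 1.0
```

A backdoored model whose probe responses score near 1 on a measure like this would be the counterexample described above.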

Figures

Figures reproduced from arXiv: 2605.18908 by Chunwei Tian, Daoqiang Zhang, Jiajia Liu, Jing Fang, Qi Zhu, Xueyu Yin, Yinbo Yu.

**Figure 2.** t-SNE visualization of clean and poisoned latent features extracted by backdoored models and random backdoor probes. All backdoored …

**Figure 3.** BadNets with different patch triggers. In patch-based attacks, besides white patches, the authors also introduce 8 new patches with various patching locations, colors, and textures.

**Figure 4.** Detection accuracies under different σ of backdoor probes. The x-axis values are multiples of $|S_{\text{noise}}|_{\max}$. From the accompanying ablation and sensitivity analysis: HTell employs either uniform or Gaussian probes according to the coarse latent activation range; to validate this design, the authors compare different probe dist…

**Figure 5.** Applying HTell to object detection and sequential decision …

Original abstract

Deep neural networks (DNNs) remain critically vulnerable to backdoor attacks. Existing post-training detectors often require clean or surrogate data, gradients, or iterative trigger reconstruction, leading to high computational costs and limited robustness under practical model-auditing scenarios. In this paper, we propose HTell, a fast and lightweight data-free backdoor detector based on head random probing. Instead of reconstructing diverse trigger patterns, HTell inspects their unified manifestation in the prediction head: backdoored models tend to exhibit abnormal response concentration on the target class under random latent probes. HTell generates architecture-aware random latent probes, feeds them directly into the model head, and detects backdoors by analyzing class-wise response statistics, without accessing real or surrogate data, model gradients, or parameter optimization. We evaluate HTell on a large-scale benchmark containing more than 6,000 backdoored models and over 700 clean models, covering 4 datasets, 14 architectures, and 21 types of backdoor attacks. HTell achieves 99.03% true positive rate and 2.11% false positive rate with only 12.69 ms/model detection latency, reducing the time cost by over 30,000$\times$ compared with representative gradient-based detectors. These results demonstrate that head random probing provides an accurate, robust, and efficient solution for large-scale data-free backdoor model auditing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

HTell is a fast, data-free backdoor detector that random-probes the prediction head for target-class concentration. It shows strong scale and speed on the authors' benchmark, but the core assumption still needs checking attack by attack.

HTell detects backdoors by sending architecture-aware random vectors into the prediction head and flagging abnormal concentration on the target class in the responses. That single idea lets it skip data, gradients, and trigger search entirely, which is the main practical advance here. The paper shows this works at 99% true positive and 2% false positive on more than 6000 backdoored models plus 700 clean ones, across 4 datasets, 14 architectures, and 21 attack types, all at roughly 13 ms per model. That is a real operational win over slower gradient-based detectors.

The evaluation size and the reported speedup stand out as solid evidence that the method scales for auditing tasks. The approach itself is new in its focus on head-level statistics from random probes rather than reconstruction or surrogate data. The results look consistent enough on the reported benchmark to support the claim for the tested cases.

The soft spot is the assumption that every backdoor attack produces this detectable head-level bias even for non-trigger random inputs. Their numbers suggest it holds across the 21 attacks they tried, but without per-attack breakdowns or failure-case analysis it is hard to tell how universal the signal really is or whether the probe distribution and thresholds were tuned to the evaluation set. Minor transparency items like exact probe generation details would also help reproducibility.

This paper is for security engineers and auditors who need a quick first filter on large model collections. A reader who cares about deployable detection tools will find usable numbers and a straightforward implementation path. It deserves peer review because the empirical scale is substantial and the method is simple enough to test or extend directly.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes HTell, a data-free backdoor detector that generates architecture-aware random latent probes, feeds them directly into the prediction head, and detects backdoors via class-wise response concentration statistics on the target label. It reports 99.03% true positive rate and 2.11% false positive rate on a benchmark of more than 6,000 backdoored models and over 700 clean models spanning 4 datasets, 14 architectures, and 21 attack types, with 12.69 ms/model latency (over 30,000× faster than gradient-based detectors).
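For readers less familiar with the two headline metrics, the following sketch shows how a true-positive rate and a false-positive rate are computed from per-model detector decisions. The labels and flags are a hypothetical toy audit, not data from the paper, and `rates` is an illustrative helper name.

```python
def rates(labels, flags):
    """labels: True if the model is actually backdoored; flags: True if the detector flagged it."""
    tp = sum(1 for y, f in zip(labels, flags) if y and f)
    fn = sum(1 for y, f in zip(labels, flags) if y and not f)
    fp = sum(1 for y, f in zip(labels, flags) if not y and f)
    tn = sum(1 for y, f in zip(labels, flags) if not y and not f)
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical toy audit: 4 backdoored models followed by 4 clean models.
labels = [True, True, True, True, False, False, False, False]
flags = [True, True, True, False, True, False, False, False]

tpr, fpr = rates(labels, flags)
print(tpr, fpr)
```

TPR is measured over backdoored models only and FPR over clean models only, so the imbalance between the two pools in the benchmark does not by itself distort either number.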

Significance. If the central empirical observation holds—that backdoored models reliably exhibit detectable response concentration on the target class under random head probes across the evaluated attacks and architectures—the approach would represent a substantial practical advance for scalable, data-free model auditing in security-critical settings. The scale of the benchmark and the extreme efficiency are clear strengths that could enable large-scale deployment where existing methods are prohibitive.

major comments (3)

§3 (Head Random Probing): The claim that all 21 attack types produce a unified, detectable head-level bias (abnormal concentration on the target class for random non-trigger probes) is load-bearing for the general applicability, yet the manuscript provides no per-attack analysis or mechanistic explanation of why attacks primarily modifying earlier layers must induce this specific head statistic; without it, the 99.03% aggregate TPR may not generalize beyond the benchmark.
§4.2 (Evaluation): The reported TPR/FPR figures are aggregates only; absent a breakdown table by attack type or architecture showing uniform separation, it remains possible that a subset of the 21 attacks evades the concentration signal, undermining the cross-attack robustness asserted in the abstract.
§3.2 (Probe Generation and Threshold): The concentration metric and decision threshold are presented as fixed, but no sensitivity study to probe distribution parameters or threshold choice is reported; this leaves open whether the separation is an intrinsic property or partly an artifact of benchmark-specific tuning.
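The concern about aggregate-only reporting can be illustrated with a small sketch. It groups detector decisions on backdoored models by attack type and compares the per-group true-positive rate with the pooled one. The attack names and decisions are placeholders invented for illustration and do not reflect any result in the paper.

```python
from collections import defaultdict

def tpr_by_group(records):
    """records: iterable of (group_name, flagged) pairs for backdoored models only."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, flagged in records:
        totals[group] += 1
        hits[group] += int(flagged)
    return {g: hits[g] / totals[g] for g in totals}

# Hypothetical records; attack names are placeholders, not results from the paper.
records = [
    ("attack_A", True), ("attack_A", True), ("attack_A", True), ("attack_A", True),
    ("attack_B", True), ("attack_B", False), ("attack_B", False), ("attack_B", True),
]

per_attack = tpr_by_group(records)
overall = sum(flagged for _, flagged in records) / len(records)
print(per_attack, overall)
```

In this toy case the pooled rate sits between a perfect group and a much weaker one, which is exactly the situation a per-attack table would expose and an aggregate would hide.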

minor comments (2)

Figure 3 (response distribution plots): axis labels and legend entries for the clean vs. backdoored histograms could be enlarged for readability.
The manuscript cites prior detectors but could add a short related-work paragraph explicitly contrasting HTell with other recent data-free or head-only methods.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback and for highlighting the strengths of our large-scale benchmark and the practical efficiency of HTell. We address each major comment below, proposing targeted revisions to improve clarity and robustness where appropriate.

Referee: §3 (Head Random Probing): The claim that all 21 attack types produce a unified, detectable head-level bias (abnormal concentration on the target class for random non-trigger probes) is load-bearing for the general applicability, yet the manuscript provides no per-attack analysis or mechanistic explanation of why attacks primarily modifying earlier layers must induce this specific head statistic; without it, the 99.03% aggregate TPR may not generalize beyond the benchmark.

Authors: We agree that a per-attack breakdown would strengthen the presentation of cross-attack robustness. The manuscript's core contribution is the empirical demonstration that backdoored models exhibit this head-level concentration bias across the 21 evaluated attack types, supported by the aggregate results on over 6,000 models. A comprehensive mechanistic account of how every attack variant (including those primarily affecting earlier layers) propagates to produce this specific head statistic lies beyond the empirical scope of the current work. In the revision we will add a table reporting TPR/FPR per attack type to confirm consistency of the signal.

revision: partial

Referee: §4.2 (Evaluation): The reported TPR/FPR figures are aggregates only; absent a breakdown table by attack type or architecture showing uniform separation, it remains possible that a subset of the 21 attacks evades the concentration signal, undermining the cross-attack robustness asserted in the abstract.

Authors: We accept this observation. While the aggregate metrics reflect strong overall performance, disaggregated results will better address potential concerns about non-uniform behavior. We will include a breakdown table by attack type and architecture in the revised manuscript.

revision: yes

Referee: §3.2 (Probe Generation and Threshold): The concentration metric and decision threshold are presented as fixed, but no sensitivity study to probe distribution parameters or threshold choice is reported; this leaves open whether the separation is an intrinsic property or partly an artifact of benchmark-specific tuning.

Authors: The concentration statistic is computed directly from class-wise response distributions under architecture-aware random probes, and the threshold is calibrated on clean-model statistics to control FPR. We will add a sensitivity study in the revision examining variations in probe distribution parameters and threshold values to demonstrate that the separation is robust rather than benchmark-specific.

revision: yes
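The rebuttal mentions calibrating the threshold on clean-model statistics to control FPR. One plausible reading, offered here as an assumption rather than the authors' procedure, is to pick an empirical quantile of the clean models' concentration scores and then sweep the target FPR to see how detection changes. All scores below are hypothetical, and `calibrate_threshold` and `flag_rate` are illustrative names.

```python
def calibrate_threshold(clean_scores, target_fpr):
    """Pick a threshold so that at most `target_fpr` of clean scores lie strictly above it."""
    ordered = sorted(clean_scores)
    # Number of clean models allowed above the threshold (kept below the list length).
    allowed = min(int(target_fpr * len(ordered)), len(ordered) - 1)
    return ordered[len(ordered) - allowed - 1]

def flag_rate(scores, threshold):
    """Fraction of scores strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)

# Hypothetical concentration scores (for example, max class fraction per model).
clean_scores = [0.12, 0.15, 0.11, 0.18, 0.14, 0.13, 0.16, 0.22, 0.17, 0.19]
backdoored_scores = [0.95, 0.88, 0.99, 0.21, 0.91]

for target_fpr in (0.0, 0.1, 0.2):
    t = calibrate_threshold(clean_scores, target_fpr)
    print(target_fpr, t, flag_rate(clean_scores, t), flag_rate(backdoored_scores, t))
```

A sweep of this kind is the simplest form of the threshold sensitivity study the referee asks for: if the backdoored flag rate barely moves as the target FPR changes, the separation is wide; if it swings sharply, the reported operating point may be fragile.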

standing simulated objections not resolved

Mechanistic explanation of why attacks that primarily modify earlier layers reliably induce the specific head-level response concentration on the target class

Circularity Check

0 steps flagged

No significant circularity; detection rests on observable empirical property.

The paper's core claim is that backdoored models exhibit abnormal response concentration on the target class when random latent probes are fed to the prediction head. This property is presented as a unified manifestation observed across attacks, not derived by fitting parameters to the target detection result or by self-referential definition. HTell simply measures class-wise statistics on architecture-aware random probes without data, gradients, or optimization. The large-scale evaluation (6000+ backdoored models, 700+ clean models across 21 attacks and 14 architectures) serves as independent validation rather than a closed loop. No self-citation chains, uniqueness theorems, or ansatz smuggling appear in the derivation; the method is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the empirical premise that backdoor triggers produce a detectable concentration pattern at the head; no explicit free parameters, new physical entities, or additional axioms beyond standard neural-network assumptions are stated in the abstract.

axioms (1)

Domain assumption: Backdoored models exhibit abnormal response concentration on the target class under random latent probes.
This premise is invoked to justify data-free detection without trigger reconstruction.


[1] [1]

Backdoor learning: A survey,

Y . Li, Y . Jiang, Z. Li, and S.-T. Xia, “Backdoor learning: A survey,” IEEE TNNLS, vol. 35, no. 1, pp. 5–22, 2022

work page 2022

[2] [2]

Neural cleanse: Identifying and mitigating backdoor attacks in neural networks,

B. Wang, Y . Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, and B. Y . Zhao, “Neural cleanse: Identifying and mitigating backdoor attacks in neural networks,” inIEEE S&P, 2019, pp. 707–723

work page 2019

[3] H. Wang, Z. Xiang, D. J. Miller, and G. Kesidis, “Mm-bd: Post-training detection of backdoor attacks with arbitrary backdoor pattern types using a maximum margin statistic,” in IEEE S&P, 2024, pp. 1994–2012.

[4] Z. Wang, K. Mei, H. Ding, J. Zhai, and S. Ma, “Rethinking the reverse-engineering of trojan triggers,” NeurIPS, vol. 35, pp. 9738–9753, 2022.

[5] C. Fu, X. Zhang, S. Ji, T. Wang, P. Lin, Y. Feng, and J. Yin, “Freeeagle: Detecting complex neural trojans in data-free cases,” in USENIX Security, 2023, pp. 6399–6416.

[6] H. Zhang, Y. Bai, Y. Chen, Z. Ma, and W. Xu, “Barbie: Robust backdoor detection based on latent separability,” in NDSS, 2025.

[7] B. Chen, W. Carvalho, N. Baracaldo, H. Ludwig, B. Edwards, T. Lee, I. Molloy, and B. Srivastava, “Detecting backdoor attacks on deep neural networks by activation clustering,” in AAAI Workshop, 2019.

[8] B. Tran, J. Li, and A. Madry, “Spectral signatures in backdoor attacks,” NeurIPS, vol. 31, 2018.

[9] J.-L. Yin, W. Wang, W. Lin, X. Liu et al., “Adversarial-inspired backdoor defense via bridging backdoor and adversarial attacks,” in AAAI, vol. 39, no. 9, 2025, pp. 9508–9516.

[10] Z. Ma, Y. Yang, Y. Liu, T. Yang, X. Liu, T. Li, and Z. Qin, “Need for speed: Taming backdoor attacks with speed and precision,” in IEEE S&P, 2024, pp. 1217–1235.

[11] H. Zhang, Y. Wang, S. Yan, C. Zhu, Z. Zhou, L. Hou, S. Hu, M. Li, Y. Zhang, and L. Y. Zhang, “Test-time backdoor detection for object detection models,” in CVPR, 2025, pp. 24 377–24 386.

[12] G. Fields, M. Samragh, M. Javaheripi, F. Koushanfar, and T. Javidi, “Trojan signatures in dnn weights,” in ICCV, 2021, pp. 12–20.

[13] Q. Zhou, W. Luo, Z. Ye, and Y. Tang, “Data-free backdoor model inspection: Masking and reverse engineering loops for feature counting,” in IJCNN. IEEE, 2024, pp. 1–9.

[14] B. Cao, J. Jia, C. Hu, W. Guo, Z. Xiang, J. Chen, B. Li, and D. Song, “Data free backdoor attacks,” NeurIPS, vol. 37, pp. 23 881–23 911, 2024.

[15] R. Wang, G. Zhang, S. Liu, P.-Y. Chen, J. Xiong, and M. Wang, “Practical detection of trojan neural networks: Data-limited and data-free cases,” in ECCV, 2020, pp. 222–238.

[16] A. S. Rakin, Z. He, and D. Fan, “Tbt: Targeted neural network attack with bit trojan,” in CVPR, 2020, pp. 13 198–13 207.

[17] Y. Su, J. Zhang, T. Xu, T. Zhang, W. Zhang, and N. Yu, “Model x-ray: Detecting backdoored models via decision boundary,” in ACM MM, 2024, pp. 10 296–10 305.

[18] H. Karimi, T. Derr, and J. Tang, “Characterizing the decision boundary of deep neural networks,” arXiv preprint arXiv:1912.11460, 2019.

[19] G. Somepalli, L. Fowl, A. Bansal, P. Yeh-Chiang, Y. Dar, R. Baraniuk, M. Goldblum, and T. Goldstein, “Can neural nets learn the same model twice? investigating reproducibility and double descent from the decision boundary perspective,” in CVPR, 2022, pp. 13 699–13 708.

[20] T. Gu, B. Dolan-Gavitt, and S. Garg, “Badnets: Identifying vulnerabilities in the machine learning model supply chain,” arXiv preprint arXiv:1708.06733, 2017.

[21] X. Qi, T. Xie, Y. Li, S. Mahloujifar, and P. Mittal, “Revisiting the assumption of latent separability for backdoor defenses,” in ICLR, 2023.

[22] S. Cheng, G. Tao, Y. Liu, G. Shen, S. An, S. Feng, X. Xu, K. Zhang, S. Ma, and X. Zhang, “Lotus: Evasive and resilient backdoor attacks through sub-partitioning,” in CVPR, 2024, pp. 24 798–24 809.

[23] T. A. Nguyen and A. Tran, “Input-aware dynamic backdoor attack,” NeurIPS, vol. 33, pp. 3454–3464, 2020.

[24] Y. Li, Y. Li, B. Wu, L. Li, R. He, and S. Lyu, “Invisible backdoor attack with sample-specific triggers,” in ICCV, 2021, pp. 16 463–16 472.

[25] Z. Wang, J. Zhai, and S. Ma, “Bppattack: Stealthy and efficient trojan attacks against deep neural networks via image quantization and contrastive adversarial learning,” in CVPR, 2022, pp. 15 074–15 084.

[26] E. Bagdasaryan and V. Shmatikov, “Blind backdoors in deep learning models,” in USENIX Security, 2021, pp. 1505–1521.

[27] J. Bai, K. Gao, D. Gong, S.-T. Xia, Z. Li, and W. Liu, “Hardly perceptible trojan attack against neural networks with bit flips,” in ECCV. Springer, 2022, pp. 104–121.

[28] J. Jia, Y. Liu, and N. Z. Gong, “Badencoder: Backdoor attacks to pre-trained encoders in self-supervised learning,” in IEEE S&P, 2022, pp. 2043–2059.

[29] G. Tao, Z. Wang, S. Feng, G. Shen, S. Ma, and X. Zhang, “Distribution preserving backdoor attack in self-supervised learning,” in IEEE S&P, 2024, pp. 2029–2047.

[30] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5 MB model size,” arXiv preprint arXiv:1602.07360, 2016.

[31] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted backdoor attacks on deep learning systems using data poisoning,” arXiv preprint arXiv:1712.05526, 2017.

[32] T. A. Nguyen and A. T. Tran, “Wanet-imperceptible warping-based backdoor attack,” in ICLR, 2020.

[33] K. Doan, Y. Lao, W. Zhao, and P. Li, “Lira: Learnable, imperceptible and robust backdoor attacks,” in ICCV, 2021, pp. 11 966–11 976.

[34] Y. Liu, S. Ma, Y. Aafer, W.-C. Lee, J. Zhai, W. Wang, and X. Zhang, “Trojaning attack on neural networks,” in NDSS, 2018.

[35] Y. Zeng, W. Park, Z. M. Mao, and R. Jia, “Rethinking the backdoor attacks’ triggers: A frequency perspective,” in ICCV, 2021, pp. 16 473–16 481.

[36] P. Lv, C. Yue, R. Liang, Y. Yang, S. Zhang, H. Ma, and K. Chen, “A data-free backdoor injection approach in neural networks,” in USENIX Security, 2023, pp. 2671–2688.

[37] Y. Yu, J. Liu, H. Guo, B. Mao, and N. Kato, “A spatiotemporal backdoor attack against behavior-oriented decision makers in metaverse: From perspective of autonomous driving,” IEEE JSAC, vol. 42, no. 4, pp. 948–962, 2024.

[38] R. Costales, C. Mao, R. Norwitz, B. Kim, and J. Yang, “Live trojan attacks on deep neural networks,” in CVPR, 2020, pp. 796–797.

[39] A. S. Rakin, Z. He, and D. Fan, “Bit-flip attack: Crushing neural network with progressive bit search,” in ICCV, 2019, pp. 1211–1220.

[40] M. Barni, K. Kallas, and B. Tondi, “A new backdoor attack in cnns by training set corruption without label poisoning,” in IEEE ICIP, 2019.

[41] A. Turner, D. Tsipras, and A. Madry, “Label-consistent backdoor attacks,” arXiv preprint arXiv:1912.02771, 2019.

[42] Y. Zeng, M. Pan, H. A. Just, L. Lyu, M. Qiu, and R. Jia, “Narcissus: A practical clean-label backdoor attack with limited information,” in CCS, 2023, pp. 771–785.

[43] K. Huang, Y. Li, B. Wu, Z. Qin, and K. Ren, “Backdoor defense via decoupling the training process,” arXiv preprint arXiv:2202.03423, 2022.

[44] Y. Gao, C. Xu, D. Wang, S. Chen, D. C. Ranasinghe, and S. Nepal, “Strip: A defence against trojan attacks on deep neural networks,” in ACSAC, 2019, pp. 113–125.

[45] B. G. Doan, E. Abbasnejad, and D. C. Ranasinghe, “Februus: Input purification defense against trojan attacks on deep neural network systems,” in ACSAC, 2020, pp. 897–912.

[46] J. Guo, Y. Li, X. Chen, H. Guo, L. Sun, and C. Liu, “Scale-up: An efficient black-box input-level backdoor detection via analyzing scaled prediction consistency,” in ICLR, 2023.

[47] X. Liu, M. Li, H. Wang, S. Hu, D. Ye, H. Jin, L. Wu, and C. Xiao, “Detecting backdoors during the inference stage based on corruption robustness consistency,” in CVPR, 2023, pp. 16 363–16 372.

[48] Z. Xiang, D. J. Miller, and G. Kesidis, “Detection of backdoors in trained classifiers without access to the training set,” IEEE TNNLS, vol. 33, no. 3, pp. 1177–1191, 2020.

[49] D. Popovic, A. Sadeghi, T. Yu, S. Chawla, and I. Khalil, “Debackdoor: A deductive framework for detecting backdoor attacks on deep models with limited data,” in USENIX Security, 2025.

[50] Y. Liu, W.-C. Lee, G. Tao, S. Ma, Y. Aafer, and X. Zhang, “Abs: Scanning neural networks for back-doors by artificial brain stimulation,” in ACM CCS, 2019, pp. 1265–1282.

[51] H. Chen, C. Fu, J. Zhao, and F. Koushanfar, “Deepinspect: A black-box trojan detection and mitigation framework for deep neural networks,” in IJCAI, vol. 2, no. 5, 2019, p. 8.

[52] X. Xu, Q. Wang, H. Li, N. Borisov, C. A. Gunter, and B. Li, “Detecting ai trojans using meta neural analysis,” in IEEE S&P, 2021, pp. 103–120.

[53] D. Tang, X. Wang, H. Tang, and K. Zhang, “Demon in the variant: Statistical analysis of dnns for robust backdoor contamination detection,” in USENIX Security, 2021, pp. 1541–1558.

[54] R. Cai, Z. Zhang, T. Chen, X. Chen, and Z. Wang, “Randomized channel shuffling: minimal-overhead backdoor attack detection without clean datasets,” in NeurIPS, 2022, pp. 33 876–33 889.

[55] S. Kolouri, A. Saha, H. Pirsiavash, and H. Hoffmann, “Universal litmus patterns: Revealing backdoor attacks in cnns,” in CVPR, 2020, pp. 301–310.

[56] R. Zheng, R. Tang, J. Li, and L. Liu, “Data-free backdoor removal based on channel lipschitzness,” in ECCV, 2022, pp. 175–191.

[57] K. Zhang, S. Cheng, G. Shen, G. Tao, S. An, A. Makur, S. Ma, and X. Zhang, “Exploring the orthogonality and linearity of backdoor attacks,” in IEEE S&P, 2024, pp. 2105–2123.

[58] X. Mo, Y. Zhang, L. Y. Zhang, W. Luo, N. Sun, S. Hu, S. Gao, and Y. Xiang, “Robust backdoor detection for deep learning via topological evolution dynamics,” in IEEE S&P. IEEE, 2024, pp. 2048–2066.
