DRL-CLBA: A Clean Label Backdoor Attack for Speech Classification via DDPG Reinforcement Learning

Fen Xiao; Weiping Wen; Wenhan Yao; Xiarun Chen; Yueming Huang

arxiv: 2607.01729 · v1 · pith:HVCZATWD · submitted 2026-07-02 · 💻 cs.AI · cs.SD


Yueming Huang, Wenhan Yao, Fen Xiao, Xiarun Chen, Weiping Wen

Pith reviewed 2026-07-03 14:11 UTC · model grok-4.3

classification 💻 cs.AI cs.SD

keywords: clean label backdoor attack · speech classification · reinforcement learning · DDPG · audio steganography · latent space optimization · backdoor defense · model poisoning


The pith

Reinforcement learning plants clean-label backdoors in speech classifiers that survive fine-tuning and pruning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a backdoor attack on speech classification that never changes the labels of poisoned training samples. It hides triggers inside source-class audio using steganography, then uses a DDPG reinforcement learning agent to shift target-class samples toward those trigger-bearing samples in the model's latent feature space. The resulting poisoned data teaches the model to respond to the triggers while every label stays correct, so standard label checks miss it. If the approach holds, speech systems can be compromised during training in ways that current defenses do not catch.

Core claim

DRL-CLBA embeds sample-specific triggers via deep audio steganography and uses DDPG to optimize target samples toward trigger-bearing anchor points in deep latent space, enabling label-migration-free poisoning that achieves high attack success rates across three datasets and four DNNs while resisting fine-tuning, pruning, and spectral signature defenses.

What carries the argument

DDPG reinforcement learning framework that optimizes target samples toward trigger-bearing anchor points in the model's deep latent space.
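DDPG pairs an actor and a critic with slowly moving "target" copies of both networks, which keeps the critic's bootstrapped estimates stable. The page gives no hyperparameters, so the following is only a minimal sketch of the generic DDPG soft (Polyak) target update, assuming parameters are flat lists of floats and a hypothetical mixing rate `tau`; it is not the authors' implementation.

```python
from typing import List


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Generic DDPG target update: theta' <- tau * theta + (1 - tau) * theta'.

    `target` and `online` are flat parameter vectors (an illustrative
    simplification; real networks hold many tensors). A small tau makes the
    target network trail the online network slowly.
    """
    if len(target) != len(online):
        raise ValueError("parameter vectors must have the same length")
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


if __name__ == "__main__":
    target = [0.0, 0.0]
    online = [1.0, -2.0]
    for _ in range(3):
        target = soft_update(target, online, tau=0.5)
    print(target)  # [0.875, -1.75]
```

Each call closes a fraction `tau` of the remaining gap, so the target parameters approach the online ones geometrically rather than jumping to them.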

If this is right

The attack reaches high success rates on multiple speech datasets and model architectures.
It bypasses defenses that rely on detecting mismatched labels in the training set.
The backdoor remains effective after fine-tuning, pruning, and spectral-signature filtering.
Speech-controlled systems become vulnerable to training-time poisoning that evades manual inspection.
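Spectral-signature filtering is one of the defenses the attack is reported to survive. As a rough illustration of what that defense computes, here is a minimal stdlib-only sketch of the scoring step described by Tran et al. (2018), assuming per-class latent representations are already extracted as lists of floats; the function names are hypothetical, and the top singular vector is found by plain power iteration instead of a library SVD.

```python
import math
import random
from typing import List, Sequence


def spectral_scores(reps: Sequence[Sequence[float]], iters: int = 200, seed: int = 0) -> List[float]:
    """Score each sample of one class by ((r_i - mean) . v)^2, where v is the
    top right singular vector of the centred representation matrix."""
    n, d = len(reps), len(reps[0])
    mean = [sum(r[j] for r in reps) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in reps]

    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        # One power-iteration step: w = C^T (C v)
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centred]
        w = [sum(proj[i] * centred[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all representations identical: nothing to separate
        v = [x / norm for x in w]

    return [sum(c[j] * v[j] for j in range(d)) ** 2 for c in centred]


def flag_suspicious(reps: Sequence[Sequence[float]], eps: float) -> List[int]:
    """Indices of the ceil(1.5 * eps * n) highest-scoring samples, where eps is
    an assumed upper bound on the poisoning rate."""
    scores = spectral_scores(reps)
    k = min(len(scores), math.ceil(1.5 * eps * len(scores)))
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

A clean-label attack that succeeds against this check must keep its poisoned target samples from standing out along the dominant direction of the class's feature covariance.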

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same latent-space alignment idea could be tested on image or text models by replacing the audio steganography step with an equivalent embedding method.
Defenses might need to inspect alignments between samples in feature space during training rather than relying only on labels or input statistics.
Applying the attack to real-world voice assistant training pipelines would reveal whether the reported resistance holds outside controlled datasets.

Load-bearing premise

The reinforcement learning agent can reliably move target audio samples to match hidden trigger features in latent space so the backdoor forms without any label changes.

What would settle it

Training a model on data poisoned by the described method and finding that inputs containing the steganographic trigger produce the target class at rates no higher than unpoisoned baseline models.
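The test above reduces to comparing two proportions: target-class hits on triggered inputs for a model trained on poisoned data versus a model trained on clean data. A minimal sketch, assuming a pooled one-sided two-proportion z-test (one reasonable choice, not one the page specifies); the function name is hypothetical.

```python
import math


def one_sided_two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int):
    """Pooled two-proportion z-test of H1: rate_a > rate_b.

    hits_a / n_a: triggered inputs classified as the target class by the
    model trained on poisoned data; hits_b / n_b: the same for a model
    trained on clean data. Returns (z, one-sided p-value).
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("degenerate counts: pooled rate is 0 or 1")
    z = (p_a - p_b) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z) under H0
    return z, p_value
```

With made-up counts such as `one_sided_two_proportion_z(850, 1000, 30, 1000)`, z is very large and the p-value is near zero, which would reject "no higher than baseline". Equal rates give z = 0 and p = 0.5, which is the outcome that would count against the paper's claim.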

Figures

Figures reproduced from arXiv: 2607.01729 by Fen Xiao, Weiping Wen, Wenhan Yao, Xiarun Chen, Yueming Huang.

**Figure 2.** Feature collisions in $H^{\mathrm{src}}_{\mathrm{trg}}$. Consequently, the DNN is compelled to learn the mapping $h(\xi(x^{s}_{i}, \delta_{trg})) \approx h(\hat{x}^{t}_{i})$. The model therefore treats source-class samples embedded with triggers as highly similar to the optimized target-class samples in the deep latent space. Based on this property, an attacker can exploit the optimized target-class sam…
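Read together with the reward described for Figure 3, the caption implies a feature-collision target for each poisoned sample. One way to write it as an optimization problem, assuming a squared Euclidean distance and a hypothetical trade-off weight $\lambda$ (the paper searches with DDPG rather than gradient descent, and its exact objective is not reproduced on this page):

```latex
\hat{x}^{t}_{i} \;=\; \arg\min_{x}\;
  \bigl\lVert h(x) - h\bigl(\xi(x^{s}_{i}, \delta_{trg})\bigr) \bigr\rVert_2^2
  \;+\; \lambda \,\bigl\lVert x - x^{t}_{i} \bigr\rVert_2^2
```

The first term pulls the target-class sample toward the trigger-bearing anchor in latent space; the second keeps it close to the original audio $x^{t}_{i}$, so its unchanged target label stays plausible to a human reviewer.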

**Figure 3.** The DRL-CLBA attack pipeline. By penalizing the distance to the anchor features and to the original target sample while rewarding correct target-class predictions $y_{tar}$, the reward function enables the agent to learn sophisticated attack strategies that are often difficult to optimize using traditional gradient descent. Action and Value Network. The attack strategy employs a Deep Deterministic Policy Gradien…
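The caption names three reward terms. A minimal sketch of how they might combine into one scalar, assuming Euclidean distances, measuring the second distance in input (waveform) space, and using hypothetical weights `alpha`, `beta`, and `bonus`; the paper's actual weighting and distance choices are not given here.

```python
import math
from typing import Sequence


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def poisoning_reward(
    feat_candidate: Sequence[float],   # latent features of the perturbed target sample
    feat_anchor: Sequence[float],      # latent features of the trigger-bearing anchor
    audio_candidate: Sequence[float],  # perturbed waveform
    audio_original: Sequence[float],   # original target-class waveform
    predicted_label: int,
    target_label: int,
    alpha: float = 1.0,  # hypothetical weight on the anchor distance
    beta: float = 1.0,   # hypothetical weight on the input-space distance
    bonus: float = 1.0,  # hypothetical reward for keeping the target prediction
) -> float:
    """Penalize distance to the anchor features and to the original sample;
    reward the sample for still being classified as the target class."""
    r = -alpha * l2(feat_candidate, feat_anchor)
    r -= beta * l2(audio_candidate, audio_original)
    if predicted_label == target_label:
        r += bonus
    return r
```

The bonus term keeps the perturbed sample consistent with its clean target label, which is the property that lets the poisoning avoid label changes.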

**Figure 4.** Impact of different poisoning rates on DRL-CLBA. …attack effectiveness and stealth while demonstrating strong cross-model generalization.
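The caption mentions stealth, but the metric is not named in the text reproduced here. Signal-to-noise ratio between clean and modified audio is a common choice in audio backdoor work, so the sketch below uses it as an assumption; `snr_db` is a hypothetical helper.

```python
import math
from typing import Sequence


def snr_db(clean: Sequence[float], modified: Sequence[float]) -> float:
    """Signal-to-noise ratio (dB) of `modified` relative to `clean`.

    Higher values mean the added perturbation is smaller relative to the
    original audio, i.e. harder to hear.
    """
    if len(clean) != len(modified):
        raise ValueError("signals must have the same length")
    signal = sum(x * x for x in clean)
    noise = sum((x - y) ** 2 for x, y in zip(clean, modified))
    if noise == 0.0:
        return math.inf  # identical signals: no perturbation at all
    return 10.0 * math.log10(signal / noise)
```

For example, scaling a signal by 1.1 adds a perturbation one tenth of its amplitude, which corresponds to roughly 20 dB.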

**Figure 5.** Impact of different target labels on DRL-CLBA. Impact of Target Labels. To assess generality, five target labels were randomly selected on SCD. ASR varies notably across targets, peaking at 89.6% for Left and dropping to 78.9% for No, reflecting differences in class feature separability and alignment with model decision boundaries. These results indicate that target label selection plays a critical role in…
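The per-label figures above (89.6% for Left, 78.9% for No) are attack success rates. A minimal sketch of the usual ASR definition, assuming triggered test inputs whose true label already equals the target are excluded (a common convention that the page does not confirm); `attack_success_rate` is a hypothetical helper.

```python
from typing import Iterable, Tuple


def attack_success_rate(records: Iterable[Tuple[int, int]], target_label: int) -> float:
    """Fraction of triggered, non-target-class inputs predicted as the target.

    records: (true_label, predicted_label) pairs for inputs carrying the trigger.
    """
    eligible = [(y, p) for y, p in records if y != target_label]
    if not eligible:
        raise ValueError("no non-target inputs to evaluate")
    return sum(p == target_label for _, p in eligible) / len(eligible)
```

Running it once per chosen target label gives one bar per label, as in the figure.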

**Figure 6.** Resistance of DRL-CLBA to fine-tuning defenses.

**Figure 7.** Resistance of DRL-CLBA to model pruning defenses.

**Figure 8.** Resistance of DRL-CLBA to STRIP defenses.
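Figure 8 evaluates STRIP (Gao et al., 2019), which blends a suspect input with random clean inputs and flags it if predictions stay confidently low-entropy. A minimal stdlib sketch of that scoring idea, assuming a black-box `predict` function that returns class probabilities and simple waveform averaging as the blending step (STRIP was proposed for images, so this audio blend is an assumption); `strip_score` and the toy model are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Probs = List[float]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def strip_score(
    x: Sequence[float],
    clean_pool: Sequence[Sequence[float]],
    predict: Callable[[Sequence[float]], Probs],
    n_perturb: int = 20,
    seed: int = 0,
) -> float:
    """Mean prediction entropy of `x` blended with random clean inputs.

    A trigger that dominates the prediction keeps entropy low under
    blending, so low scores are treated as suspicious.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_perturb):
        other = rng.choice(clean_pool)
        blended = [0.5 * a + 0.5 * b for a, b in zip(x, other)]
        total += entropy(predict(blended))
    return total / n_perturb


if __name__ == "__main__":
    # Hypothetical toy model: a large first value acts as the "trigger".
    def toy_predict(sig: Sequence[float]) -> Probs:
        return [0.99, 0.01] if sig[0] > 0.4 else [0.5, 0.5]

    pool = [[0.0, 0.5], [0.0, -0.5]]
    print(strip_score([1.0, 0.0], pool, toy_predict))  # low entropy: suspicious
    print(strip_score([0.0, 0.3], pool, toy_predict))  # about 0.693 (ln 2): benign-looking
```

A sample-specific, steganographic trigger that is weakened by blending would push triggered inputs toward the benign-looking end of this score, which is one plausible reason such attacks can evade STRIP.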

**Figure 9.** t-SNE visualization of the model attacked by DRL-CLBA.

Original abstract

Deep learning models for speech classification are vulnerable to backdoor attacks, where malicious triggers cause misclassification at inference time. While sample-specific attacks can bypass many defenses, they often rely on poisoned labels, which makes them detectable through manual inspection of the training data. In this paper, we propose DRL-CLBA, a novel clean label backdoor attack for speech classification that leverages Deep Deterministic Policy Gradient (DDPG) reinforcement learning. We also utilize deep audio steganography to embed sample-specific triggers into source audio, creating feature-space anchors. The proposed reinforcement learning framework effectively optimizes target samples toward trigger-bearing anchor points in the model's deep latent space, enabling label-migration-free poisoning of target samples. Experimental results across three datasets and four different DNNs demonstrate that DRL-CLBA achieves a high attack success rate, effectively bypassing some backdoor defenses. The attack demonstrates strong resistance against fine-tuning, pruning, and spectral signature defenses, exposing critical vulnerabilities in speech-controlled systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies DDPG reinforcement learning plus audio steganography to build a clean-label backdoor attack on speech classifiers, but the abstract supplies no numbers or ablations, so the actual performance remains unverified.


The main point is that the authors describe a clean-label backdoor attack on speech models that embeds sample-specific triggers through deep audio steganography and then uses DDPG to optimize the target samples, not the triggers themselves. The goal is to move target samples toward trigger-bearing points in latent space without flipping their labels, which should help the attack slip past label-based checks.

What is new here is the direct application of this RL-plus-steganography combination to the speech domain. Prior clean-label attacks exist in vision, and RL has been used for trigger optimization elsewhere, but the paper positions the latent-space anchor optimization as a way to make the poisoning work for audio without label migration.

The paper does a reasonable job framing the practical stakes for voice-controlled systems and lists resistance to fine-tuning, pruning, and spectral signature defenses as outcomes. Covering three datasets and four DNNs is a standard experimental scope for this kind of work.

The soft spot is the complete absence of any quantitative results, error bars, or ablation numbers in the abstract. Claims of high attack success and defense evasion are stated but not supported with data, so it is impossible to tell whether the RL component actually contributes beyond the steganography or whether the defense resistance holds under standard evaluation. That leaves the central assumption about effective latent-space optimization untested from the outside.

This is the sort of paper security researchers who track backdoor attacks in audio would want to read. A reader already familiar with the backdoor literature could extract the method sketch and decide whether to replicate. It deserves a serious referee because the topic matters and the approach is coherent on its own terms, even if the current write-up is thin on evidence. I would send it to review so the experiments can be checked directly.

Referee Report

2 major / 1 minor

Summary. The paper proposes DRL-CLBA, a clean-label backdoor attack on speech classification models. It uses Deep Deterministic Policy Gradient (DDPG) reinforcement learning to optimize target samples toward trigger-bearing anchor points in the model's latent space, combined with deep audio steganography to embed sample-specific triggers without requiring label changes. The central claim is that this achieves high attack success rates across three datasets and four DNNs while resisting fine-tuning, pruning, and spectral signature defenses.

Significance. If the results hold with proper validation, the work would highlight a novel RL-driven approach to clean-label attacks in the audio domain, potentially exposing vulnerabilities in speech-controlled systems and motivating new defenses. The integration of DDPG for latent-space optimization and steganography represents a methodological contribution worth exploring further.

major comments (2)

[Abstract] The central claim that 'Experimental results across three datasets and four different DNNs demonstrate that DRL-CLBA achieves a high attack success rate, effectively bypassing some backdoor defenses' is unsupported by any quantitative values, tables, figures, error bars, dataset details, or ablation studies in the manuscript.
[Abstract] The description of the DDPG framework 'effectively optimizes target samples toward trigger-bearing anchor points in the model's deep latent space, enabling label-migration-free poisoning' lacks any equations, algorithm details, loss functions, or state/action definitions, which is load-bearing for assessing the weakest assumption and overall soundness.

minor comments (1)

The manuscript would benefit from adding the full experimental section, including specific ASR percentages, defense evasion rates, hyperparameter settings for DDPG, and comparisons to prior backdoor attacks.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each point below and will revise the abstract accordingly to strengthen clarity while preserving the manuscript's existing content.


Referee: [Abstract] The central claim that 'Experimental results across three datasets and four different DNNs demonstrate that DRL-CLBA achieves a high attack success rate, effectively bypassing some backdoor defenses' is unsupported by any quantitative values, tables, figures, error bars, dataset details, or ablation studies in the manuscript.

Authors: The manuscript body (Section 4) contains the supporting experimental results, including tables reporting attack success rates, figures on defense resistance, dataset details in Section 3.1, and ablation studies. The abstract summarizes these findings at a high level. To directly address the comment, we will revise the abstract to include key quantitative highlights drawn from the existing results in the paper. Revision: yes.
Referee: [Abstract] The description of the DDPG framework 'effectively optimizes target samples toward trigger-bearing anchor points in the model's deep latent space, enabling label-migration-free poisoning' lacks any equations, algorithm details, loss functions, or state/action definitions, which is load-bearing for assessing the weakest assumption and overall soundness.

Authors: The abstract is a concise summary; the full DDPG framework details (state/action definitions, reward function, algorithm, and loss functions) are provided in Section 3.2 with equations and pseudocode. We will revise the abstract to briefly reference these components for improved accessibility without altering the technical content. Revision: yes.

Circularity Check

0 steps flagged

No circularity detectable; abstract-only text supplies no equations or derivations

full rationale

The supplied document consists solely of the abstract, which describes a DDPG-based clean-label attack at a conceptual level but contains no equations, optimization derivations, self-citations, fitted parameters presented as predictions, or any other load-bearing steps that could be inspected for circularity. Without access to methods, results, or explicit chains of reasoning, no reduction of outputs to inputs by construction can be exhibited. This is the default honest non-finding when the paper text provides no verifiable derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; all fields left empty due to insufficient information.

pith-pipeline@v0.9.1-grok · 5708 in / 1069 out tokens · 25621 ms · 2026-07-03T14:11:39.297696+00:00 · methodology


Reference graph

Works this paper leans on

55 extracted references

[1] A. Berg, M. O'Connor, M. T. Cruz, Keyword transformer: A self-attention model for keyword spotting, in: Interspeech 2021, ISCA, 2021, pp. 4249–4253.

[2] B. Desplanques, J. Thienpondt, K. Demuynck, Ecapa-tdnn: Emphasized channel attention, propagation and aggregation in tdnn based speaker verification, arXiv preprint arXiv:2005.07143 (2020).

[3] Y. Li, Y. Jiang, Z. Li, S.-T. Xia, Backdoor learning: A survey, IEEE Transactions on Neural Networks and Learning Systems 35 (1) (2022) 5–22.

[4] T. Gu, K. Liu, B. Dolan-Gavitt, S. Garg, Badnets: Evaluating backdooring attacks on deep neural networks, IEEE Access 7 (2019) 47230–47244.

[5] Y. Liu, X. Ma, J. Bailey, F. Lu, Reflection backdoor: A natural backdoor attack on deep neural networks, in: European Conference on Computer Vision, Springer, 2020, pp. 182–199.

[6] H. A. A. K. Hammoud, B. Ghanem, Check your other door! creating backdoor attacks in the frequency domain, in: 33rd British Machine Vision Conference Proceedings, BMVC 2022, 2022.

[7] X. Chen, A. Salem, D. Chen, M. Backes, S. Ma, Q. Shen, Z. Wu, Y. Zhang, Badnl: Backdoor attacks against nlp models with semantic-preserving improvements, in: Proceedings of the 37th Annual Computer Security Applications Conference, 2021, pp. 554–569.

[8] B. Wang, Y. Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, B. Y. Zhao, Neural cleanse: Identifying and mitigating backdoor attacks in neural networks, in: 2019 IEEE Symposium on Security and Privacy (SP), IEEE, 2019, pp. 707–723.

[9] Z. Shen, W. Hou, Y. Li, Cssba: A clean label sample-specific backdoor attack, in: 2023 IEEE International Conference on Image Processing (ICIP), IEEE, 2023, pp. 965–969.

[10] H. Cai, P. Zhang, Y. Xiao, S. Ji, M. Xiao, L. Cheng, Clean-label backdoor attack based on robust feature attenuation for speech recognition, Expert Systems with Applications (2025) 127546.

[11] 360 Digital Security Group, Security vulnerability report on large language models: A comprehensive study from the perspective of real-world vulnerabilities, Technical report, 360 Digital Security Group, Beijing (2024). URL https://pub1-bjyt.s3.360.cn/bcms/%E5%A4%A7%E6%A8%A1%E5%9E%8B%E5%AE%89%E5%85%A8%E6%BC%8F%E6%B4%9E%E6%8A%A5%E5%91%8A.pdf

[12] A. Souly, J. Rando, E. Chapman, X. Davies, B. Hasircioglu, E. Shereen, C. Mougan, V. Mavroudis, E. Jones, C. Hicks, et al., Poisoning attacks on llms require a near-constant number of poison samples, arXiv preprint arXiv:2510.07192 (2025).

[13] R. Wang, H. Chen, Z. Zhu, L. Liu, B. Wu, Versatile backdoor attack with visible, semantic, sample-specific and compatible triggers, IEEE Transactions on Pattern Analysis and Machine Intelligence (2025).

[14] X. Chen, C. Liu, B. Li, K. Lu, D. Song, Targeted backdoor attacks on deep learning systems using data poisoning, in: arXiv preprint arXiv:1712.05526, 2017, presented at the Workshop on Machine Learning and Computer Security.

[15] W. Jiang, H. Li, G. Xu, H. Ren, H. Yang, T. Zhang, S. Yu, Rethinking the design of backdoor triggers and adversarial perturbations: A color space perspective, IEEE Transactions on Dependable and Secure Computing 22 (3) (2024) 2823–2840.

[16] Y. Gao, C. Xu, D. Wang, S. Chen, D. C. Ranasinghe, S. Nepal, Strip: A defence against trojan attacks on deep neural networks, in: Proceedings of the 35th Annual Computer Security Applications Conference, 2019, pp. 113–125.

[17] K. Liu, B. Dolan-Gavitt, S. Garg, Fine-pruning: Defending against backdooring attacks on deep neural networks, in: International Symposium on Research in Attacks, Intrusions, and Defenses, Springer, 2018, pp. 273–294.

[18] M. Xue, C. He, J. Wang, W. Liu, One-to-n & n-to-one: Two advanced backdoor attacks against deep learning models, IEEE Transactions on Dependable and Secure Computing 19 (3) (2020) 1562–1578.

[19] K. D. Doan, Y. Lao, P. Li, Marksman backdoor: Backdoor attacks with arbitrary target class, Advances in Neural Information Processing Systems 35 (2022) 38260–38273.

[20] A. Salem, R. Wen, M. Backes, S. Ma, Y. Zhang, Dynamic backdoor attacks against machine learning models, in: 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P), IEEE, 2022, pp. 703–718.

[21] L. Hou, Z. Hua, Y. Li, Y. Zheng, L. Y. Zhang, M-to-n backdoor paradigm: A multi-trigger and multi-target attack to deep learning models, IEEE Transactions on Circuits and Systems for Video Technology 34 (11) (2024) 11299–11312.

[22] Y. Li, X. Ma, J. He, H. Huang, Y.-G. Jiang, Multi-trigger backdoor attacks: More triggers, more threats, CoRR (2024).

[23] X. Gong, B. Tian, M. Xue, S. Li, Y. Chen, Q. Wang, Megatron: Evasive clean-label backdoor attacks against vision transformer, IEEE Transactions on Dependable and Secure Computing (2025).

[24] Y. Wang, H. Li, L. Zhang, Y. Hu, A. C. Kot, Clean-label attack on face authentication systems through rolling shutter mechanism, IEEE Signal Processing Letters 32 (2024) 36–40.

[25] H. L. Xinyuan, S. Joshi, T. Thebaud, J. Villalba, N. Dehak, S. Khudanpur, Clean label attacks against slu systems, in: 2024 IEEE Spoken Language Technology Workshop (SLT), IEEE, 2024, pp. 1107–1114.

[26] Y. Yin, H. Chen, Y. Gao, P. Sun, L. Wu, Z. Li, W. Liu, Ffcba: Feature-based full-target clean-label backdoor attacks, in: Proceedings of the 33rd ACM International Conference on Multimedia, 2025, pp. 3884–3892.

[27] M. Zhu, Y. Li, J. Guo, T. Wei, S.-T. Xia, Z. Qin, Towards sample-specific backdoor attack with clean labels via attribute trigger, IEEE Transactions on Dependable and Secure Computing (2025).

[28] W. You, D. Lowd, The ultimate cookbook for invisible poison: Crafting subtle clean-label text backdoors with style attributes, in: 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), IEEE, 2025, pp. 222–246.

[29] L. Xie, P. Kang, H. Yang, J. Hu, Clean label backdoor attack based on feature distance guided sample selection and noise optimization, in: 2025 IEEE 24th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), IEEE, 2025, pp. 950–957.

[30] C. Yuan, J. Bai, S. Yuan, N. Wei, Stealthy and effective clean-label backdoor attack via adaptive frequency-domain suppression and trigger combination, IEEE Transactions on Information Forensics and Security (2025).

[31] Z. Wu, H. Li, D. Wu, S. Pang, Clear: A clean-label backdoor attack via representation-guided trigger embedding, in: 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC), IEEE, 2025, pp. 5534–5539.

[32] Y. Tang, X. Xu, L. Sun, Cats: Clean-label backdoor attack on speech recognition via speech synthesis, Journal of Systems Architecture (2025) 103596.

[33] C. Zhang, S. Sun, J. Tu, X. Chen, D. Wang, Clean-label backdoor attack via sample-customized feature alignment, Expert Systems with Applications (2025) 129481.

[34] S. Choi, S. Seo, B. Shin, H. Byun, M. Kersner, B. Kim, D. Kim, S. Ha, Temporal convolution for real-time keyword spotting on mobile devices, in: Proc. Interspeech 2019, 2019, pp. 3372–3376.

[35] L. Huang, T. Yuan, Y. Liang, Z. Chen, C. Wen, Y. Xie, J. Zhang, D. Ke, Limi-vc: A light weight voice conversion model with mutual information disentanglement, in: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2023, pp. 1–5.

[36] P. Bartoli, T. Bondini, C. Veronesi, A. Giudici, N. Antonello, F. Zappa, et al., End-to-end efficiency in keyword spotting: a system-level approach for embedded microcontrollers, in: Proceedings of IEEE Sensors 2025, 2025, pp. 1–4.

[37] Y. Xi, H. Li, H. Li, J. Guo, X. Li, W. Ding, K. Yu, Ntc-kws: Noise-aware ctc for robust keyword spotting, in: ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2025, pp. 1–5.

[38] X. Ge, X. Zhang, M. Sun, Y. Wang, L. Li, K. SongGong, Cross-domain redundancy exploration by a deep encoder–decoder network for speech steganography, Journal of Information Security and Applications 93 (2025) 104150.

[39] P. Warden, Speech commands: A dataset for limited-vocabulary speech recognition, arXiv preprint arXiv:1804.03209 (2018).

[40] S. Becker, J. Vielhaben, M. Ackermann, K.-R. Müller, S. Lapuschkin, W. Samek, Audiomnist: Exploring explainable artificial intelligence for audio analysis on a simple benchmark, Journal of the Franklin Institute 361 (1) (2024) 418–428.

[41] Y. Xi, H. Li, B. Yang, H. Li, H. Xu, K. Yu, Tdt-kws: Fast and accurate keyword spotting using token-and-duration transducer, in: ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2024, pp. 11351–11355.

[42] A. Nagrani, J. S. Chung, A. Zisserman, Voxceleb: A large-scale speaker identification dataset, Interspeech 2017 (2017) 2616.

[43] K. Zhou, B. Sisman, R. Liu, H. Li, Emotional voice conversion: Theory, databases and esd, Speech Communication 137 (2022) 1–18.

[44] Y. Chen, S. Zheng, H. Wang, L. Cheng, Q. Chen, J. Qi, An enhanced res2net with local and global feature fusion for speaker verification, in: INTERSPEECH, 2023.

[45] A. Gazneli, G. Zimerman, T. Ridnik, G. Sharir, A. Noy, End-to-end audio strikes back: Boosting augmentations towards an efficient audio classification network, arXiv preprint arXiv:2204.11479 (2022).

[46] H. Wang, S. Zheng, Y. Chen, L. Cheng, Q. Chen, Cam++: A fast and efficient network for speaker verification using context-aware masking, arXiv preprint arXiv:2303.00332 (2023).

[47] S. Koffas, J. Xu, M. Conti, S. Picek, Can you hear it? backdoor attacks via ultrasonic triggers, in: Proceedings of the 2022 ACM Workshop on Wireless Security and Machine Learning, 2022, pp. 57–62.

[48] T. Zhai, Y. Li, Z. Zhang, B. Wu, Y. Jiang, S.-T. Xia, Backdoor attack against speaker verification, in: ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2021, pp. 2560–2564.

[49] A. Turner, D. Tsipras, A. Madry, Clean-label backdoor attacks (2018).

[50] J. Guo, Y. Li, X. Chen, H. Guo, L. Sun, C. Liu, Scale-up: An efficient black-box input-level backdoor detection via analyzing scaled prediction consistency, in: ICLR, 2023.

[51] Z. Xiang, Z. Xiong, B. Li, Umd: Unsupervised model detection for x2x backdoor attacks, in: International Conference on Machine Learning, PMLR, 2023, pp. 38013–38038.

[52] N. M. Jebreel, J. Domingo-Ferrer, Y. Li, Defending against backdoor attacks by layer-wise feature analysis, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 2023, pp. 428–440.

[53] Y. Liu, Y. Xie, A. Srivastava, Neural trojans, in: 2017 IEEE International Conference on Computer Design (ICCD), IEEE, 2017, pp. 45–48.

[54] B. Tran, J. Li, A. Madry, Spectral signatures in backdoor attacks, Advances in Neural Information Processing Systems 31 (2018).

[55] L. v. d. Maaten, G. Hinton, Visualizing data using t-sne, Journal of Machine Learning Research 9 (Nov) (2008) 2579–2605.

[1] [1]

A. Berg, M. O’Connor, M. T. Cruz, Keyword transformer: A self-attention model for keyword spotting, in: Inter- speech 2021, ISCA, 2021, pp. 4249–4253

2021

[2] [2]

Ecapa- tdnn: Emphasized channel attention, propagation and ag- gregation in tdnn based speaker verification,

B. Desplanques, J. Thienpondt, K. Demuynck, Ecapa- tdnn: Emphasized channel attention, propagation and aggregation in tdnn based speaker verification, arXiv preprint arXiv:2005.07143 (2020)

work page arXiv 2005

[3] [3]

Y . Li, Y . Jiang, Z. Li, S.-T. Xia, Backdoor learning: A survey, IEEE transactions on neural networks and learning systems 35 (1) (2022) 5–22

2022

[4] [4]

T. Gu, K. Liu, B. Dolan-Gavitt, S. Garg, Badnets: Eval- uating backdooring attacks on deep neural networks, Ieee Access 7 (2019) 47230–47244

2019

[5] [5]

Y . Liu, X. Ma, J. Bailey, F. Lu, Reflection backdoor: A natural backdoor attack on deep neural networks, in: Eu- ropean Conference on Computer Vision, Springer, 2020, pp. 182–199

2020

[6] [6]

H. A. A. K. Hammoud, B. Ghanem, Check your other door! creating backdoor attacks in the frequency domain, in: 33rd British Machine Vision Conference Proceedings, BMVC 2022, 2022

2022

[7] [7]

X. Chen, A. Salem, D. Chen, M. Backes, S. Ma, Q. Shen, Z. Wu, Y . Zhang, Badnl: Backdoor attacks against nlp models with semantic-preserving improvements, in: Pro- ceedings of the 37th Annual Computer Security Applica- tions Conference, 2021, pp. 554–569

2021

[8] [8]

B. Wang, Y . Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, B. Y . Zhao, Neural cleanse: Identifying and mitigating backdoor attacks in neural networks, in: 2019 IEEE sym- posium on security and privacy (SP), IEEE, 2019, pp. 707–723

2019

[9] [9]

Z. Shen, W. Hou, Y . Li, Cssba: A clean label sample- specific backdoor attack, in: 2023 IEEE International Conference on Image Processing (ICIP), IEEE, 2023, pp. 965–969

2023

[10] [10]

H. Cai, P. Zhang, Y . Xiao, S. Ji, M. Xiao, L. Cheng, Clean- label backdoor attack based on robust feature attenuation for speech recognition, Expert Systems with Applications (2025) 127546

2025

[11] [11]

URLhttps://pub1-bjyt.s3.360.cn/bcms/%E5% A4%A7%E6%A8%A1%E5%9E%8B%E5%AE%89%E5%85%A8% E6%BC%8F%E6%B4%9E%E6%8A%A5%E5%91%8A.pdf 11

360 Digital Security Group, Security vulnerability report on large language models: A comprehensive study from the perspective of real-world vulnerabilities, Technical report, 360 Digital Security Group, Beijing (2024). URLhttps://pub1-bjyt.s3.360.cn/bcms/%E5% A4%A7%E6%A8%A1%E5%9E%8B%E5%AE%89%E5%85%A8% E6%BC%8F%E6%B4%9E%E6%8A%A5%E5%91%8A.pdf 11

2024

[12] [12]

A Small Number of Samples Can Poison LLMs of Any Size

A. Souly, J. Rando, E. Chapman, X. Davies, B. Hasir- cioglu, E. Shereen, C. Mougan, V . Mavroudis, E. Jones, C. Hicks, et al., Poisoning attacks on llms require a near-constant number of poison samples, arXiv preprint arXiv:2510.07192 (2025)

work page arXiv 2025

[13] [13]

R. Wang, H. Chen, Z. Zhu, L. Liu, B. Wu, Versatile back- door attack with visible, semantic, sample-specific and compatible triggers, IEEE Transactions on Pattern Analy- sis and Machine Intelligence (2025)

2025

[14] [14]

X. Chen, C. Liu, B. Li, K. Lu, D. Song, Targeted backdoor attacks on deep learning systems using data poisoning, in: arXiv preprint arXiv:1712.05526, 2017, presented at the Workshop on Machine Learning and Computer Security

work page internal anchor Pith review Pith/arXiv arXiv 2017

[15] [15]

Jiang, H

W. Jiang, H. Li, G. Xu, H. Ren, H. Yang, T. Zhang, S. Yu, Rethinking the design of backdoor triggers and adversarial perturbations: A color space perspective, IEEE Transac- tions on Dependable and Secure Computing 22 (3) (2024) 2823–2840

2024

[16] [16]

Y . Gao, C. Xu, D. Wang, S. Chen, D. C. Ranasinghe, S. Nepal, Strip: A defence against trojan attacks on deep neural networks, in: Proceedings of the 35th annual com- puter security applications conference, 2019, pp. 113– 125

2019

[17] [17]

K. Liu, B. Dolan-Gavitt, S. Garg, Fine-pruning: Defend- ing against backdooring attacks on deep neural networks, in: International symposium on research in attacks, intru- sions, and defenses, Springer, 2018, pp. 273–294

2018

[18] [18]

M. Xue, C. He, J. Wang, W. Liu, One-to-n & n-to- one: Two advanced backdoor attacks against deep learn- ing models, IEEE Transactions on Dependable and Secure Computing 19 (3) (2020) 1562–1578

2020

[19] [19]

K. D. Doan, Y . Lao, P. Li, Marksman backdoor: Backdoor attacks with arbitrary target class, Advances in Neural In- formation Processing Systems 35 (2022) 38260–38273

2022

[20] [20]

Salem, R

A. Salem, R. Wen, M. Backes, S. Ma, Y . Zhang, Dy- namic backdoor attacks against machine learning models, in: 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P), IEEE, 2022, pp. 703–718

2022

[21] [21]

L. Hou, Z. Hua, Y . Li, Y . Zheng, L. Y . Zhang, M-to-n backdoor paradigm: A multi-trigger and multi-target at- tack to deep learning models, IEEE Transactions on Cir- cuits and Systems for Video Technology 34 (11) (2024) 11299–11312

2024

[22] [22]

Y . Li, X. Ma, J. He, H. Huang, Y .-G. Jiang, Multi-trigger backdoor attacks: More triggers, more threats, CoRR (2024)

2024

[23] [23]

X. Gong, B. Tian, M. Xue, S. Li, Y . Chen, Q. Wang, Megatron: Evasive clean-label backdoor attacks against vision transformer, IEEE Transactions on Dependable and Secure Computing (2025)

2025

[24] [24]

Y . Wang, H. Li, L. Zhang, Y . Hu, A. C. Kot, Clean-label attack on face authentication systems through rolling shut- ter mechanism, IEEE Signal Processing Letters 32 (2024) 36–40

2024

[25] H. L. Xinyuan, S. Joshi, T. Thebaud, J. Villalba, N. Dehak, S. Khudanpur, Clean label attacks against SLU systems, in: 2024 IEEE Spoken Language Technology Workshop (SLT), IEEE, 2024, pp. 1107–1114.

[26] Y. Yin, H. Chen, Y. Gao, P. Sun, L. Wu, Z. Li, W. Liu, FFCBA: Feature-based full-target clean-label backdoor attacks, in: Proceedings of the 33rd ACM International Conference on Multimedia, 2025, pp. 3884–3892.

[27] M. Zhu, Y. Li, J. Guo, T. Wei, S.-T. Xia, Z. Qin, Towards sample-specific backdoor attack with clean labels via attribute trigger, IEEE Transactions on Dependable and Secure Computing (2025).

[28] W. You, D. Lowd, The ultimate cookbook for invisible poison: Crafting subtle clean-label text backdoors with style attributes, in: 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), IEEE, 2025, pp. 222–246.

[29] L. Xie, P. Kang, H. Yang, J. Hu, Clean label backdoor attack based on feature distance guided sample selection and noise optimization, in: 2025 IEEE 24th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), IEEE, 2025, pp. 950–957.

[30] C. Yuan, J. Bai, S. Yuan, N. Wei, Stealthy and effective clean-label backdoor attack via adaptive frequency-domain suppression and trigger combination, IEEE Transactions on Information Forensics and Security (2025).

[31] Z. Wu, H. Li, D. Wu, S. Pang, Clear: A clean-label backdoor attack via representation-guided trigger embedding, in: 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC), IEEE, 2025, pp. 5534–5539.

[32] Y. Tang, X. Xu, L. Sun, Cats: Clean-label backdoor attack on speech recognition via speech synthesis, Journal of Systems Architecture (2025) 103596.

[33] C. Zhang, S. Sun, J. Tu, X. Chen, D. Wang, Clean-label backdoor attack via sample-customized feature alignment, Expert Systems with Applications (2025) 129481.

[34] S. Choi, S. Seo, B. Shin, H. Byun, M. Kersner, B. Kim, D. Kim, S. Ha, Temporal convolution for real-time keyword spotting on mobile devices, in: Proc. Interspeech 2019, 2019, pp. 3372–3376.

[35] L. Huang, T. Yuan, Y. Liang, Z. Chen, C. Wen, Y. Xie, J. Zhang, D. Ke, Limi-vc: A light weight voice conversion model with mutual information disentanglement, in: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2023, pp. 1–5.

[36] P. Bartoli, T. Bondini, C. Veronesi, A. Giudici, N. Antonello, F. Zappa, et al., End-to-end efficiency in keyword spotting: a system-level approach for embedded microcontrollers, in: Proceedings of IEEE Sensors 2025, 2025, pp. 1–4.

[37] Y. Xi, H. Li, H. Li, J. Guo, X. Li, W. Ding, K. Yu, NTC-KWS: Noise-aware CTC for robust keyword spotting, in: ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2025, pp. 1–5.

[38] X. Ge, X. Zhang, M. Sun, Y. Wang, L. Li, K. SongGong, Cross-domain redundancy exploration by a deep encoder–decoder network for speech steganography, Journal of Information Security and Applications 93 (2025) 104150.

[39] P. Warden, Speech commands: A dataset for limited-vocabulary speech recognition, arXiv preprint arXiv:1804.03209 (2018).

[40] S. Becker, J. Vielhaben, M. Ackermann, K.-R. Müller, S. Lapuschkin, W. Samek, AudioMNIST: Exploring explainable artificial intelligence for audio analysis on a simple benchmark, Journal of the Franklin Institute 361 (1) (2024) 418–428.

[41] Y. Xi, H. Li, B. Yang, H. Li, H. Xu, K. Yu, TDT-KWS: Fast and accurate keyword spotting using token-and-duration transducer, in: ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2024, pp. 11351–11355.

[42] A. Nagrani, J. S. Chung, A. Zisserman, VoxCeleb: A large-scale speaker identification dataset, Interspeech 2017 (2017) 2616.

[43] K. Zhou, B. Sisman, R. Liu, H. Li, Emotional voice conversion: Theory, databases and ESD, Speech Communication 137 (2022) 1–18.

[44] Y. Chen, S. Zheng, H. Wang, L. Cheng, Q. Chen, J. Qi, An enhanced Res2Net with local and global feature fusion for speaker verification, in: INTERSPEECH, 2023.

[45] A. Gazneli, G. Zimerman, T. Ridnik, G. Sharir, A. Noy, End-to-end audio strikes back: Boosting augmentations towards an efficient audio classification network, arXiv preprint arXiv:2204.11479 (2022).

[46] H. Wang, S. Zheng, Y. Chen, L. Cheng, Q. Chen, CAM++: A fast and efficient network for speaker verification using context-aware masking, arXiv preprint arXiv:2303.00332 (2023).

[47] S. Koffas, J. Xu, M. Conti, S. Picek, Can you hear it? Backdoor attacks via ultrasonic triggers, in: Proceedings of the 2022 ACM Workshop on Wireless Security and Machine Learning, 2022, pp. 57–62.

[48] T. Zhai, Y. Li, Z. Zhang, B. Wu, Y. Jiang, S.-T. Xia, Backdoor attack against speaker verification, in: ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2021, pp. 2560–2564.

[49] A. Turner, D. Tsipras, A. Madry, Clean-label backdoor attacks (2018).

[50] J. Guo, Y. Li, X. Chen, H. Guo, L. Sun, C. Liu, SCALE-UP: An efficient black-box input-level backdoor detection via analyzing scaled prediction consistency, in: ICLR, 2023.

[51] Z. Xiang, Z. Xiong, B. Li, UMD: Unsupervised model detection for X2X backdoor attacks, in: International Conference on Machine Learning, PMLR, 2023, pp. 38013–38038.

[52] N. M. Jebreel, J. Domingo-Ferrer, Y. Li, Defending against backdoor attacks by layer-wise feature analysis, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 2023, pp. 428–440.

[53] Y. Liu, Y. Xie, A. Srivastava, Neural trojans, in: 2017 IEEE International Conference on Computer Design (ICCD), IEEE, 2017, pp. 45–48.

[54] B. Tran, J. Li, A. Madry, Spectral signatures in backdoor attacks, Advances in Neural Information Processing Systems 31 (2018).

[55] L. van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (Nov) (2008) 2579–2605.