DRL-CLBA: A Clean Label Backdoor Attack for Speech Classification via DDPG Reinforcement Learning
Pith reviewed 2026-07-03 14:11 UTC · model grok-4.3
The pith
Reinforcement learning plants clean-label backdoors in speech classifiers that survive fine-tuning and pruning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DRL-CLBA embeds sample-specific triggers via deep audio steganography and uses DDPG to optimize target samples toward trigger-bearing anchor points in deep latent space, enabling label-migration-free poisoning that achieves high attack success rates across three datasets and four DNNs while resisting fine-tuning, pruning, and spectral signature defenses.
What carries the argument
DDPG reinforcement learning framework that optimizes target samples toward trigger-bearing anchor points in the model's deep latent space.
If this is right
- The attack reaches high success rates on multiple speech datasets and model architectures.
- It bypasses defenses that rely on detecting mismatched labels in the training set.
- The backdoor remains effective after fine-tuning, pruning, and spectral signature removal.
- Speech-controlled systems become vulnerable to training-time poisoning that evades manual inspection.
Where Pith is reading between the lines
- The same latent-space alignment idea could be tested on image or text models by replacing the audio steganography step with an equivalent embedding method.
- Defenses might need to inspect alignments between samples in feature space during training rather than relying only on labels or input statistics.
- Applying the attack to real-world voice assistant training pipelines would reveal whether the reported resistance holds outside controlled datasets.
Load-bearing premise
The reinforcement learning agent can reliably move target audio samples to match hidden trigger features in latent space so the backdoor forms without any label changes.
What would settle it
Training a model on data poisoned by the described method and finding that inputs containing the steganographic trigger produce the target class at rates no higher than unpoisoned baseline models.
Figures
read the original abstract
Deep learning models for speech classification are vulnerable to backdoor attacks, where malicious triggers cause misclassification at inference time. While sample-specific attacks can bypass many defenses, they often rely on poisoned label attack, making them detectable via manual data defense. In this paper, we propose DRL-CLBA, a novel clean label backdoor attack for speech classification that leverages Deep Deterministic Policy Gradient (DDPG) reinforcement learning. We also utilize deep audio steganography to embed sample-specific triggers into source audio, creating feature-space anchors. The proposed reinforcement learning framework effectively optimizes target samples toward trigger-bearing anchor points in the model's deep latent space, enabling label-migration-free poisoning of target samples. Experimental results across three datasets and four different DNNs demonstrate that DRL-CLBA achieves a high attack success rate, effectively bypassing some backdoor defenses. The attack demonstrates strong resistance against fine-tuning, pruning, and spectral signature defenses, exposing critical vulnerabilities in speech-controlled systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes DRL-CLBA, a clean-label backdoor attack on speech classification models. It uses Deep Deterministic Policy Gradient (DDPG) reinforcement learning to optimize target samples toward trigger-bearing anchor points in the model's latent space, combined with deep audio steganography to embed sample-specific triggers without requiring label changes. The central claim is that this achieves high attack success rates across three datasets and four DNNs while resisting fine-tuning, pruning, and spectral signature defenses.
Significance. If the results hold with proper validation, the work would highlight a novel RL-driven approach to clean-label attacks in the audio domain, potentially exposing vulnerabilities in speech-controlled systems and motivating new defenses. The integration of DDPG for latent-space optimization and steganography represents a methodological contribution worth exploring further.
major comments (2)
- [Abstract] Abstract: The central claim that 'Experimental results across three datasets and four different DNNs demonstrate that DRL-CLBA achieves a high attack success rate, effectively bypassing some backdoor defenses' is unsupported by any quantitative values, tables, figures, error bars, dataset details, or ablation studies in the manuscript.
- [Abstract] The description of the DDPG framework 'effectively optimizes target samples toward trigger-bearing anchor points in the model's deep latent space, enabling label-migration-free poisoning' lacks any equations, algorithm details, loss functions, or state/action definitions, which is load-bearing for assessing the weakest assumption and overall soundness.
minor comments (1)
- The manuscript would benefit from adding the full experimental section, including specific ASR percentages, defense evasion rates, hyperparameter settings for DDPG, and comparisons to prior backdoor attacks.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the abstract. We address each point below and will revise the abstract accordingly to strengthen clarity while preserving the manuscript's existing content.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that 'Experimental results across three datasets and four different DNNs demonstrate that DRL-CLBA achieves a high attack success rate, effectively bypassing some backdoor defenses' is unsupported by any quantitative values, tables, figures, error bars, dataset details, or ablation studies in the manuscript.
Authors: The manuscript body (Section 4) contains the supporting experimental results, including tables reporting attack success rates, figures on defense resistance, dataset details in Section 3.1, and ablation studies. The abstract summarizes these findings at a high level. To directly address the comment, we will revise the abstract to include key quantitative highlights drawn from the existing results in the paper. revision: yes
-
Referee: [Abstract] The description of the DDPG framework 'effectively optimizes target samples toward trigger-bearing anchor points in the model's deep latent space, enabling label-migration-free poisoning' lacks any equations, algorithm details, loss functions, or state/action definitions, which is load-bearing for assessing the weakest assumption and overall soundness.
Authors: The abstract is a concise summary; the full DDPG framework details—including state/action definitions, reward function, algorithm, and loss functions—are provided in Section 3.2 with equations and pseudocode. We will revise the abstract to briefly reference these components for improved accessibility without altering the technical content. revision: yes
Circularity Check
No circularity detectable; abstract-only text supplies no equations or derivations
full rationale
The supplied document consists solely of the abstract, which describes a DDPG-based clean-label attack at a conceptual level but contains no equations, optimization derivations, self-citations, fitted parameters presented as predictions, or any other load-bearing steps that could be inspected for circularity. Without access to methods, results, or explicit chains of reasoning, no reduction of outputs to inputs by construction can be exhibited. This is the default honest non-finding when the paper text provides no verifiable derivation chain.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
A. Berg, M. O’Connor, M. T. Cruz, Keyword transformer: A self-attention model for keyword spotting, in: Inter- speech 2021, ISCA, 2021, pp. 4249–4253
2021
-
[2]
B. Desplanques, J. Thienpondt, K. Demuynck, Ecapa- tdnn: Emphasized channel attention, propagation and aggregation in tdnn based speaker verification, arXiv preprint arXiv:2005.07143 (2020)
-
[3]
Y . Li, Y . Jiang, Z. Li, S.-T. Xia, Backdoor learning: A survey, IEEE transactions on neural networks and learning systems 35 (1) (2022) 5–22
2022
-
[4]
T. Gu, K. Liu, B. Dolan-Gavitt, S. Garg, Badnets: Eval- uating backdooring attacks on deep neural networks, Ieee Access 7 (2019) 47230–47244
2019
-
[5]
Y . Liu, X. Ma, J. Bailey, F. Lu, Reflection backdoor: A natural backdoor attack on deep neural networks, in: Eu- ropean Conference on Computer Vision, Springer, 2020, pp. 182–199
2020
-
[6]
H. A. A. K. Hammoud, B. Ghanem, Check your other door! creating backdoor attacks in the frequency domain, in: 33rd British Machine Vision Conference Proceedings, BMVC 2022, 2022
2022
-
[7]
X. Chen, A. Salem, D. Chen, M. Backes, S. Ma, Q. Shen, Z. Wu, Y . Zhang, Badnl: Backdoor attacks against nlp models with semantic-preserving improvements, in: Pro- ceedings of the 37th Annual Computer Security Applica- tions Conference, 2021, pp. 554–569
2021
-
[8]
B. Wang, Y . Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, B. Y . Zhao, Neural cleanse: Identifying and mitigating backdoor attacks in neural networks, in: 2019 IEEE sym- posium on security and privacy (SP), IEEE, 2019, pp. 707–723
2019
-
[9]
Z. Shen, W. Hou, Y . Li, Cssba: A clean label sample- specific backdoor attack, in: 2023 IEEE International Conference on Image Processing (ICIP), IEEE, 2023, pp. 965–969
2023
-
[10]
H. Cai, P. Zhang, Y . Xiao, S. Ji, M. Xiao, L. Cheng, Clean- label backdoor attack based on robust feature attenuation for speech recognition, Expert Systems with Applications (2025) 127546
2025
-
[11]
URLhttps://pub1-bjyt.s3.360.cn/bcms/%E5% A4%A7%E6%A8%A1%E5%9E%8B%E5%AE%89%E5%85%A8% E6%BC%8F%E6%B4%9E%E6%8A%A5%E5%91%8A.pdf 11
360 Digital Security Group, Security vulnerability report on large language models: A comprehensive study from the perspective of real-world vulnerabilities, Technical report, 360 Digital Security Group, Beijing (2024). URLhttps://pub1-bjyt.s3.360.cn/bcms/%E5% A4%A7%E6%A8%A1%E5%9E%8B%E5%AE%89%E5%85%A8% E6%BC%8F%E6%B4%9E%E6%8A%A5%E5%91%8A.pdf 11
2024
-
[12]
A Small Number of Samples Can Poison LLMs of Any Size
A. Souly, J. Rando, E. Chapman, X. Davies, B. Hasir- cioglu, E. Shereen, C. Mougan, V . Mavroudis, E. Jones, C. Hicks, et al., Poisoning attacks on llms require a near-constant number of poison samples, arXiv preprint arXiv:2510.07192 (2025)
-
[13]
R. Wang, H. Chen, Z. Zhu, L. Liu, B. Wu, Versatile back- door attack with visible, semantic, sample-specific and compatible triggers, IEEE Transactions on Pattern Analy- sis and Machine Intelligence (2025)
2025
-
[14]
X. Chen, C. Liu, B. Li, K. Lu, D. Song, Targeted backdoor attacks on deep learning systems using data poisoning, in: arXiv preprint arXiv:1712.05526, 2017, presented at the Workshop on Machine Learning and Computer Security
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[15]
Jiang, H
W. Jiang, H. Li, G. Xu, H. Ren, H. Yang, T. Zhang, S. Yu, Rethinking the design of backdoor triggers and adversarial perturbations: A color space perspective, IEEE Transac- tions on Dependable and Secure Computing 22 (3) (2024) 2823–2840
2024
-
[16]
Y . Gao, C. Xu, D. Wang, S. Chen, D. C. Ranasinghe, S. Nepal, Strip: A defence against trojan attacks on deep neural networks, in: Proceedings of the 35th annual com- puter security applications conference, 2019, pp. 113– 125
2019
-
[17]
K. Liu, B. Dolan-Gavitt, S. Garg, Fine-pruning: Defend- ing against backdooring attacks on deep neural networks, in: International symposium on research in attacks, intru- sions, and defenses, Springer, 2018, pp. 273–294
2018
-
[18]
M. Xue, C. He, J. Wang, W. Liu, One-to-n & n-to- one: Two advanced backdoor attacks against deep learn- ing models, IEEE Transactions on Dependable and Secure Computing 19 (3) (2020) 1562–1578
2020
-
[19]
K. D. Doan, Y . Lao, P. Li, Marksman backdoor: Backdoor attacks with arbitrary target class, Advances in Neural In- formation Processing Systems 35 (2022) 38260–38273
2022
-
[20]
Salem, R
A. Salem, R. Wen, M. Backes, S. Ma, Y . Zhang, Dy- namic backdoor attacks against machine learning models, in: 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P), IEEE, 2022, pp. 703–718
2022
-
[21]
L. Hou, Z. Hua, Y . Li, Y . Zheng, L. Y . Zhang, M-to-n backdoor paradigm: A multi-trigger and multi-target at- tack to deep learning models, IEEE Transactions on Cir- cuits and Systems for Video Technology 34 (11) (2024) 11299–11312
2024
-
[22]
Y . Li, X. Ma, J. He, H. Huang, Y .-G. Jiang, Multi-trigger backdoor attacks: More triggers, more threats, CoRR (2024)
2024
-
[23]
X. Gong, B. Tian, M. Xue, S. Li, Y . Chen, Q. Wang, Megatron: Evasive clean-label backdoor attacks against vision transformer, IEEE Transactions on Dependable and Secure Computing (2025)
2025
-
[24]
Y . Wang, H. Li, L. Zhang, Y . Hu, A. C. Kot, Clean-label attack on face authentication systems through rolling shut- ter mechanism, IEEE Signal Processing Letters 32 (2024) 36–40
2024
-
[25]
H. L. Xinyuan, S. Joshi, T. Thebaud, J. Villalba, N. De- hak, S. Khudanpur, Clean label attacks against slu sys- tems, in: 2024 IEEE Spoken Language Technology Work- shop (SLT), IEEE, 2024, pp. 1107–1114
2024
-
[26]
Y . Yin, H. Chen, Y . Gao, P. Sun, L. Wu, Z. Li, W. Liu, Ffcba: Feature-based full-target clean-label backdoor at- tacks, in: Proceedings of the 33rd ACM International Conference on Multimedia, 2025, pp. 3884–3892
2025
-
[27]
M. Zhu, Y . Li, J. Guo, T. Wei, S.-T. Xia, Z. Qin, Towards sample-specific backdoor attack with clean labels via at- tribute trigger, IEEE Transactions on Dependable and Se- cure Computing (2025)
2025
-
[28]
W. You, D. Lowd, The ultimate cookbook for invisible poison: Crafting subtle clean-label text backdoors with style attributes, in: 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), IEEE, 2025, pp. 222–246
2025
-
[29]
L. Xie, P. Kang, H. Yang, J. Hu, Clean label backdoor attack based on feature distance guided sample selection and noise optimization, in: 2025 IEEE 24th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), IEEE, 2025, pp. 950– 957
2025
-
[30]
C. Yuan, J. Bai, S. Yuan, N. Wei, Stealthy and effec- tive clean-label backdoor attack via adaptive frequency- domain suppression and trigger combination, IEEE Trans- actions on Information Forensics and Security (2025)
2025
-
[31]
Z. Wu, H. Li, D. Wu, S. Pang, Clear: A clean-label backdoor attack via representation-guided trigger embed- ding, in: 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC), IEEE, 2025, pp. 5534– 5539
2025
-
[32]
Y . Tang, X. Xu, L. Sun, Cats: Clean-label backdoor at- tack on speech recognition via speech synthesis, Journal of Systems Architecture (2025) 103596
2025
-
[33]
Zhang, S
C. Zhang, S. Sun, J. Tu, X. Chen, D. Wang, Clean-label backdoor attack via sample-customized feature alignment, Expert Systems with Applications (2025) 129481
2025
-
[34]
S. Choi, S. Seo, B. Shin, H. Byun, M. Kersner, B. Kim, D. Kim, S. Ha, Temporal convolution for real-time key- word spotting on mobile devices, in: Proc. Interspeech 2019, 2019, pp. 3372–3376
2019
-
[35]
Huang, T
L. Huang, T. Yuan, Y . Liang, Z. Chen, C. Wen, Y . Xie, J. Zhang, D. Ke, Limi-vc: A light weight voice con- version model with mutual information disentanglement, in: ICASSP 2023-2023 IEEE International Conference 12 on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2023, pp. 1–5
2023
-
[36]
Bartoli, T
P. Bartoli, T. Bondini, C. Veronesi, A. Giudici, N. An- tonello, F. Zappa, et al., End-to-end efficiency in keyword spotting: a system-level approach for embedded micro- controllers, in: Proceedings of IEEE Sensors 2025, 2025, pp. 1–4
2025
-
[37]
Y . Xi, H. Li, H. Li, J. Guo, X. Li, W. Ding, K. Yu, Ntc-kws: Noise-aware ctc for robust keyword spotting, in: ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2025, pp. 1–5
2025
-
[38]
X. Ge, X. Zhang, M. Sun, Y . Wang, L. Li, K. SongGong, Cross-domain redundancy exploration by a deep encoder– decoder network for speech steganography, Journal of In- formation Security and Applications 93 (2025) 104150
2025
-
[39]
Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition
P. Warden, Speech commands: A dataset for limited-vocabulary speech recognition, arXiv preprint arXiv:1804.03209 (2018)
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[40]
Becker, J
S. Becker, J. Vielhaben, M. Ackermann, K.-R. Müller, S. Lapuschkin, W. Samek, Audiomnist: Exploring ex- plainable artificial intelligence for audio analysis on a sim- ple benchmark, Journal of the Franklin Institute 361 (1) (2024) 418–428
2024
-
[41]
Y . Xi, H. Li, B. Yang, H. Li, H. Xu, K. Yu, Tdt-kws: Fast and accurate keyword spotting using token-and-duration transducer, in: ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2024, pp. 11351–11355
2024
-
[42]
Nagrani, J
A. Nagrani, J. S. Chung, A. Zisserman, V oxceleb: A large-scale speaker identification dataset, Interspeech 2017 (2017) 2616
2017
-
[43]
K. Zhou, B. Sisman, R. Liu, H. Li, Emotional voice con- version: Theory, databases and esd, Speech Communica- tion 137 (2022) 1–18
2022
-
[44]
Y . Chen, S. Zheng, H. Wang, L. Cheng, Q. Chen, J. Qi, An enhanced res2net with local and global feature fusion for speaker verification, in: INTERSPEECH, 2023
2023
-
[45]
A. Gazneli, G. Zimerman, T. Ridnik, G. Sharir, A. Noy, End-to-end audio strikes back: Boosting augmentations towards an efficient audio classification network, arXiv preprint arXiv:2204.11479 (2022)
- [46]
-
[47]
Koffas, J
S. Koffas, J. Xu, M. Conti, S. Picek, Can you hear it? backdoor attacks via ultrasonic triggers, in: Proceedings of the 2022 ACM workshop on wireless security and ma- chine learning, 2022, pp. 57–62
2022
-
[48]
T. Zhai, Y . Li, Z. Zhang, B. Wu, Y . Jiang, S.-T. Xia, Backdoor attack against speaker verification, in: ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2021, pp. 2560–2564
2021
-
[49]
Turner, D
A. Turner, D. Tsipras, A. Madry, Clean-label backdoor attacks (2018)
2018
-
[50]
J. Guo, Y . Li, X. Chen, H. Guo, L. Sun, C. Liu, Scale-up: An efficient black-box input-level backdoor detection via analyzing scaled prediction consistency, in: ICLR, 2023
2023
-
[51]
Xiang, Z
Z. Xiang, Z. Xiong, B. Li, Umd: Unsupervised model de- tection for x2x backdoor attacks, in: International Con- ference on Machine Learning, PMLR, 2023, pp. 38013– 38038
2023
-
[52]
N. M. Jebreel, J. Domingo-Ferrer, Y . Li, Defending against backdoor attacks by layer-wise feature analysis, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, 2023, pp. 428–440
2023
-
[53]
Y . Liu, Y . Xie, A. Srivastava, Neural trojans, in: 2017 IEEE international conference on computer design (ICCD), IEEE, 2017, pp. 45–48
2017
-
[54]
B. Tran, J. Li, A. Madry, Spectral signatures in backdoor attacks, Advances in neural information processing sys- tems 31 (2018)
2018
-
[55]
L. v. d. Maaten, G. Hinton, Visualizing data using t- sne, Journal of machine learning research 9 (Nov) (2008) 2579–2605. 13
2008
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.