TTS-generated poisoned audio implants backdoors in SER models, achieving high attack success at low poisoning ratios while preserving clean performance.
Where Do Backdoors Live? A Component-Level Analysis of Backdoor Propagation in Speech Language Models
2 Pith papers cite this work. Polarity classification is still indexing.
abstract
Speech language models (SLMs) are systems of systems: independent components that unite to achieve a common goal. Despite their heterogeneous nature, SLMs are often studied end-to-end; how information flows through the pipeline remains obscure. We investigate this question through the lens of backdoor attacks. We first establish that backdoors can propagate through the SLM, leaving all tasks highly vulnerable. From this, we design a component analysis to discover the role each component takes in backdoor learning. We find that backdoor persistence or erasure is highly dependent on the targeted component. Beyond propagation, we examine how backdoors are encoded in shared multitask embeddings, showing that poisoned samples are not directly separable from benign ones, challenging a common separability assumption used in filtering defenses. Our findings emphasize the need to treat multimodal pipelines as intricate systems with unique vulnerabilities, not solely extensions of unimodal ones.
citation-role summary
citation-polarity summary
fields: cs.SD (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
representative citing papers
A survey of Large Audio Language Models that establishes a taxonomy of trustworthiness vulnerabilities and proposes a Defense-in-Depth roadmap for audio intelligence.
citing papers explorer
- A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook
  A survey of Large Audio Language Models that establishes a taxonomy of trustworthiness vulnerabilities and proposes a Defense-in-Depth roadmap for audio intelligence.