Where Do Backdoors Live? A Component-Level Analysis of Backdoor Propagation in Speech Language Models

Alexandrine Fortier; Jesús Villalba; Najim Dehak; Patrick Cardinal; Peter West; Thomas Thebaud

arXiv: 2510.01157 · v3 · submitted 2025-10-01 · cs.CL · cs.CR · cs.SD

Pith reviewed 2026-05-18 10:18 UTC · model grok-4.3

classification: cs.CL · cs.CR · cs.SD

keywords: backdoor attacks · speech language models · component analysis · backdoor propagation · multimodal pipelines · embedding separability · task vulnerability

The pith

Backdoors propagate through speech language models to leave all tasks vulnerable, with their persistence depending on the targeted component.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Speech language models consist of multiple independent components that work together in a pipeline. The paper demonstrates that a backdoor introduced into this pipeline can propagate to every task the model performs. It further shows that whether the backdoor survives or is erased depends on which component is attacked. The research also finds that, in the embeddings shared across tasks, poisoned examples do not separate clearly from clean ones. This finding calls into question defenses that assume malicious data is easy to separate.

Core claim

Backdoors can propagate through the SLM, leaving all tasks highly vulnerable. Backdoor persistence or erasure is highly dependent on the targeted component. Poisoned samples are not directly separable from benign ones in shared multitask embeddings, challenging a common separability assumption used in filtering defenses.

What carries the argument

Component-level analysis of backdoor injection and observation across individual stages in the speech language model pipeline.

If this is right

Defenses must consider the flow of backdoors between components instead of viewing the model as a monolithic entity.
Compromising any single component can expose all associated tasks to the backdoor attack.
Filtering defenses based on separating poisoned from benign samples may fail in multitask embedding spaces.
Securing multimodal systems requires treating them as systems with their own vulnerabilities, not as simple extensions of single-modality models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Security measures for chained AI components should map out propagation routes to find weak points.
Applying similar component breakdowns to other pipeline-based AI systems could uncover analogous vulnerabilities.
Future model designs might focus on isolating or protecting components that allow long-term backdoor survival.

Load-bearing premise

The component analysis assumes that backdoors can be injected into and isolated within individual pipeline stages without the injection process creating confounding interactions across stages.

What would settle it

A demonstration that backdoors do not propagate to all tasks or that poisoned samples separate clearly from benign ones in the embeddings would contradict the claims of system-wide vulnerability and non-separability.

Original abstract

Speech language models (SLMs) are systems of systems: independent components that unite to achieve a common goal. Despite their heterogeneous nature, SLMs are often studied end-to-end; how information flows through the pipeline remains obscure. We investigate this question through the lens of backdoor attacks. We first establish that backdoors can propagate through the SLM, leaving all tasks highly vulnerable. From this, we design a component analysis to reveal the role each component takes in backdoor learning. We find that backdoor persistence or erasure is highly dependent on the targeted component. Beyond propagation, we examine how backdoors are encoded in shared multitask embeddings, showing that poisoned samples are not directly separable from benign ones, challenging a common separability assumption used in filtering defenses. Our findings emphasize the need to treat multimodal pipelines as intricate systems with unique vulnerabilities, not solely extensions of unimodal ones.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper examines backdoor attacks in speech language models (SLMs) viewed as heterogeneous pipelines of components (e.g., acoustic encoder and language decoder). It claims that backdoors propagate through the full SLM, rendering all tasks highly vulnerable; that backdoor persistence or erasure depends strongly on which component is targeted during injection; and that poisoned samples cannot be directly separated from benign ones in shared multitask embeddings, undermining common filtering defenses.

Significance. If the component-level attributions hold under clean isolation, the work usefully shifts attention from end-to-end SLM behavior to pipeline-specific vulnerabilities in multimodal systems. The controlled-injection methodology and the non-separability observation in embeddings are concrete contributions that could guide defense design beyond unimodal backdoor literature.

major comments (2)

§4 (Component Analysis): The attribution of persistence/erasure differences to specific components requires explicit verification that injection into one stage does not produce measurable leakage into others via shared embeddings or joint optimization. Without reported cross-stage backdoor activation rates or ablation controls during the poisoning phase, observed component dependence may reflect uneven propagation rather than intrinsic component properties.
§5 (Embedding Analysis): The claim that poisoned samples are not directly separable rests on qualitative visualization or distance metrics; quantitative support such as linear separability accuracy, silhouette scores, or downstream filter performance on the poisoned vs. benign split is needed to substantiate the challenge to filtering defenses.

minor comments (2)

The abstract and §3 should report key quantitative results (attack success rates, component-specific persistence percentages, dataset sizes, and number of runs) with error bars or confidence intervals.
Figure captions and axis labels in the embedding plots need to specify the exact dimensionality reduction method, the number of samples per class, and whether the visualization is from a held-out set.
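
As a rough illustration of the kind of reporting requested here, the following Python sketch computes an attack success rate together with a Wilson score interval. The function names and the toy labels are hypothetical and are not taken from the paper, which may define or aggregate its metrics differently.

```python
import math


def attack_success_rate(predictions, target_label):
    """Fraction of triggered inputs whose prediction equals the attacker's target."""
    if not predictions:
        raise ValueError("need at least one triggered sample")
    hits = sum(1 for p in predictions if p == target_label)
    return hits / len(predictions)


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clip to [0, 1] to absorb floating-point noise at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical outputs of a model on ten triggered inputs.
preds = ["target"] * 9 + ["other"]
asr = attack_success_rate(preds, "target")
low, high = wilson_interval(9, len(preds))
print(f"ASR = {asr:.2f}, 95% interval = [{low:.2f}, {high:.2f}]")
```

The Wilson interval is used in this sketch because it stays inside [0, 1] even when the success rate is close to 0 or 1, which is common for backdoor attacks; averaging over several runs with a standard deviation would be an equally reasonable choice.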

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major comment below with explanations based on our experimental design and indicate the revisions we will make to strengthen the paper.

Point-by-point responses

Referee: §4 (Component Analysis): The attribution of persistence/erasure differences to specific components requires explicit verification that injection into one stage does not produce measurable leakage into others via shared embeddings or joint optimization. Without reported cross-stage backdoor activation rates or ablation controls during the poisoning phase, observed component dependence may reflect uneven propagation rather than intrinsic component properties.

Authors: We agree that explicit verification of isolation during injection is important to attribute effects to intrinsic component properties rather than leakage. Our component-targeted poisoning was performed by selectively updating parameters within the chosen component (e.g., acoustic encoder) while holding the other components fixed during that poisoning run. To directly address the concern, we will add cross-stage backdoor activation rates and ablation controls in the revised §4, measuring activation success when a backdoor is injected into one component but evaluated on downstream tasks that rely on other components. This will help confirm the observed persistence/erasure patterns.
revision: yes

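
The isolation described in this response (update one component, hold the rest fixed) can be sketched in plain Python. This is a toy under stated assumptions: parameters are scalar floats keyed by hypothetical names, the gradients are made up, and the component prefixes only echo the audio encoder, projection connector, and LoRA adapters named in the paper. It is not the authors' training code.

```python
def poison_step(params, grads, target_component, lr=0.1):
    """Apply one gradient step only to parameters of `target_component`.

    `params` and `grads` map names such as "audio_encoder.w0" to floats.
    Parameters outside the target component are copied unchanged (frozen).
    """
    prefix = target_component + "."
    updated = {}
    for name, value in params.items():
        if name.startswith(prefix):
            updated[name] = value - lr * grads[name]
        else:
            updated[name] = value
    return updated


def changed_components(before, after):
    """Return the set of component names whose parameters differ."""
    return {name.split(".", 1)[0] for name in before if before[name] != after[name]}


# Hypothetical three-component pipeline with one scalar parameter each.
params = {"audio_encoder.w0": 0.5, "connector.w0": -0.2, "lora.a0": 0.1}
grads = {"audio_encoder.w0": 1.0, "connector.w0": 1.0, "lora.a0": 1.0}

after = poison_step(params, grads, "connector")
print(changed_components(params, after))
```

A check like `changed_components` is one simple way to confirm that a poisoning run touched only the intended stage. It does not rule out behavioral leakage through shared activations, which is the referee's underlying concern.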
Referee: §5 (Embedding Analysis): The claim that poisoned samples are not directly separable rests on qualitative visualization or distance metrics; quantitative support such as linear separability accuracy, silhouette scores, or downstream filter performance on the poisoned vs. benign split is needed to substantiate the challenge to filtering defenses.

Authors: We acknowledge that additional quantitative metrics would make the non-separability claim more robust. Our current results use t-SNE visualizations and pairwise distance metrics to demonstrate overlap in the shared multitask embeddings. In the revision, we will incorporate linear separability accuracy via a trained probe classifier, silhouette scores for cluster quality, and the performance of a basic filtering defense applied to the poisoned versus benign sample split. These additions will provide stronger evidence against the separability assumption in filtering defenses.
revision: yes
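
To make the separability metrics named in this response concrete, here is a minimal, dependency-free Python sketch of a silhouette score and a nearest-centroid probe. Both functions and the 2-D toy points are illustrative assumptions. The nearest-centroid classifier is a simple stand-in for a trained linear probe (for two classes its decision boundary is a hyperplane), not the probe the authors propose.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(_dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(_dist(p, q) for q in other) / len(other)
            for k, other in clusters.items() if k != l
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def centroid_probe_accuracy(train, train_labels, test, test_labels):
    """Accuracy of a nearest-centroid classifier fitted on `train`."""
    sums, counts = {}, {}
    for p, l in zip(train, train_labels):
        acc = sums.setdefault(l, [0.0] * len(p))
        for i, x in enumerate(p):
            acc[i] += x
        counts[l] = counts.get(l, 0) + 1
    centroids = {l: [s / counts[l] for s in vec] for l, vec in sums.items()}
    correct = sum(
        1 for p, l in zip(test, test_labels)
        if min(centroids, key=lambda c: _dist(p, centroids[c])) == l
    )
    return correct / len(test)


# Toy embeddings: interleaved "benign" and "poisoned" points.
points = [(0, 0), (1, 1), (0, 1), (1, 0)]
labels = ["benign", "benign", "poisoned", "poisoned"]
print(silhouette_score(points, labels))
```

A silhouette near 1 indicates tight, well-separated clusters; values near or below 0, as with the interleaved toy points, indicate the kind of overlap on which the non-separability claim rests. In practice the probe should be scored on held-out embeddings rather than on its own training set.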

Circularity Check

0 steps flagged

Empirical component analysis of backdoor propagation shows no circular derivation

Full rationale

The paper reports results from controlled backdoor injection experiments across SLM pipeline stages, observing propagation, persistence, and embedding separability. No equations, fitted parameters, or self-referential definitions are used to derive the central claims; the observations are direct experimental outcomes. No load-bearing self-citations or ansatzes reduce the findings to their inputs by construction. The claims rest on empirical measurement rather than on a derivation that could loop back to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard machine-learning assumptions about backdoor injection and component isolation; no new free parameters, axioms, or invented entities are introduced in the abstract.

axioms (1)

domain assumption: Backdoor attacks can be targeted at individual pipeline components independently.
This premise underpins the component analysis design described in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · relation: unclear

Paper passage: We conduct a component-level analysis that isolates the role of the audio encoder, projection connector, and LoRA adapters in backdoor propagation.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.