Codec-Robust Attacks on Audio LLMs
Pith reviewed 2026-05-21 06:40 UTC · model grok-4.3
The pith
Perturbations optimized inside a neural audio codec's latent space survive lossy compression and achieve high attack success on Audio LLMs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CodecAttack optimizes a perturbation directly in the continuous latent space of a neural audio codec and hardens it with multi-bitrate straight-through Expectation-over-Transformation, allowing the attack to transmit through the compression channel that removes most waveform perturbations.
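The core claim combines two ideas that a toy example can make concrete: a straight-through gradient through a non-differentiable compression step, and an expectation over several bitrates. The sketch below is not the paper's implementation. It assumes a hypothetical codec bottleneck modeled as uniform quantization with three step sizes (`STEP_SIZES`), and a squared-error loss toward a hypothetical target latent in place of the target model's loss. All function names are illustrative.

```python
# Hypothetical stand-in for a codec bottleneck: uniform quantization of the
# latent, with a coarser step playing the role of a lower bitrate.
STEP_SIZES = [0.5, 0.25, 0.125]  # assumed values, not taken from the paper


def quantize(latent, step):
    """Non-differentiable channel: snap each latent value to a grid."""
    return [step * round(v / step) for v in latent]


def loss_grad(received, target):
    """Gradient of the squared error sum((r - t)^2) with respect to `received`."""
    return [2.0 * (r - t) for r, t in zip(received, target)]


def optimize_latent_perturbation(clean, target, steps=200, lr=0.05):
    """Gradient descent on a latent perturbation with straight-through EoT."""
    delta = [0.0] * len(clean)
    for _ in range(steps):
        avg_grad = [0.0] * len(clean)
        # EoT: average the gradient over every simulated bitrate.
        for step in STEP_SIZES:
            perturbed = [c + d for c, d in zip(clean, delta)]
            received = quantize(perturbed, step)
            grad = loss_grad(received, target)
            # Straight-through: treat d(received)/d(perturbed) as the identity,
            # so the gradient at the channel output is applied to delta as-is.
            for i, g in enumerate(grad):
                avg_grad[i] += g / len(STEP_SIZES)
        delta = [d - lr * g for d, g in zip(delta, avg_grad)]
    return delta


clean = [0.1, -0.3, 0.7, 0.0]
target = [1.0, -1.0, 0.5, 0.0]
delta = optimize_latent_perturbation(clean, target)
```

In a real pipeline the quantizer would be replaced by an actual encode/decode round trip and the squared error by the target model's loss; the toy only shows how one perturbation can be pushed to survive every simulated channel at once.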
What carries the argument
Optimization of the adversarial perturbation in the continuous latent representation of the neural audio codec, which aligns with the codec's internal bit allocation.
If this is right
- Lossy compression preprocessing no longer reliably blocks adversarial audio inputs to Audio LLMs.
- The same latent perturbation transfers to other codecs such as MP3 and AAC-LC without retraining.
- Perturbation energy concentrates below 4 kHz, matching the frequency bands that receive the most bits during compression.
- Three different target Audio LLMs remain vulnerable under realistic deployment pipelines.
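The per-band energy claim in the list above can be checked with a simple spectral measurement. The following is a minimal sketch, assuming a real-valued perturbation given as a list of samples; the function name `band_energy_fraction` and the naive DFT are illustrative choices, not the paper's analysis code. A production version would use an FFT library.

```python
import cmath
import math


def band_energy_fraction(signal, sample_rate, cutoff_hz):
    """Fraction of spectral energy at or below cutoff_hz (naive O(n^2) DFT)."""
    n = len(signal)
    low, total = 0.0, 0.0
    # A real signal is fully described by bins 0 .. n//2 (DC up to Nyquist).
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        # Interior bins have a mirrored negative-frequency twin, so count twice.
        is_edge = k == 0 or (n % 2 == 0 and k == n // 2)
        energy = (1.0 if is_edge else 2.0) * abs(coeff) ** 2
        total += energy
        if k * sample_rate / n <= cutoff_hz:
            low += energy
    return low / total if total > 0 else 0.0


def tone(freq_hz, sample_rate, n):
    """Unit-amplitude sine wave, used here only as test input."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


low_tone = tone(1000, 16000, 160)
high_tone = tone(6000, 16000, 160)
mixed = [a + b for a, b in zip(low_tone, high_tone)]
```

Applied to a perturbation (adversarial audio minus clean audio), a value close to 1 at a 4 kHz cutoff would correspond to the low-frequency concentration the paper reports for the latent attack.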
Where Pith is reading between the lines
- Security designs that treat codec compression as a first line of defense will need additional safeguards.
- Similar latent-space optimization could be tested on other compression pipelines such as video or image codecs.
- Model developers may need to incorporate codec simulation directly into training or detection routines.
Load-bearing premise
The codec will continue to allocate most bits to the low-frequency region where the latent perturbation places its energy.
What would settle it
Running the attack through a codec that distributes bits uniformly across all frequencies, or through a codec operating at extremely high bitrates, and measuring whether success rates drop to waveform-baseline levels.
Original abstract
Prior attacks on Audio Large Language Models (Audio LLMs) demonstrated that carefully crafted waveform-domain perturbations can force targeted adversarial outputs. As a defense mechanism against these attacks, real-world codec compression preprocessing has been studied to both detect and remove the perturbations. Yet no existing attack has demonstrated robustness against these compressions. We introduce CodecAttack, which optimizes a perturbation in a neural audio codec's continuous latent space rather than directly perturbing the audio waveform. We show that the codec's compression channel, which discards waveform perturbations, transmits perturbations crafted in its own latent space. To further harden the attack across real-world compression channels, we apply multi-bitrate straight-through Expectation-over-Transformation (EoT), all without modifying the target model. Across three realistic Audio LLM deployment scenarios and three target models, CodecAttack achieves an average 85.5% target-substring attack success rate (ASR) on Opus at moderate bitrates, while the waveform baseline trained with identical EoT hardening does not exceed 26% at any bitrate. The attack transfers to held-out codecs, reaching up to 100% ASR on MP3 and 84% on AAC-LC without retraining. A per-band energy analysis shows that the latent perturbation concentrates below 4 kHz, exactly where codecs allocate the most bits, while the waveform baseline spreads into higher frequencies that codecs discard. These results demonstrate that lossy compression is not a reliable defense against adversarial audio and that codec-aware attacks pose a practical threat to deployed Audio LLM systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces CodecAttack, which generates adversarial perturbations for Audio LLMs by optimizing directly in the continuous latent space of a neural audio codec rather than the waveform domain. Using multi-bitrate straight-through Expectation-over-Transformation (EoT) hardening without modifying the target model, it reports an average 85.5% target-substring attack success rate (ASR) on Opus at moderate bitrates across three models and scenarios, substantially outperforming an EoT-hardened waveform baseline (≤26% ASR at any bitrate). The attack transfers to held-out codecs (up to 100% ASR on MP3, 84% on AAC-LC). A per-band energy analysis attributes the robustness to the latent perturbation concentrating below 4 kHz, where codecs allocate most bits.
Significance. If the results hold, the work shows that codec preprocessing is not a reliable defense against adversarial attacks on Audio LLMs when perturbations are optimized in the codec's own latent space. The concrete ASR numbers, direct baseline comparison, and cross-codec transfer provide practical evidence of the threat. Credit is given for applying EoT across bitrates and for linking perturbation energy distribution to codec bit-allocation behavior. This could prompt more robust defenses or codec-aware training in audio AI systems.
major comments (1)
- [Per-band energy analysis] Per-band energy analysis (abstract and results section): the claim that latent-space optimization enables codec robustness rests on the observation that perturbations concentrate below 4 kHz while the waveform baseline spreads to higher frequencies. However, the manuscript contains no ablation comparing against a waveform attack that is explicitly band-limited to the same <4 kHz region under identical EoT hardening. Without this control, it is unclear whether the 85.5% vs. ≤26% gap is due to the latent construction itself or simply to low-frequency concentration, which directly affects the central explanation for why the attack survives compression.
minor comments (2)
- [Experimental results] The abstract and experimental description omit the exact optimization hyperparameters, number of EoT samples per bitrate, and any statistical significance tests or standard deviations on the reported ASR figures; adding these would strengthen reproducibility.
- [Abstract] Clarify the identities of the three target models and the three realistic Audio LLM deployment scenarios to help readers assess generalizability.
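The first minor comment asks for uncertainty estimates on the reported ASR figures. A minimal sketch of how a target-substring ASR and a 95% Wilson score interval could be computed is given below. The case-insensitive matching rule, the function names, and the example strings are assumptions for illustration; the source does not specify how the paper matches substrings or which interval it would report.

```python
import math


def target_substring_hits(outputs, target):
    """Count model outputs that contain the target substring.

    Case-insensitive matching is an assumption, not the paper's stated rule.
    """
    needle = target.lower()
    hits = sum(1 for text in outputs if needle in text.lower())
    return hits, len(outputs)


def wilson_interval(hits, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = hits / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


# Hypothetical transcripts of model responses, for illustration only.
outputs = [
    "Sure, visit evil.example now",
    "I cannot help with that",
    "VISIT EVIL.EXAMPLE",
    "hello",
]
hits, trials = target_substring_hits(outputs, "visit evil.example")
asr = hits / trials
low, high = wilson_interval(hits, trials)
```

The Wilson interval is used here because it stays inside [0, 1] for success rates near 0% or 100%, which matters when an ASR such as the reported 100% on MP3 comes from a limited number of trials.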
Simulated Author's Rebuttal
We thank the referee for the constructive comment, which highlights a valuable opportunity to strengthen the causal interpretation of our results. We address the concern directly below.
- Referee: [Per-band energy analysis] Per-band energy analysis (abstract and results section): the claim that latent-space optimization enables codec robustness rests on the observation that perturbations concentrate below 4 kHz while the waveform baseline spreads to higher frequencies. However, the manuscript contains no ablation comparing against a waveform attack that is explicitly band-limited to the same <4 kHz region under identical EoT hardening. Without this control, it is unclear whether the 85.5% vs. ≤26% gap is due to the latent construction itself or simply to low-frequency concentration, which directly affects the central explanation for why the attack survives compression.
  Authors: We agree that the existing per-band analysis shows a correlation but does not isolate whether the robustness arises specifically from latent-space optimization or from low-frequency concentration alone. To resolve this ambiguity, we will add a new ablation in the revised manuscript: a waveform-domain attack explicitly constrained to the <4 kHz band and trained under identical multi-bitrate straight-through EoT hardening. Updated ASR results, energy distributions, and discussion will be included to clarify whether the latent construction provides an advantage beyond frequency localization.
  Revision: yes
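The band-limited control discussed in this exchange needs some way to keep a waveform perturbation inside the <4 kHz band. One common option, assumed here and not described in the source, is to project the perturbation onto the band after every gradient step by zeroing its high-frequency DFT bins. The sketch below shows only that projection; `lowpass_project` is a hypothetical name and the naive DFT would be replaced by an FFT in practice.

```python
import cmath
import math


def lowpass_project(perturbation, sample_rate, cutoff_hz):
    """Zero every DFT bin above cutoff_hz and return the real time signal."""
    n = len(perturbation)
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(perturbation))
        for k in range(n)
    ]
    for k in range(n):
        # Bins k and n - k carry the same physical frequency for real input.
        freq_hz = min(k, n - k) * sample_rate / n
        if freq_hz > cutoff_hz:
            spectrum[k] = 0.0
    # Inverse DFT; the imaginary part is numerical noise for a symmetric mask.
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(spectrum)) / n).real
        for t in range(n)
    ]


def sine(freq_hz, sample_rate, n):
    """Unit-amplitude sine wave, used here only as test input."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


low_part = sine(1000, 16000, 80)
high_part = sine(6000, 16000, 80)
projected = lowpass_project([a + b for a, b in zip(low_part, high_part)], 16000, 4000)
```

If a waveform attack projected this way under the same EoT hardening still fell well short of the latent attack, that would suggest the latent construction contributes more than frequency localization alone.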
Circularity Check
No circularity: empirical ASR measurements on held-out codecs and bitrates
The paper reports direct empirical results from optimizing perturbations in codec latent space and measuring target-substring ASR on Opus, MP3, and AAC-LC at various bitrates, with comparisons to EoT-hardened waveform baselines. The per-band energy analysis is presented as an observational explanation for why latent perturbations survive compression (concentrating below 4 kHz where codecs allocate bits), but this is post-hoc correlation from the generated examples rather than a closed derivation or fitted parameter renamed as prediction. No equations, self-citations, or uniqueness claims reduce the central attack success claims to inputs defined within the paper itself. The evaluation uses held-out codecs and bitrates, keeping the results falsifiable and independent of any internal construction loop.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: optimizes a perturbation in a neural audio codec's continuous latent space... multi-bitrate straight-through Expectation-over-Transformation (EoT)
- IndisputableMonolith/Foundation/DimensionForcing, alexander_duality_circle_linking (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: per-band energy analysis shows that the latent perturbation concentrates below 4 kHz
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.