pith. sign in

arxiv: 2605.18168 · v1 · pith:NG35DT6Knew · submitted 2026-05-18 · 💻 cs.CR · cs.SD

Acoustic Interference: A New Paradigm Weaponizing Acoustic Latent Semantic for Universal Jailbreak against Large Audio Language Models

Pith reviewed 2026-05-20 09:45 UTC · model grok-4.3

classification 💻 cs.CR cs.SD
keywords jailbreak attacklarge audio language modelsacoustic latent semanticsadversarial interferencesafety alignmentuniversal triggercross-modal vulnerability
0
0 comments X

The pith

Benign audio infused with specific acoustic latent semantics can universally jailbreak large audio language models by interfering with their safety alignment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes that large audio language models can be jailbroken using audio that contains no malicious content but carries particular paralinguistic features from audio generation priors. These features cause an interference effect that drifts the model's inference path away from safety checks, allowing ordinary harmful text prompts to succeed. The approach creates a universal trigger set of audio clips that work across different queries and models without needing to optimize for each instance. A reader would care because it shifts the attack paradigm from embedding bad content to subtly disrupting alignment through the audio modality's inherent properties.

Core claim

The authors claim that LALM safety alignment is vulnerable to Acoustic Latent Semantics (ALS) in benign interference audio, which induces inference path drift and enables the Acoustic Interference Attack (AIA) to bypass safeguards with standard malicious text queries, achieving high success rates on multiple models without per-query optimization.

What carries the argument

Acoustic Latent Semantics (ALS), the underlying paralinguistic features intrinsic to the priors of audio generative models, which when infused into benign audio interfere with cross-modal safety alignment.

If this is right

  • A fixed set of instruction-neutral interference audio clips enables jailbreaks for any standard malicious text query across models.
  • The attack decouples the audio component from the malicious payload, unlike prior methods that embed harmful content directly in audio signals.
  • AIA reaches state-of-the-art attack success rates on 10 LALMs evaluated across five datasets.
  • Interpretability analysis links the effect to inference path drift and identifies recurring effective patterns inside ALS.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the mechanism holds, alignment techniques for LALMs would need to address latent paralinguistic features in audio encoders rather than only filtering explicit content.
  • The same interference principle could be tested in other multimodal models that incorporate generative priors from one modality into cross-modal reasoning.
  • A practical mitigation test would involve preprocessing audio inputs to suppress the specific ALS patterns before they reach the model.

Load-bearing premise

The observed attack success stems specifically from interference with safety alignment via intrinsic ALS priors of audio generative models rather than from other acoustic or optimization artifacts.

What would settle it

An experiment that applies the same interference audio to LALMs whose audio encoders have been retrained or realigned to neutralize the identified ALS patterns, then measures whether jailbreak success rates fall to baseline levels.

Figures

Figures reproduced from arXiv: 2605.18168 by Li Liu, Xixin Wu, Yanyun Wang, Yu Huang, Zi Liang.

Figure 1
Figure 1. Figure 1: The paradigm-level comparison between the existing audio jailbreaks against LALMs and the proposed Acoustic In￾terference. Existing works fall in the following routes (or their combinations): ① optimizing (text) semantic before AGM (e.g., se￾mantic trojans), ② explicitly adjusting coarse-grained, pre-defined proxies of audio features within AGM (e.g., discrete acoustic pa￾rameters like gender and emotion),… view at source ↗
Figure 2
Figure 2. Figure 2: The complete construction pipeline of the proposed ALS arsenal. Each element in the final arsenal Arep constitutes a triple: the corresponding representative audio xrep, latent history prompt hrep, and 12-dimensional index s. safety alignment without any instance-specific optimiza￾tions. Our contributions are summarized as follows: • For the first time, we identify Acoustic Interference as a novel vulnerab… view at source ↗
Figure 3
Figure 3. Figure 3: The exploration process for the vulnerability of LALMs to the proposed Acoustic Interference. The results show a bi-directional interference effect. The introduction of ALS suppresses the success of previously strong text attacks but amplifies that of originally relatively weak ones, indicating that even natural ALS can cause a drift in the safety alignment path of LALM inference. h and output it along wit… view at source ↗
Figure 4
Figure 4. Figure 4: The acoustic interference attack (AIA) framework. It begins with text jailbreaks. After a given query time, if the text is not able to break the target LALM merely by itself (i.e., it falls into the medium/weak set), the audio interference is activated. The universal interference audio files are taken in order from the ranked interference set and appended to construct the multi-modal jailbreak queries. art… view at source ↗
Figure 5
Figure 5. Figure 5: Mechanistic analysis of Inference Path Drift. a) Acoustic interference systematically induces a negative shift in the refusal logit margin. b) In the latent space, acoustic interference steers the representation away from the safety alignment direction, particularly in the late transformer layers (indicated by the arrow drifting towards compliance). c) Causal patching confirms that injecting acoustic inter… view at source ↗
Figure 6
Figure 6. Figure 6: Feature distribution comparison between top 25% (red) and bottom 25% (blue) successful ALS-synthesized interference audio. Most indexes demonstrate a significant impact on the jailbreak result, while intuitively, the larger the gray fields, the greater the impact [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Quantitative analysis: Test of significance with the WD score and direction of vulnerability indicated by the Peak Shift (i.e., green + / red - denotes a preference for higher / lower scores). The study is based on the distribution divergence of acoustic features across the jailbreak outcomes. Specifically, we par￾tition the ALS arsenal into the Top 25% (highest ASR) and Bottom 25% (lowest ASR) subsets bas… view at source ↗
read the original abstract

The integration of audio modality into Large Audio Language Models (LALMs) significantly expands their attack surface. Existing jailbreak paradigms predominantly treat audio as a carrier for malicious payloads, relying on semantic optimization, acoustic parameter control, or additive perturbation to embed harmful content into the audio signal. In this work, we challenge this necessity and propose a new paradigm in which the role of audio shifts from content injection to safety alignment interference. We reveal that LALM safety alignment can be compromised solely by specific Acoustic Latent Semantics (ALS), the underlying paralinguistic features intrinsic to the priors of audio generative models. Distinct from previous works that leverage explicit acoustic parameters to merely style malicious audio, we demonstrate that interference audio, benign in content but infused with specific ALS, can serve as a universal jailbreak trigger. Leveraging this insight, we propose the Acoustic Interference Attack (AIA), which decouples the attack payload from the audio. Specifically, AIA employs a set of universal, instruction-neutral interference audio, enabling standard malicious text queries to bypass safety alignment without instance-specific optimization. Extensive experiments on 10 LALMs across five datasets demonstrate that AIA achieves the state-of-the-art attack success rate. Furthermore, our interpretability analysis uncovers the inference path drift induced by AIA and identifies the inherent effective patterns within ALS, revealing the fundamental vulnerability of cross-modal alignment in LALMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a new jailbreak paradigm for Large Audio Language Models (LALMs) called the Acoustic Interference Attack (AIA). It claims that benign, instruction-neutral audio infused with specific Acoustic Latent Semantics (ALS)—paralinguistic features intrinsic to audio generative model priors—can interfere with safety alignment, enabling standard malicious text queries to succeed universally without instance-specific optimization or payload embedding. Experiments on 10 LALMs across five datasets report state-of-the-art attack success rates, supported by interpretability analysis identifying inference path drift and effective ALS patterns.

Significance. If the results hold, the work is significant for revealing a fundamental vulnerability in cross-modal safety alignment of LALMs by decoupling attacks from explicit harmful audio content. The multi-model, multi-dataset evaluation and interpretability component provide broad empirical grounding and mechanistic insight into how latent audio features can induce alignment drift, which could inform defenses for audio-enabled AI systems. The shift from payload injection to alignment interference represents a conceptual advance in adversarial audio attacks.

major comments (3)
  1. [Methodology] Methodology section (likely §3): The paper attributes the universal jailbreak effect to ALS-induced inference path drift but provides no equations, generative model details, or procedure for isolating/infusing ALS into the interference audio. This is load-bearing for the central claim, as it prevents verification that the effect stems from intrinsic paralinguistic priors rather than incidental acoustic or optimization artifacts.
  2. [Experiments] Experiments section (likely §5, results tables): Reported SOTA attack success rates across 10 LALMs and five datasets lack error bars, statistical tests, or ablation studies on acoustic parameters (e.g., spectrogram statistics, duration, loudness) or embedding shifts. Without these controls, the attribution to alignment-specific interference cannot be isolated from confounds, weakening the causal claim.
  3. [Interpretability analysis] Interpretability analysis (likely §6): The analysis claims to uncover inference path drift and inherent ALS patterns but does not quantify drift specifically attributable to ALS versus other factors or provide controls ruling out non-alignment explanations. This leaves the mechanistic account incomplete for supporting the universal, alignment-interference paradigm.
minor comments (2)
  1. [Introduction] Notation: 'Acoustic Latent Semantics (ALS)' is used throughout without a formal definition, extraction equation, or reference to related latent feature work in audio models, which could improve clarity.
  2. [Figures] Figure clarity: Captions for any drift visualization figures should explicitly define metrics (e.g., what 'path drift' quantifies) and include scale bars or statistical overlays for interpretability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and insightful comments on our manuscript. We value the recognition of the potential significance of the Acoustic Interference Attack paradigm for understanding vulnerabilities in Large Audio Language Models. Below, we provide point-by-point responses to the major comments and outline the revisions we will make to address them.

read point-by-point responses
  1. Referee: [Methodology] Methodology section (likely §3): The paper attributes the universal jailbreak effect to ALS-induced inference path drift but provides no equations, generative model details, or procedure for isolating/infusing ALS into the interference audio. This is load-bearing for the central claim, as it prevents verification that the effect stems from intrinsic paralinguistic priors rather than incidental acoustic or optimization artifacts.

    Authors: We appreciate this observation and agree that additional technical details would strengthen the presentation. In the revised manuscript, we will expand the Methodology section to include explicit equations describing the ALS infusion process, details on the audio generative models used to derive the priors, and a step-by-step procedure for isolating and infusing the Acoustic Latent Semantics into the interference audio. This will clarify that the effect arises from the paralinguistic features intrinsic to the model priors. revision: yes

  2. Referee: [Experiments] Experiments section (likely §5, results tables): Reported SOTA attack success rates across 10 LALMs and five datasets lack error bars, statistical tests, or ablation studies on acoustic parameters (e.g., spectrogram statistics, duration, loudness) or embedding shifts. Without these controls, the attribution to alignment-specific interference cannot be isolated from confounds, weakening the causal claim.

    Authors: We acknowledge the need for greater statistical rigor and controls in the experimental results. We will revise the Experiments section to include error bars in the results tables, conduct appropriate statistical tests (e.g., paired t-tests) to assess significance, and perform ablation studies varying acoustic parameters such as duration, loudness, and spectrogram statistics. These additions will help isolate the contribution of ALS to the alignment interference effect. revision: yes

  3. Referee: [Interpretability analysis] Interpretability analysis (likely §6): The analysis claims to uncover inference path drift and inherent ALS patterns but does not quantify drift specifically attributable to ALS versus other factors or provide controls ruling out non-alignment explanations. This leaves the mechanistic account incomplete for supporting the universal, alignment-interference paradigm.

    Authors: We agree that the interpretability analysis can be made more robust. In the revision, we will quantify the inference path drift specifically due to ALS by introducing controlled comparisons and additional metrics. We will also include experiments with controls to rule out non-alignment related explanations, thereby providing stronger evidence for the mechanistic role of ALS in causing alignment drift. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical results rest on external experiments

full rationale

The paper advances an empirical claim that specific Acoustic Latent Semantics (ALS) in interference audio can induce universal jailbreaks in LALMs via inference path drift. This is supported by experiments on 10 models across five datasets reporting state-of-the-art attack success rates, plus an interpretability analysis. No equations, fitted parameters, or self-cited derivations appear in the provided text that would reduce the central result to an input by construction. The work therefore qualifies as self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the existence and transferability of Acoustic Latent Semantics as intrinsic features from audio generative model priors that can reliably induce inference path drift in LALMs; no explicit free parameters or invented entities beyond the named attack and feature set are detailed in the abstract.

axioms (1)
  • domain assumption LALM safety alignment can be compromised solely by specific Acoustic Latent Semantics without content injection
    Invoked when the paper states that benign interference audio suffices to bypass alignment and attributes success to ALS priors.
invented entities (1)
  • Acoustic Latent Semantics (ALS) no independent evidence
    purpose: Paralinguistic features that interfere with safety alignment
    Introduced as the key mechanism; no independent falsifiable evidence (e.g., predicted acoustic signature verifiable outside the attack) is provided in the abstract.

pith-pipeline@v0.9.0 · 5790 in / 1374 out tokens · 47228 ms · 2026-05-20T09:45:15.233361+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · 10 internal anchors

  1. [1]

    Jailbreaking leading safety-aligned llms with simple adaptive attacks

    Andriushchenko, M., Croce, F., and Flammarion, N. Jailbreaking leading safety-aligned llms with simple adaptive attacks. In International Conference on Learning Representations (ICLR), 2025

  2. [2]

    Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

    Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., Drain, D., Fort, S., Ganguli, D., Henighan, T., et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862, 2022

  3. [3]

    N., Lee, S., and Narayanan, S

    Busso, C., Bulut, M., Lee, C.-C., Kazemzadeh, A., Mower, E., Kim, S., Chang, J. N., Lee, S., and Narayanan, S. S. Iemocap: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 2008

  4. [4]

    J., Tramèr, F., Hassani, H., and Wong, E

    Chao, P., Debenedetti, E., Robey, A., Andriushchenko, M., Croce, F., Sehwag, V., Dobriban, E., Flammarion, N., Pappas, G. J., Tramèr, F., Hassani, H., and Wong, E. Jailbreakbench: An open robustness benchmark for jailbreaking large language models. In Datasets and Benchmarks Track \!@\! Advances in Neural Information Processing Systems (NeurIPS), 2024

  5. [5]

    J., and Wong, E

    Chao, P., Robey, A., Dobriban, E., Hassani, H., Pappas, G. J., and Wong, E. Jailbreaking black box large language models in twenty queries. In IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2025

  6. [6]

    Audiojailbreak: Jailbreak attacks against end-to-end large audio-language models

    Chen, G., Song, F., Zhao, Z., Jia, X., Liu, Y., Qiao, Y., and Zhang, W. Audiojailbreak: Jailbreak attacks against end-to-end large audio-language models. IEEE Transactions on Dependable and Secure Computing (TDSC), 2026

  7. [7]

    Wavlm: Large-scale self-supervised pre-training for full stack speech processing

    Chen, S., Wang, C., Chen, Z., Wu, Y., Liu, S., Chen, Z., Li, J., Kanda, N., Yoshioka, T., Xiao, X., et al. Wavlm: Large-scale self-supervised pre-training for full stack speech processing. IEEE Journal of Selected Topics in Signal Processing (JSTSP), 2022

  8. [8]

    Jailbreak-audiobench: In-depth evaluation and analysis of jailbreak threats for large audio language models

    Cheng, H., Xiao, E., Shao, J., Wang, Y., Yang, L., Shen, C., Torr, P., Gu, J., and Xu, R. Jailbreak-audiobench: In-depth evaluation and analysis of jailbreak threats for large audio language models. Datasets and Benchmarks Track \!@\! Advances in Neural Information Processing Systems (NeurIPS), 2025

  9. [9]

    W., Huang, L., Li, B., Chen, H., and Choo, K.-K

    Chiu, C. W., Huang, L., Li, B., Chen, H., and Choo, K.-K. R. `do as i say not as i do': A semi-automated approach for jailbreak prompt attack against multimodal llms. arXiv preprint arXiv:2502.00735, 2025

  10. [10]

    Qwen2-Audio Technical Report

    Chu, Y., Xu, J., Yang, Q., Wei, H., Wei, X., Guo, Z., Leng, Y., Lv, Y., He, J., Lin, J., et al. Qwen2-audio technical report. arXiv preprint arXiv:2407.10759, 2024

  11. [11]

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Comanici, G., Bieber, E., Schaekermann, M., Pasupat, I., Sachdeva, N., Dhillon, I., Blistein, M., Ram, O., Zhang, D., Rosen, E., et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025

  12. [12]

    Kimi-Audio Technical Report

    Ding, D., Ju, Z., Leng, Y., Liu, S., Liu, T., Shang, Z., Shen, K., Song, W., Tan, X., Tang, H., et al. Kimi-audio technical report. arXiv preprint arXiv:2504.18425, 2025

  13. [13]

    Llama-omni: Seamless speech interaction with large language models

    Fang, Q., Guo, S., Zhou, Y., Ma, Z., Zhang, S., and Feng, Y. Llama-omni: Seamless speech interaction with large language models. In International Conference on Learning Representations (ICLR), 2025

  14. [14]

    Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

    Ganguli, D., Lovitt, L., Kernion, J., Askell, A., Bai, Y., Kadavath, S., Mann, B., Perez, E., Schiefer, N., Ndousse, K., et al. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned. arXiv preprint arXiv:2209.07858, 2022

  15. [15]

    F., Ellis, D

    Gemmeke, J. F., Ellis, D. P., Freedman, D., Jansen, A., Lawrence, W., Moore, R. C., Plakal, M., and Ritter, M. Audio set: An ontology and human-labeled dataset for audio events. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017

  16. [16]

    Figstep: Jailbreaking large vision-language models via typographic visual prompts

    Gong, Y., Ran, D., Liu, J., Wang, C., Cong, T., Wang, A., Duan, S., and Wang, X. Figstep: Jailbreaking large vision-language models via typographic visual prompts. In AAAI Conference on Artificial Intelligence (AAAI), 2025

  17. [17]

    J., Shlens, J., and Szegedy, C

    Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015

  18. [18]

    Gemini-3-pro, 2025

    Google. Gemini-3-pro, 2025. URL https://gemini.google.com

  19. [19]

    Badnets: Evaluating backdooring attacks on deep neural networks

    Gu, T., Liu, K., Dolan-Gavitt, B., and Garg, S. Badnets: Evaluating backdooring attacks on deep neural networks. IEEE Access, 2019

  20. [20]

    ``i am bad'': Interpreting stealthy, universal and robust audio jailbreaks in audio-language models

    Gupta, I., Khachaturov, D., and Mullins, R. ``i am bad'': Interpreting stealthy, universal and robust audio jailbreaks in audio-language models. In Workshop on Machine Learning for Audio \!@\! International Conference on Machine Learning (ICML), 2025

  21. [21]

    Best-of-n jailbreaking

    Hughes, J., Price, S., Lynch, A., Schaeffer, R., Barez, F., Koyejo, S., Sleight, H., Jones, E., Perez, E., and Sharma, M. Best-of-n jailbreaking. Advances in Neural Information Processing Systems (NeurIPS), 2025

  22. [22]

    Wildteaming at scale: From in-the-wild jailbreaks to (adversarially) safer language models

    Jiang, L., Rao, K., Han, S., Ettinger, A., Brahman, F., Kumar, S., Mireshghallah, N., Lu, X., Sap, M., Choi, Y., et al. Wildteaming at scale: From in-the-wild jailbreaks to (adversarially) safer language models. Advances in Neural Information Processing Systems (NeurIPS), 2024

  23. [23]

    Advwave: Stealthy adversarial jailbreak attack against large audio-language models

    Kang, M., Xu, C., and Li, B. Advwave: Stealthy adversarial jailbreak attack against large audio-language models. In International Conference on Learning Representations (ICLR), 2025

  24. [24]

    Deep learning

    LeCun, Y., Bengio, Y., and Hinton, G. Deep learning. Nature, 2015

  25. [25]

    Virus infection attack on llms: Your poisoning can spread "via" synthetic data

    Liang, Z., Ye, Q., Liu, X., Wang, Y., Xu, J., and Hu, H. Virus infection attack on llms: Your poisoning can spread "via" synthetic data. Advances in Neural Information Processing Systems (NeurIPS), 2025

  26. [26]

    Backdoordm: A comprehensive benchmark for backdoor learning on diffusion model

    Lin, W., Zhou, N., Wang, Y., Li, J., Xiong, H., and Liu, L. Backdoordm: A comprehensive benchmark for backdoor learning on diffusion model. Datasets and Benchmarks Track \!@\! Advances in Neural Information Processing Systems (NeurIPS), 2025

  27. [27]

    Safety of multimodal large language models on images and texts

    Liu, X., Zhu, Y., Lan, Y., Yang, C., and Qiao, Y. Safety of multimodal large language models on images and texts. In International Joint Conference on Artificial Intelligence (IJCAI), 2024

  28. [28]

    Livingstone, S. R. and Russo, F. A. The ryerson audio-visual database of emotional speech and song (ravdess): A dynamic, multimodal set of facial and vocal expressions in north american english. PloS one, 2018

  29. [29]

    The Llama 3 Herd of Models

    Llama-Team and Meta-AI. The llama 3 herd of models, 2024. URL https://arxiv.org/abs/2407.21783

  30. [30]

    Harmbench: A standardized evaluation framework for automated red teaming and robust refusal

    Mazeika, M., Phan, L., Yin, X., Zou, A., Wang, Z., Mu, N., Sakhaee, E., Li, N., Basart, S., Li, B., et al. Harmbench: A standardized evaluation framework for automated red teaming and robust refusal. In International Conference on Machine Learning (ICML), 2024

  31. [31]

    Tree of attacks: Jailbreaking black-box llms automatically

    Mehrotra, A., Zampetakis, M., Kassianik, P., Nelson, B., Anderson, H., Singer, Y., and Karbasi, A. Tree of attacks: Jailbreaking black-box llms automatically. Advances in Neural Information Processing Systems (NeurIPS), 2024

  32. [32]

    Gpt-4o-audio, 2024

    OpenAI. Gpt-4o-audio, 2024. URL https://www.openai.com

  33. [33]

    Training language models to follow instructions with human feedback

    Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems (NeurIPS), 2022

  34. [34]

    Jalmbench: Benchmarking jailbreak vulnerabilities in audio language models

    Peng, Z., Liu, Y., Sun, Z., Li, M., Luo, Z., Zheng, J., Dong, W., He, X., Wang, X., Xue, Y., et al. Jalmbench: Benchmarking jailbreak vulnerabilities in audio language models. In International Conference on Learning Representations (ICLR), 2026

  35. [35]

    Visual adversarial examples jailbreak aligned large language models

    Qi, X., Huang, K., Panda, A., Henderson, P., Wang, M., and Mittal, P. Visual adversarial examples jailbreak aligned large language models. In AAAI Conference on Artificial Intelligence (AAAI), 2024

  36. [36]

    Multilingual and multi-accent jailbreaking of audio llms

    Roh, J., Shejwalkar, V., and Houmansadr, A. Multilingual and multi-accent jailbreaking of audio llms. In Conference on Language Modeling (COLM), 2025

  37. [37]

    V oice jailbreak attacks against gpt-4o, 2024

    Shen, X., Wu, Y., Backes, M., and Zhang, Y. Voice jailbreak attacks against gpt-4o. arXiv preprint arXiv:2405.19103, 2024

  38. [38]

    Audio jailbreak: An open comprehensive benchmark for jailbreaking large audio-language models.arXiv preprint arXiv:2505.15406,

    Song, Z., Jiang, Q., Cui, M., Li, M., Gao, L., Zhang, Z., Xu, Z., Wang, Y., Wang, C., Ouyang, G., et al. Audio jailbreak: An open comprehensive benchmark for jailbreaking large audio-language models. arXiv preprint arXiv:2505.15406, 2025

  39. [39]

    Bark: Text-prompted generative audio model, 2023

    Suno-AI. Bark: Text-prompted generative audio model, 2023. URL https://huggingface.co/suno/bark

  40. [40]

    Intriguing properties of neural networks

    Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014

  41. [41]

    Salmonn: Towards generic hearing abilities for large language models

    Tang, C., Yu, W., Sun, G., Chen, X., Tan, T., Li, W., Lu, L., MA, Z., and Zhang, C. Salmonn: Towards generic hearing abilities for large language models. In International Conference on Learning Representations (ICLR), 2024

  42. [42]

    LLaMA: Open and Efficient Foundation Language Models

    Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozi \`e re, B., Goyal, N., Hambro, E., Azhar, F., et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023

  43. [43]

    Poisoning language models during instruction tuning

    Wan, A., Wallace, E., Shen, S., and Klein, D. Poisoning language models during instruction tuning. In International Conference on Machine Learning (ICML), 2023

  44. [44]

    and Liu, L

    Wang, Y. and Liu, L. Failure cases are better learned but boundary says sorry: Facilitating smooth perception change for accuracy-robustness trade-off in adversarial training. In IEEE/CVF International Conference on Computer Vision (ICCV), 2025

  45. [45]

    Jailbroken: How does llm safety training fail? Advances in Neural Information Processing Systems (NeurIPS), 2023

    Wei, A., Haghtalab, N., and Steinhardt, J. Jailbroken: How does llm safety training fail? Advances in Neural Information Processing Systems (NeurIPS), 2023

  46. [46]

    Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation

    Wu, Y., Chen, K., Zhang, T., Hui, Y., Berg-Kirkpatrick, T., and Dubnov, S. Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

  47. [47]

    Jailbreak-audiobench: In-depth evaluation and analysis of jailbreak threats for large audio language models,

    Xiao, E., Cheng, H., Shao, J., Duan, J., Xu, K., Yang, L., Gu, J., and Xu, R. Tune in, act up: Exploring the impact of audio modality-specific edits on large audio language models in jailbreak. arXiv preprint arXiv:2501.13772v1, 2025

  48. [48]

    Qwen2.5-Omni Technical Report

    Xu, J., Guo, Z., He, J., Hu, H., He, T., Bai, S., Chen, K., Wang, J., Fan, Y., Dang, K., Zhang, B., Wang, X., Chu, Y., and Lin, J. Qwen2.5-omni technical report. arXiv preprint arXiv:2503.20215, 2025 a

  49. [49]

    Qwen3-Omni Technical Report

    Xu, J., Guo, Z., Hu, H., Chu, Y., Wang, X., He, J., Wang, Y., Shi, X., He, T., Zhu, X., Lv, Y., Wang, Y., Guo, D., Wang, H., Ma, L., Zhang, P., Zhang, X., Hao, H., Guo, Z., Yang, B., Zhang, B., Ma, Z., Wei, X., Bai, S., Chen, K., Liu, X., Wang, P., Yang, M., Liu, D., Ren, X., Zheng, B., Men, R., Zhou, F., Yu, B., Yang, J., Yu, L., Zhou, J., and Lin, J. Qw...

  50. [50]

    Cstr vctk corpus: English multi-speaker corpus for cstr voice cloning toolkit, 2019

    Yamagishi, J., Veaux, C., and MacDonald, K. Cstr vctk corpus: English multi-speaker corpus for cstr voice cloning toolkit, 2019. URL https://doi.org/10.7488/ds/2645

  51. [51]

    Audio is the achilles’ heel: Red teaming audio large multimodal models

    Yang, H., Qu, L., Shareghi, E., and Haffari, G. Audio is the achilles’ heel: Red teaming audio large multimodal models. In Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), 2025 a

  52. [52]

    Speech-audio compositional attacks on multimodal llms and their mitigation with salmonn-guard

    Yang, Y., Zhang, X., Han, Z., Wang, S., Zhuang, J., Jin, Z., Shao, J., Sun, G., and Zhang, C. Speech-audio compositional attacks on multimodal llms and their mitigation with salmonn-guard. arXiv preprint arXiv:2511.10222, 2025 b

  53. [53]

    H., Goel, A., Huang, W., Zhu, L., Su, Y., Lin, S., Cheng, A.-C., Wan, Z., Tian, J., et al

    Ye, H., Yang, C.-H. H., Goel, A., Huang, W., Zhu, L., Su, Y., Lin, S., Cheng, A.-C., Wan, Z., Tian, J., et al. Omnivinci: Enhancing architecture and data for omni-modal understanding llm. In International Conference on Learning Representations (ICLR), 2026

  54. [54]

    Smack: Semantically meaningful adversarial audio attack

    Yu, Z., Chang, Y., Zhang, N., and Xiao, C. Smack: Semantically meaningful adversarial audio attack. In USENIX Security Symposium, 2023

  55. [55]

    J., Jia, Y., Chen, Z., and Wu, Y

    Zen, H., Dang, V., Clark, R., Zhang, Y., Weiss, R. J., Jia, Y., Chen, Z., and Wu, Y. Libritts: A corpus derived from librispeech for text-to-speech. In INTERSPEECH, 2019

  56. [56]

    Mimo-audio: Audio language models are few-shot learners,

    Zhang, D., Wang, G., Xue, J., Fang, K., Zhao, L., Ma, R., Ren, S., Liu, S., Guo, T., Zhuang, W., et al. Mimo-audio: Audio language models are few-shot learners. arXiv preprint arXiv:2512.23808, 2025

  57. [57]

    Judging llm-as-a-judge with mt-bench and chatbot arena

    Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., et al. Judging llm-as-a-judge with mt-bench and chatbot arena. Datasets and Benchmarks Track \!@\! Advances in Neural Information Processing Systems (NeurIPS), 2023

  58. [58]

    Sdformer: Transformer with spectral filter and dynamic attention for multivariate time series long-term forecasting

    Zhou, Z., Lyu, G., Huang, Y., Wang, Z., Jia, Z., and Yang, Z. Sdformer: Transformer with spectral filter and dynamic attention for multivariate time series long-term forecasting. In International Joint Conference on Artificial Intelligence (IJCAI), 2024

  59. [59]

    Revitalizing canonical pre-alignment for irregular multivariate time series forecasting

    Zhou, Z., Huang, Y., Wang, Y., Wu, Y., Kwok, J., and Liang, Y. Revitalizing canonical pre-alignment for irregular multivariate time series forecasting. In AAAI Conference on Artificial Intelligence (AAAI), 2026

  60. [60]

    Universal and Transferable Adversarial Attacks on Aligned Language Models

    Zou, A., Wang, Z., Carlini, N., Nasr, M., Kolter, J. Z., and Fredrikson, M. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023