Acoustic Interference: A New Paradigm Weaponizing Acoustic Latent Semantic for Universal Jailbreak against Large Audio Language Models

Li Liu; Xixin Wu; Yanyun Wang; Yu Huang; Zi Liang

arxiv: 2605.18168 · v1 · pith:NG35DT6Knew · submitted 2026-05-18 · 💻 cs.CR · cs.SD


Yanyun Wang, Yu Huang, Zi Liang, Xixin Wu, Li Liu

Pith reviewed 2026-05-20 09:45 UTC · model grok-4.3

classification 💻 cs.CR cs.SD

keywords: jailbreak attack · large audio language models · acoustic latent semantics · adversarial interference · safety alignment · universal trigger · cross-modal vulnerability


The pith

Benign audio infused with specific acoustic latent semantics can universally jailbreak large audio language models by interfering with their safety alignment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper argues that large audio language models can be jailbroken using audio that contains no malicious content but carries particular paralinguistic features drawn from the priors of audio generative models. These features produce an interference effect that shifts the model's inference path away from its safety checks, so that ordinary harmful text prompts succeed. The approach builds a universal set of trigger audio clips that work across different queries and models without per-instance optimization. A reader would care because it moves the attack paradigm from embedding harmful content in audio to subtly disrupting alignment through properties inherent to the audio modality.

Core claim

The authors claim that LALM safety alignment is vulnerable to Acoustic Latent Semantics (ALS) in benign interference audio, which induces inference path drift and enables the Acoustic Interference Attack (AIA) to bypass safeguards with standard malicious text queries, achieving high success rates on multiple models without per-query optimization.

What carries the argument

Acoustic Latent Semantics (ALS), the underlying paralinguistic features intrinsic to the priors of audio generative models, which when infused into benign audio interfere with cross-modal safety alignment.

If this is right

A fixed set of instruction-neutral interference audio clips enables jailbreaks for any standard malicious text query across models.
The attack decouples the audio component from the malicious payload, unlike prior methods that embed harmful content directly in audio signals.
AIA reaches state-of-the-art attack success rates on 10 LALMs evaluated across five datasets.
Interpretability analysis links the effect to inference path drift and identifies recurring effective patterns inside ALS.
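
The refusal-margin and latent-direction readings of inference path drift (see the Figure 5 caption) can be illustrated with a toy calculation. The sketch below is a minimal illustration, not the paper's procedure: the margin definition (refusal-token logit minus compliance-token logit), the two-dimensional "safety direction", the token ids, and every number are assumptions invented for the example.

```python
import math

def refusal_margin(logits, refusal_id, comply_id):
    """Logit of a refusal token minus logit of a compliance token."""
    return logits[refusal_id] - logits[comply_id]

def projection(hidden, direction):
    """Scalar projection of a hidden state onto a direction vector."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(h * d for h, d in zip(hidden, direction)) / norm

# Hypothetical next-token logits for the same text prompt,
# measured without and with an accompanying audio clip.
REFUSAL_ID, COMPLY_ID = 0, 1
logits_text_only = [2.0, 0.5, -1.0]
logits_with_audio = [0.8, 1.1, -1.0]

margin_shift = (refusal_margin(logits_with_audio, REFUSAL_ID, COMPLY_ID)
                - refusal_margin(logits_text_only, REFUSAL_ID, COMPLY_ID))

# Hypothetical late-layer hidden states and an assumed safety direction.
safety_direction = [3.0, 4.0]
hidden_text_only = [3.0, 4.0]
hidden_with_audio = [1.0, 0.5]

projection_drift = (projection(hidden_with_audio, safety_direction)
                    - projection(hidden_text_only, safety_direction))

print(f"margin shift: {margin_shift:+.2f}")
print(f"projection drift: {projection_drift:+.2f}")
```

With these toy numbers both quantities come out negative, which is the sign pattern the drift account would predict. A real measurement would need the model's actual logits and an empirically estimated direction.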

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the mechanism holds, alignment techniques for LALMs would need to address latent paralinguistic features in audio encoders rather than only filtering explicit content.
The same interference principle could be tested in other multimodal models that incorporate generative priors from one modality into cross-modal reasoning.
A practical mitigation test would involve preprocessing audio inputs to suppress the specific ALS patterns before they reach the model.

Load-bearing premise

The observed attack success stems specifically from interference with safety alignment via intrinsic ALS priors of audio generative models rather than from other acoustic or optimization artifacts.

What would settle it

An experiment that applies the same interference audio to LALMs whose audio encoders have been retrained or realigned to neutralize the identified ALS patterns, then measures whether jailbreak success rates fall to baseline levels.
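
The comparison described above reduces to asking whether two success rates differ by more than chance. A minimal sketch using a two-proportion z-test is given here; the choice of test and the counts are assumptions for illustration, not values or methods reported by the paper.

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two success rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0.0:
        return 0.0, 1.0  # identical all-success or all-failure groups
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical counts: successes out of 100 queries against the original
# encoder, and against an encoder realigned to neutralize ALS patterns.
z, p_value = two_proportion_z(80, 100, 20, 100)
print(f"z = {z:.2f}, p = {p_value:.2e}")
```

A large drop with a small p-value would support the ALS account. No significant change would suggest the effect comes from something other than the neutralized patterns.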

Figures

Figures reproduced from arXiv: 2605.18168 by Li Liu, Xixin Wu, Yanyun Wang, Yu Huang, Zi Liang.

**Figure 1.** The paradigm-level comparison between existing audio jailbreaks against LALMs and the proposed Acoustic Interference. Existing works follow one of these routes (or their combinations): ① optimizing (text) semantics before the AGM (e.g., semantic trojans), ② explicitly adjusting coarse-grained, pre-defined proxies of audio features within the AGM (e.g., discrete acoustic parameters like gender and emotion),…

**Figure 2.** The complete construction pipeline of the proposed ALS arsenal. Each element in the final arsenal A_rep is a triple: the corresponding representative audio x_rep, the latent history prompt h_rep, and a 12-dimensional index s.

**Figure 3.** The exploration process for the vulnerability of LALMs to the proposed Acoustic Interference. The results show a bi-directional interference effect: introducing ALS suppresses the success of previously strong text attacks but amplifies that of originally weaker ones, indicating that even natural ALS can cause a drift in the safety alignment path of LALM inference.

**Figure 4.** The Acoustic Interference Attack (AIA) framework. It begins with text jailbreaks. After a given number of queries, if the text cannot break the target LALM by itself (i.e., it falls into the medium/weak set), the audio interference is activated. The universal interference audio files are taken in order from the ranked interference set and appended to construct the multi-modal jailbreak queries.

**Figure 5.** Mechanistic analysis of Inference Path Drift. a) Acoustic interference systematically induces a negative shift in the refusal logit margin. b) In the latent space, acoustic interference steers the representation away from the safety alignment direction, particularly in the late transformer layers (indicated by the arrow drifting towards compliance). c) Causal patching confirms that injecting acoustic inter…

**Figure 6.** Feature distribution comparison between the top 25% (red) and bottom 25% (blue) of ALS-synthesized interference audio, ranked by jailbreak success. Most indexes show a significant impact on the jailbreak result; intuitively, the larger the gray fields, the greater the impact.

**Figure 7.** Quantitative analysis: test of significance with the WD score, and direction of vulnerability indicated by the Peak Shift (green + / red - denotes a preference for higher / lower scores). The study is based on the distribution divergence of acoustic features across the jailbreak outcomes. Specifically, we partition the ALS arsenal into the Top 25% (highest ASR) and Bottom 25% (lowest ASR) subsets bas…
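
The Top 25% versus Bottom 25% comparison in Figures 6 and 7 can be sketched for a single feature as follows. This is a hedged illustration: it assumes "WD" means the one-dimensional Wasserstein-1 distance and that "Peak Shift" is the difference between the most common score in each subset. The truncated caption confirms neither, and the scores are invented.

```python
from bisect import bisect_right
from statistics import mode

def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two samples: integral of |F_a - F_b|."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    dist = 0.0
    for left, right in zip(points, points[1:]):
        cdf_a = bisect_right(a, left) / len(a)  # empirical CDF of a at `left`
        cdf_b = bisect_right(b, left) / len(b)  # empirical CDF of b at `left`
        dist += abs(cdf_a - cdf_b) * (right - left)
    return dist

def peak_shift(top, bottom):
    """Assumed definition: mode of the top subset minus mode of the bottom."""
    return mode(top) - mode(bottom)

# Hypothetical scores (1-5) of one acoustic feature for clips in each subset.
top_quartile = [4, 5, 5, 3, 5, 4]
bottom_quartile = [2, 1, 2, 3, 2, 4]

wd = wasserstein_1d(top_quartile, bottom_quartile)
shift = peak_shift(top_quartile, bottom_quartile)
print(f"WD = {wd:.2f}, peak shift = {shift:+d}")
```

A positive shift would correspond to the "green +" reading in the caption (higher scores associated with higher ASR), under the assumed definitions.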

Original abstract

The integration of audio modality into Large Audio Language Models (LALMs) significantly expands their attack surface. Existing jailbreak paradigms predominantly treat audio as a carrier for malicious payloads, relying on semantic optimization, acoustic parameter control, or additive perturbation to embed harmful content into the audio signal. In this work, we challenge this necessity and propose a new paradigm in which the role of audio shifts from content injection to safety alignment interference. We reveal that LALM safety alignment can be compromised solely by specific Acoustic Latent Semantics (ALS), the underlying paralinguistic features intrinsic to the priors of audio generative models. Distinct from previous works that leverage explicit acoustic parameters to merely style malicious audio, we demonstrate that interference audio, benign in content but infused with specific ALS, can serve as a universal jailbreak trigger. Leveraging this insight, we propose the Acoustic Interference Attack (AIA), which decouples the attack payload from the audio. Specifically, AIA employs a set of universal, instruction-neutral interference audio, enabling standard malicious text queries to bypass safety alignment without instance-specific optimization. Extensive experiments on 10 LALMs across five datasets demonstrate that AIA achieves the state-of-the-art attack success rate. Furthermore, our interpretability analysis uncovers the inference path drift induced by AIA and identifies the inherent effective patterns within ALS, revealing the fundamental vulnerability of cross-modal alignment in LALMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows a workable universal audio trigger that bypasses safety in ten LALMs without embedding the payload, but the claimed mechanism via Acoustic Latent Semantics lacks the controls needed to confirm it over simpler acoustic effects.


The main point is that these authors have a practical attack that uses fixed, content-neutral audio clips to let ordinary malicious text queries succeed against multiple audio-language models. They treat the audio as an interferer rather than a carrier, and they report it works across ten models and five datasets with what they call state-of-the-art success rates. That decoupling is the clearest shift from earlier work that tried to stuff harmful content into the waveform itself.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a new jailbreak paradigm for Large Audio Language Models (LALMs) called the Acoustic Interference Attack (AIA). It claims that benign, instruction-neutral audio infused with specific Acoustic Latent Semantics (ALS)—paralinguistic features intrinsic to audio generative model priors—can interfere with safety alignment, enabling standard malicious text queries to succeed universally without instance-specific optimization or payload embedding. Experiments on 10 LALMs across five datasets report state-of-the-art attack success rates, supported by interpretability analysis identifying inference path drift and effective ALS patterns.

Significance. If the results hold, the work is significant for revealing a fundamental vulnerability in cross-modal safety alignment of LALMs by decoupling attacks from explicit harmful audio content. The multi-model, multi-dataset evaluation and interpretability component provide broad empirical grounding and mechanistic insight into how latent audio features can induce alignment drift, which could inform defenses for audio-enabled AI systems. The shift from payload injection to alignment interference represents a conceptual advance in adversarial audio attacks.

major comments (3)

[Methodology] Methodology section (likely §3): The paper attributes the universal jailbreak effect to ALS-induced inference path drift but provides no equations, generative model details, or procedure for isolating/infusing ALS into the interference audio. This is load-bearing for the central claim, as it prevents verification that the effect stems from intrinsic paralinguistic priors rather than incidental acoustic or optimization artifacts.
[Experiments] Experiments section (likely §5, results tables): Reported SOTA attack success rates across 10 LALMs and five datasets lack error bars, statistical tests, or ablation studies on acoustic parameters (e.g., spectrogram statistics, duration, loudness) or embedding shifts. Without these controls, the attribution to alignment-specific interference cannot be isolated from confounds, weakening the causal claim.
[Interpretability analysis] Interpretability analysis (likely §6): The analysis claims to uncover inference path drift and inherent ALS patterns but does not quantify drift specifically attributable to ALS versus other factors or provide controls ruling out non-alignment explanations. This leaves the mechanistic account incomplete for supporting the universal, alignment-interference paradigm.
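
The error bars requested in the Experiments comment could be obtained with a percentile bootstrap over per-query outcomes. The sketch below assumes binary success labels and uses invented data; the function name and parameters are illustrative and not taken from the manuscript.

```python
import random

def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a success rate."""
    rng = random.Random(seed)
    n = len(outcomes)
    rates = sorted(
        sum(rng.choices(outcomes, k=n)) / n  # resample with replacement
        for _ in range(n_resamples)
    )
    lower = rates[int((alpha / 2) * n_resamples)]
    upper = rates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical outcomes: 1 = attack judged successful, 0 = refused.
outcomes = [1] * 70 + [0] * 30
asr = sum(outcomes) / len(outcomes)
lower, upper = bootstrap_ci(outcomes)
print(f"ASR = {asr:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Reporting an interval per model and dataset would let readers judge whether differences between methods exceed sampling noise.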

minor comments (2)

[Introduction] Notation: 'Acoustic Latent Semantics (ALS)' is used throughout without a formal definition, extraction equation, or reference to related latent feature work in audio models, which could improve clarity.
[Figures] Figure clarity: Captions for any drift visualization figures should explicitly define metrics (e.g., what 'path drift' quantifies) and include scale bars or statistical overlays for interpretability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and insightful comments on our manuscript. We value the recognition of the potential significance of the Acoustic Interference Attack paradigm for understanding vulnerabilities in Large Audio Language Models. Below, we provide point-by-point responses to the major comments and outline the revisions we will make to address them.


Referee: [Methodology] Methodology section (likely §3): The paper attributes the universal jailbreak effect to ALS-induced inference path drift but provides no equations, generative model details, or procedure for isolating/infusing ALS into the interference audio. This is load-bearing for the central claim, as it prevents verification that the effect stems from intrinsic paralinguistic priors rather than incidental acoustic or optimization artifacts.

Authors: We appreciate this observation and agree that additional technical details would strengthen the presentation. In the revised manuscript, we will expand the Methodology section to include explicit equations describing the ALS infusion process, details on the audio generative models used to derive the priors, and a step-by-step procedure for isolating and infusing the Acoustic Latent Semantics into the interference audio. This will clarify that the effect arises from the paralinguistic features intrinsic to the model priors.
revision: yes
Referee: [Experiments] Experiments section (likely §5, results tables): Reported SOTA attack success rates across 10 LALMs and five datasets lack error bars, statistical tests, or ablation studies on acoustic parameters (e.g., spectrogram statistics, duration, loudness) or embedding shifts. Without these controls, the attribution to alignment-specific interference cannot be isolated from confounds, weakening the causal claim.

Authors: We acknowledge the need for greater statistical rigor and controls in the experimental results. We will revise the Experiments section to include error bars in the results tables, conduct appropriate statistical tests (e.g., paired t-tests) to assess significance, and perform ablation studies varying acoustic parameters such as duration, loudness, and spectrogram statistics. These additions will help isolate the contribution of ALS to the alignment interference effect.
revision: yes
Referee: [Interpretability analysis] Interpretability analysis (likely §6): The analysis claims to uncover inference path drift and inherent ALS patterns but does not quantify drift specifically attributable to ALS versus other factors or provide controls ruling out non-alignment explanations. This leaves the mechanistic account incomplete for supporting the universal, alignment-interference paradigm.

Authors: We agree that the interpretability analysis can be made more robust. In the revision, we will quantify the inference path drift specifically due to ALS by introducing controlled comparisons and additional metrics. We will also include experiments with controls to rule out non-alignment related explanations, thereby providing stronger evidence for the mechanistic role of ALS in causing alignment drift.
revision: yes
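
One control implied by the responses above is to hold loudness and duration fixed across audio conditions, so that any remaining difference cannot be attributed to those two factors. The sketch below shows one simple way to do that on raw sample lists. The helper names, the RMS loudness measure, and the zero-padding choice are assumptions for illustration, not the authors' protocol.

```python
import math

def rms(samples):
    """Root-mean-square level of a waveform given as a list of floats."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def match_loudness_and_duration(samples, target_rms, target_len):
    """Trim or zero-pad to target_len, then scale to target_rms."""
    fixed = list(samples[:target_len])
    fixed += [0.0] * max(0, target_len - len(fixed))
    current = rms(fixed)
    if current == 0.0:
        return fixed  # silent clip: nothing to scale
    gain = target_rms / current
    return [s * gain for s in fixed]

# Hypothetical short clip; real audio would have thousands of samples.
clip = [0.5, -0.5, 0.25, -0.25, 0.1]
controlled = match_loudness_and_duration(clip, target_rms=0.1, target_len=8)
print(len(controlled), round(rms(controlled), 3))
```

Applying the same target values to every clip in both the interference and the control conditions keeps these two acoustic parameters constant across the comparison.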

Circularity Check

0 steps flagged

No circularity: empirical results rest on external experiments


The paper advances an empirical claim that specific Acoustic Latent Semantics (ALS) in interference audio can induce universal jailbreaks in LALMs via inference path drift. This is supported by experiments on 10 models across five datasets reporting state-of-the-art attack success rates, plus an interpretability analysis. No equations, fitted parameters, or self-cited derivations appear in the provided text that would reduce the central result to an input by construction. The work therefore qualifies as self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the existence and transferability of Acoustic Latent Semantics as intrinsic features from audio generative model priors that can reliably induce inference path drift in LALMs; no explicit free parameters or invented entities beyond the named attack and feature set are detailed in the abstract.

axioms (1)

domain assumption: LALM safety alignment can be compromised solely by specific Acoustic Latent Semantics, without content injection.
Invoked when the paper states that benign interference audio suffices to bypass alignment and attributes success to ALS priors.

invented entities (1)

Acoustic Latent Semantics (ALS): no independent evidence
purpose: Paralinguistic features that interfere with safety alignment
Introduced as the key mechanism; no independent falsifiable evidence (e.g., predicted acoustic signature verifiable outside the attack) is provided in the abstract.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "We reveal that LALM safety alignment can be compromised solely by specific Acoustic Latent Semantics (ALS), the underlying paralinguistic features intrinsic to the priors of audio generative models... inference path drift"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery · unclear
Paper passage: "hierarchical clustering... density-aware selection... 12-dimensional labeling system"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · 10 internal anchors

[1]

Jailbreaking leading safety-aligned llms with simple adaptive attacks

Andriushchenko, M., Croce, F., and Flammarion, N. Jailbreaking leading safety-aligned llms with simple adaptive attacks. In International Conference on Learning Representations (ICLR), 2025

work page 2025
[2]

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., Drain, D., Fort, S., Ganguli, D., Henighan, T., et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[3]

N., Lee, S., and Narayanan, S

Busso, C., Bulut, M., Lee, C.-C., Kazemzadeh, A., Mower, E., Kim, S., Chang, J. N., Lee, S., and Narayanan, S. S. Iemocap: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 2008

work page 2008
[4]

J., Tramèr, F., Hassani, H., and Wong, E

Chao, P., Debenedetti, E., Robey, A., Andriushchenko, M., Croce, F., Sehwag, V., Dobriban, E., Flammarion, N., Pappas, G. J., Tramèr, F., Hassani, H., and Wong, E. Jailbreakbench: An open robustness benchmark for jailbreaking large language models. In Datasets and Benchmarks Track \!@\! Advances in Neural Information Processing Systems (NeurIPS), 2024

work page 2024
[5]

J., and Wong, E

Chao, P., Robey, A., Dobriban, E., Hassani, H., Pappas, G. J., and Wong, E. Jailbreaking black box large language models in twenty queries. In IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2025

work page 2025
[6]

Audiojailbreak: Jailbreak attacks against end-to-end large audio-language models

Chen, G., Song, F., Zhao, Z., Jia, X., Liu, Y., Qiao, Y., and Zhang, W. Audiojailbreak: Jailbreak attacks against end-to-end large audio-language models. IEEE Transactions on Dependable and Secure Computing (TDSC), 2026

work page 2026
[7]

Wavlm: Large-scale self-supervised pre-training for full stack speech processing

Chen, S., Wang, C., Chen, Z., Wu, Y., Liu, S., Chen, Z., Li, J., Kanda, N., Yoshioka, T., Xiao, X., et al. Wavlm: Large-scale self-supervised pre-training for full stack speech processing. IEEE Journal of Selected Topics in Signal Processing (JSTSP), 2022

work page 2022
[8]

Jailbreak-audiobench: In-depth evaluation and analysis of jailbreak threats for large audio language models

Cheng, H., Xiao, E., Shao, J., Wang, Y., Yang, L., Shen, C., Torr, P., Gu, J., and Xu, R. Jailbreak-audiobench: In-depth evaluation and analysis of jailbreak threats for large audio language models. Datasets and Benchmarks Track \!@\! Advances in Neural Information Processing Systems (NeurIPS), 2025

work page 2025
[9]

W., Huang, L., Li, B., Chen, H., and Choo, K.-K

Chiu, C. W., Huang, L., Li, B., Chen, H., and Choo, K.-K. R. `do as i say not as i do': A semi-automated approach for jailbreak prompt attack against multimodal llms. arXiv preprint arXiv:2502.00735, 2025

work page arXiv 2025
[10]

Qwen2-Audio Technical Report

Chu, Y., Xu, J., Yang, Q., Wei, H., Wei, X., Guo, Z., Leng, Y., Lv, Y., He, J., Lin, J., et al. Qwen2-audio technical report. arXiv preprint arXiv:2407.10759, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[11]

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Comanici, G., Bieber, E., Schaekermann, M., Pasupat, I., Sachdeva, N., Dhillon, I., Blistein, M., Ram, O., Zhang, D., Rosen, E., et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[12]

Kimi-Audio Technical Report

Ding, D., Ju, Z., Leng, Y., Liu, S., Liu, T., Shang, Z., Shen, K., Song, W., Tan, X., Tang, H., et al. Kimi-audio technical report. arXiv preprint arXiv:2504.18425, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[13]

Llama-omni: Seamless speech interaction with large language models

Fang, Q., Guo, S., Zhou, Y., Ma, Z., Zhang, S., and Feng, Y. Llama-omni: Seamless speech interaction with large language models. In International Conference on Learning Representations (ICLR), 2025

work page 2025
[14]

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

Ganguli, D., Lovitt, L., Kernion, J., Askell, A., Bai, Y., Kadavath, S., Mann, B., Perez, E., Schiefer, N., Ndousse, K., et al. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned. arXiv preprint arXiv:2209.07858, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[15]

F., Ellis, D

Gemmeke, J. F., Ellis, D. P., Freedman, D., Jansen, A., Lawrence, W., Moore, R. C., Plakal, M., and Ritter, M. Audio set: An ontology and human-labeled dataset for audio events. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017

work page 2017
[16]

Figstep: Jailbreaking large vision-language models via typographic visual prompts

Gong, Y., Ran, D., Liu, J., Wang, C., Cong, T., Wang, A., Duan, S., and Wang, X. Figstep: Jailbreaking large vision-language models via typographic visual prompts. In AAAI Conference on Artificial Intelligence (AAAI), 2025

work page 2025
[17]

J., Shlens, J., and Szegedy, C

Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015

work page 2015
[18]

Gemini-3-pro, 2025

Google. Gemini-3-pro, 2025. URL https://gemini.google.com

work page 2025
[19]

Badnets: Evaluating backdooring attacks on deep neural networks

Gu, T., Liu, K., Dolan-Gavitt, B., and Garg, S. Badnets: Evaluating backdooring attacks on deep neural networks. IEEE Access, 2019

work page 2019
[20]

``i am bad'': Interpreting stealthy, universal and robust audio jailbreaks in audio-language models

Gupta, I., Khachaturov, D., and Mullins, R. ``i am bad'': Interpreting stealthy, universal and robust audio jailbreaks in audio-language models. In Workshop on Machine Learning for Audio \!@\! International Conference on Machine Learning (ICML), 2025

work page 2025
[21]

Best-of-n jailbreaking

Hughes, J., Price, S., Lynch, A., Schaeffer, R., Barez, F., Koyejo, S., Sleight, H., Jones, E., Perez, E., and Sharma, M. Best-of-n jailbreaking. Advances in Neural Information Processing Systems (NeurIPS), 2025

work page 2025
[22]

Wildteaming at scale: From in-the-wild jailbreaks to (adversarially) safer language models

Jiang, L., Rao, K., Han, S., Ettinger, A., Brahman, F., Kumar, S., Mireshghallah, N., Lu, X., Sap, M., Choi, Y., et al. Wildteaming at scale: From in-the-wild jailbreaks to (adversarially) safer language models. Advances in Neural Information Processing Systems (NeurIPS), 2024

work page 2024
[23]

Advwave: Stealthy adversarial jailbreak attack against large audio-language models

Kang, M., Xu, C., and Li, B. Advwave: Stealthy adversarial jailbreak attack against large audio-language models. In International Conference on Learning Representations (ICLR), 2025

work page 2025
[24]

Deep learning

LeCun, Y., Bengio, Y., and Hinton, G. Deep learning. Nature, 2015

work page 2015
[25]

Virus infection attack on llms: Your poisoning can spread "via" synthetic data

Liang, Z., Ye, Q., Liu, X., Wang, Y., Xu, J., and Hu, H. Virus infection attack on llms: Your poisoning can spread "via" synthetic data. Advances in Neural Information Processing Systems (NeurIPS), 2025

work page 2025
[26]

Backdoordm: A comprehensive benchmark for backdoor learning on diffusion model

Lin, W., Zhou, N., Wang, Y., Li, J., Xiong, H., and Liu, L. Backdoordm: A comprehensive benchmark for backdoor learning on diffusion model. Datasets and Benchmarks Track \!@\! Advances in Neural Information Processing Systems (NeurIPS), 2025

work page 2025
[27]

Safety of multimodal large language models on images and texts

Liu, X., Zhu, Y., Lan, Y., Yang, C., and Qiao, Y. Safety of multimodal large language models on images and texts. In International Joint Conference on Artificial Intelligence (IJCAI), 2024

work page 2024
[28]

Livingstone, S. R. and Russo, F. A. The ryerson audio-visual database of emotional speech and song (ravdess): A dynamic, multimodal set of facial and vocal expressions in north american english. PloS one, 2018

work page 2018
[29]

The Llama 3 Herd of Models

Llama-Team and Meta-AI. The llama 3 herd of models, 2024. URL https://arxiv.org/abs/2407.21783

work page internal anchor Pith review Pith/arXiv arXiv 2024
[30]

Harmbench: A standardized evaluation framework for automated red teaming and robust refusal

Mazeika, M., Phan, L., Yin, X., Zou, A., Wang, Z., Mu, N., Sakhaee, E., Li, N., Basart, S., Li, B., et al. Harmbench: A standardized evaluation framework for automated red teaming and robust refusal. In International Conference on Machine Learning (ICML), 2024

work page 2024
[31]

Tree of attacks: Jailbreaking black-box llms automatically

Mehrotra, A., Zampetakis, M., Kassianik, P., Nelson, B., Anderson, H., Singer, Y., and Karbasi, A. Tree of attacks: Jailbreaking black-box llms automatically. Advances in Neural Information Processing Systems (NeurIPS), 2024

work page 2024
[32]

Gpt-4o-audio, 2024

OpenAI. Gpt-4o-audio, 2024. URL https://www.openai.com

work page 2024
[33]

Training language models to follow instructions with human feedback

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems (NeurIPS), 2022

work page 2022
[34]

Jalmbench: Benchmarking jailbreak vulnerabilities in audio language models

Peng, Z., Liu, Y., Sun, Z., Li, M., Luo, Z., Zheng, J., Dong, W., He, X., Wang, X., Xue, Y., et al. Jalmbench: Benchmarking jailbreak vulnerabilities in audio language models. In International Conference on Learning Representations (ICLR), 2026

work page 2026
[35]

Visual adversarial examples jailbreak aligned large language models

Qi, X., Huang, K., Panda, A., Henderson, P., Wang, M., and Mittal, P. Visual adversarial examples jailbreak aligned large language models. In AAAI Conference on Artificial Intelligence (AAAI), 2024

work page 2024
[36]

Multilingual and multi-accent jailbreaking of audio llms

Roh, J., Shejwalkar, V., and Houmansadr, A. Multilingual and multi-accent jailbreaking of audio llms. In Conference on Language Modeling (COLM), 2025

work page 2025
[37]

V oice jailbreak attacks against gpt-4o, 2024

Shen, X., Wu, Y., Backes, M., and Zhang, Y. Voice jailbreak attacks against gpt-4o. arXiv preprint arXiv:2405.19103, 2024

work page arXiv 2024
[38]

Audio jailbreak: An open comprehensive benchmark for jailbreaking large audio-language models.arXiv preprint arXiv:2505.15406,

Song, Z., Jiang, Q., Cui, M., Li, M., Gao, L., Zhang, Z., Xu, Z., Wang, Y., Wang, C., Ouyang, G., et al. Audio jailbreak: An open comprehensive benchmark for jailbreaking large audio-language models. arXiv preprint arXiv:2505.15406, 2025

work page arXiv 2025
[39]

Bark: Text-prompted generative audio model, 2023

Suno-AI. Bark: Text-prompted generative audio model, 2023. URL https://huggingface.co/suno/bark

work page 2023
[40]

Intriguing properties of neural networks

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014

work page 2014
[41]

Salmonn: Towards generic hearing abilities for large language models

Tang, C., Yu, W., Sun, G., Chen, X., Tan, T., Li, W., Lu, L., MA, Z., and Zhang, C. Salmonn: Towards generic hearing abilities for large language models. In International Conference on Learning Representations (ICLR), 2024

work page 2024
[42]

LLaMA: Open and Efficient Foundation Language Models

Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozi \`e re, B., Goyal, N., Hambro, E., Azhar, F., et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[43]

Poisoning language models during instruction tuning

Wan, A., Wallace, E., Shen, S., and Klein, D. Poisoning language models during instruction tuning. In International Conference on Machine Learning (ICML), 2023

work page 2023
[44]

Wang, Y. and Liu, L. Failure cases are better learned but boundary says sorry: Facilitating smooth perception change for accuracy-robustness trade-off in adversarial training. In IEEE/CVF International Conference on Computer Vision (ICCV), 2025

[45]

Wei, A., Haghtalab, N., and Steinhardt, J. Jailbroken: How does llm safety training fail? Advances in Neural Information Processing Systems (NeurIPS), 2023

[46]

Wu, Y., Chen, K., Zhang, T., Hui, Y., Berg-Kirkpatrick, T., and Dubnov, S. Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

[47]

Xiao, E., Cheng, H., Shao, J., Duan, J., Xu, K., Yang, L., Gu, J., and Xu, R. Tune in, act up: Exploring the impact of audio modality-specific edits on large audio language models in jailbreak. arXiv preprint arXiv:2501.13772v1, 2025

[48]

Xu, J., Guo, Z., He, J., Hu, H., He, T., Bai, S., Chen, K., Wang, J., Fan, Y., Dang, K., Zhang, B., Wang, X., Chu, Y., and Lin, J. Qwen2.5-omni technical report. arXiv preprint arXiv:2503.20215, 2025a

[49]

Qwen3-Omni Technical Report

Xu, J., Guo, Z., Hu, H., Chu, Y., Wang, X., He, J., Wang, Y., Shi, X., He, T., Zhu, X., Lv, Y., Wang, Y., Guo, D., Wang, H., Ma, L., Zhang, P., Zhang, X., Hao, H., Guo, Z., Yang, B., Zhang, B., Ma, Z., Wei, X., Bai, S., Chen, K., Liu, X., Wang, P., Yang, M., Liu, D., Ren, X., Zheng, B., Men, R., Zhou, F., Yu, B., Yang, J., Yu, L., Zhou, J., and Lin, J. Qw...

[50]

Yamagishi, J., Veaux, C., and MacDonald, K. Cstr vctk corpus: English multi-speaker corpus for cstr voice cloning toolkit, 2019. URL https://doi.org/10.7488/ds/2645

[51]

Yang, H., Qu, L., Shareghi, E., and Haffari, G. Audio is the achilles’ heel: Red teaming audio large multimodal models. In Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), 2025a

[52]

Yang, Y., Zhang, X., Han, Z., Wang, S., Zhuang, J., Jin, Z., Shao, J., Sun, G., and Zhang, C. Speech-audio compositional attacks on multimodal llms and their mitigation with salmonn-guard. arXiv preprint arXiv:2511.10222, 2025b

[53]

Ye, H., Yang, C.-H. H., Goel, A., Huang, W., Zhu, L., Su, Y., Lin, S., Cheng, A.-C., Wan, Z., Tian, J., et al. Omnivinci: Enhancing architecture and data for omni-modal understanding llm. In International Conference on Learning Representations (ICLR), 2026

[54]

Yu, Z., Chang, Y., Zhang, N., and Xiao, C. Smack: Semantically meaningful adversarial audio attack. In USENIX Security Symposium, 2023

[55]

Zen, H., Dang, V., Clark, R., Zhang, Y., Weiss, R. J., Jia, Y., Chen, Z., and Wu, Y. Libritts: A corpus derived from librispeech for text-to-speech. In INTERSPEECH, 2019

[56]

Zhang, D., Wang, G., Xue, J., Fang, K., Zhao, L., Ma, R., Ren, S., Liu, S., Guo, T., Zhuang, W., et al. Mimo-audio: Audio language models are few-shot learners. arXiv preprint arXiv:2512.23808, 2025

[57]

Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., et al. Judging llm-as-a-judge with mt-bench and chatbot arena. Datasets and Benchmarks Track @ Advances in Neural Information Processing Systems (NeurIPS), 2023

[58]

Zhou, Z., Lyu, G., Huang, Y., Wang, Z., Jia, Z., and Yang, Z. Sdformer: Transformer with spectral filter and dynamic attention for multivariate time series long-term forecasting. In International Joint Conference on Artificial Intelligence (IJCAI), 2024

[59]

Zhou, Z., Huang, Y., Wang, Y., Wu, Y., Kwok, J., and Liang, Y. Revitalizing canonical pre-alignment for irregular multivariate time series forecasting. In AAAI Conference on Artificial Intelligence (AAAI), 2026

[60]

Zou, A., Wang, Z., Carlini, N., Nasr, M., Kolter, J. Z., and Fredrikson, M. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023

