Generalizable Prompt Tuning for Audio-Language Models via Semantic Expansion

2026 · cs.SD · arXiv 2601.20867

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Prompt tuning has achieved remarkable progress in vision-language models (VLMs) and has recently been adopted for audio-language models (ALMs). However, its generalization ability in ALMs remains largely underexplored. We observe that conventional prompt tuning for ALMs also suffers from the Base-New Tradeoff, and we identify that this issue stems from the disrupted semantic structure of the embedding space. To address this issue, we propose Semantically Expanded Prompt Tuning (SEPT), a plug-and-play framework that explicitly regularizes the prompt embedding space by incorporating semantic neighbors generated by large language models. SEPT introduces a novel semantic expansion loss with margin constraints that promote intra-class compactness and inter-class separability, thereby enhancing the semantic structure of the prompt embedding space. For comprehensive evaluation, we establish the first benchmark setup for prompt generalization in ALMs, covering both base-to-new generalization and cross-dataset transferability. Extensive experiments demonstrate that SEPT consistently improves generalization performance across multiple prompt tuning baselines while leaving the computational cost of inference unchanged.
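
The abstract does not state the loss formula, so the following is only a minimal sketch of what a margin-constrained regularizer over semantic neighbors could look like, not SEPT's actual formulation. It assumes a hinge-style penalty on cosine similarity; the function name `margin_expansion_loss`, the margins `pos_margin` and `neg_margin`, and the toy 2-D embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def margin_expansion_loss(prompts, neighbors, pos_margin=0.8, neg_margin=0.3):
    """Hypothetical hinge-style regularizer (an assumption, not the paper's loss).

    prompts:   {class_name: prompt embedding}
    neighbors: {class_name: [embeddings of LLM-generated semantic neighbors]}
    """
    terms = []
    for cls, prompt in prompts.items():
        for other, embeddings in neighbors.items():
            for neighbor in embeddings:
                sim = cosine(prompt, neighbor)
                if other == cls:
                    # Intra-class compactness: penalize similarity below pos_margin.
                    terms.append(max(0.0, pos_margin - sim))
                else:
                    # Inter-class separability: penalize similarity above neg_margin.
                    terms.append(max(0.0, sim - neg_margin))
    return sum(terms) / len(terms)


# Toy data: one semantic neighbor per class, in a 2-D embedding space.
neighbors = {"dog": [[0.9, 0.1]], "rain": [[0.1, 0.9]]}
aligned = {"dog": [1.0, 0.0], "rain": [0.0, 1.0]}   # prompts near their own neighbors
swapped = {"dog": [0.0, 1.0], "rain": [1.0, 0.0]}   # prompts near the wrong neighbors

loss_aligned = margin_expansion_loss(aligned, neighbors)
loss_swapped = margin_expansion_loss(swapped, neighbors)
```

In this toy setup the aligned prompts satisfy both margins, while the swapped prompts violate them and incur a positive penalty. Because such a term acts only on the training objective, it is consistent with the abstract's statement that inference cost is unaffected.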

representative citing papers

Constraining to Generalize: Subspace Tuning for Few-shot Generalization of Audio-Language Models

cs.SD · 2026-06-17 · unverdicted · novelty 5.0

SubT constrains text embedding drift during few-shot tuning of audio-language models via subspace parameterization, residual anchoring, and gating to improve generalization on unseen classes across 11 benchmarks.

citing papers explorer

Showing 1 of 1 citing paper.

Constraining to Generalize: Subspace Tuning for Few-shot Generalization of Audio-Language Models

cs.SD · 2026-06-17 · unverdicted · none · ref 52 · internal anchor
