
arxiv: 2601.18904 · v2 · pith:475HWEBZnew · submitted 2026-01-26 · 💻 cs.SD · cs.AI · cs.CL

MetaSICL: Adapting Auditory LLM via Meta Speech In-Context Learning

classification 💻 cs.SD · cs.AI · cs.CL
keywords speech · in-context · learning · tasks · across · adapting · audio · auditory

Auditory Large Language Models (LLMs) have demonstrated strong performance across a wide range of speech and audio understanding tasks. Nevertheless, they often struggle when applied to low-resource tasks. When in-domain labeled data are scarce or mismatched with the true test distribution, direct fine-tuning can be brittle. In-Context Learning (ICL) provides a training-free, inference-time solution by adapting auditory LLMs through conditioning on a few in-domain demonstrations. In this work, we first show that $\textit{Vanilla ICL}$ improves zero-shot performance across diverse speech and audio tasks for selected models, which suggests that this ICL adaptation capability generalizes to the multimodal setting. Building on this, we propose $\textbf{Meta Speech In-Context Learning (MetaSICL)}$, a post-training recipe that uses only high-resource speech data from various tasks to strengthen the model's in-context learning capability. Experiments indicate that the proposed method outperforms direct fine-tuning in low-resource scenarios.
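
As a rough illustration of what "conditioning on a few in-domain demonstrations" can look like, the sketch below assembles an interleaved audio/text prompt for vanilla ICL. It is not taken from the paper: the `Demo` class, the `build_icl_prompt` function, the segment dictionary format, the "Answer:" template, and the file names are all hypothetical, and a real auditory LLM would define its own input format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Demo:
    """One in-domain demonstration (hypothetical structure)."""
    audio_path: str  # assumed path to a labeled audio clip
    label: str       # reference answer for that clip


def build_icl_prompt(
    instruction: str, demos: List[Demo], query_audio: str
) -> List[Dict[str, str]]:
    """Place k demonstrations before the query; k = 0 reduces to zero-shot."""
    segments = [{"type": "text", "content": instruction}]
    for demo in demos:
        # Each demonstration is an (audio, answer) pair the model can imitate.
        segments.append({"type": "audio", "content": demo.audio_path})
        segments.append({"type": "text", "content": f"Answer: {demo.label}"})
    # The query follows the same template, with the answer left open.
    segments.append({"type": "audio", "content": query_audio})
    segments.append({"type": "text", "content": "Answer:"})
    return segments


# Illustrative file names and labels only.
demos = [Demo("clip_a.wav", "happy"), Demo("clip_b.wav", "sad")]
prompt = build_icl_prompt("Classify the speaker's emotion.", demos, "query.wav")
```

No model weights change in this setup; adaptation comes only from the demonstrations placed in the context at inference time, which is what makes the approach training-free.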
