CAAD: Contrastive Audio-Aware Distillation for Efficient Speech Language Models

2026 · eess.AS · arXiv 2606.23052


abstract

Speech Language Models (SLMs) exhibit reasoning capabilities, but they are often hindered by massive parameter counts and a tendency to prioritize linguistic priors over acoustic features. Contrastive decoding improves acoustic grounding by contrasting audio-aware logits with text-only logits, but it increases inference latency. We propose Contrastive Audio-Aware Distillation (CAAD), a framework that internalizes the teacher's contrastive reasoning into the student model's weights. To reduce the high training cost of dual-path, token-by-token contrastive distillation, we introduce a synchronized teacher-forcing strategy. Anchored by unified Pseudo-Ground Truths, this mechanism enables simultaneous full-sequence generation of the teacher's contrastive distributions, allowing the student to distill the audio-aware signal efficiently. Overall, CAAD yields a ~8% relative gain over standard knowledge distillation on Dynamic-SUPERB and reduces linguistic bias on MCR-BENCH.
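
The abstract does not give the exact objective, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the common contrastive-decoding form (1 + alpha) * audio_logits - alpha * text_logits and a per-position KL divergence as the distillation loss; the function names, the alpha parameter, and the toy logits are all hypothetical and are not taken from the paper. Because both teacher paths are conditioned on the same pseudo-ground-truth tokens (teacher forcing), the contrastive target at every position can be formed at once rather than token by token.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one vocabulary-sized list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def contrastive_logits(audio_logits, text_logits, alpha=1.0):
    # Assumed contrastive form: amplify what the audio-aware path adds
    # over the text-only path. alpha is a hypothetical hyperparameter.
    return [(1 + alpha) * a - alpha * t
            for a, t in zip(audio_logits, text_logits)]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))

def contrastive_distillation_loss(teacher_audio, teacher_text, student, alpha=1.0):
    # Each argument is a list over sequence positions, and each position
    # holds a list of logits over the vocabulary. All three are assumed to
    # be computed on the same pseudo-ground-truth token sequence.
    losses = []
    for a, t, s in zip(teacher_audio, teacher_text, student):
        target = softmax(contrastive_logits(a, t, alpha))
        losses.append(kl_divergence(target, softmax(s)))
    return sum(losses) / len(losses)

# Toy example: two positions, vocabulary of three tokens.
teacher_audio = [[2.0, 0.0, 1.0], [0.5, 1.5, 0.0]]
teacher_text = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
student = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]  # starts out mimicking the text-only path
loss = contrastive_distillation_loss(teacher_audio, teacher_text, student)
```

In this sketch the loss is zero when the student's logits match the contrastive target and positive when the student merely reproduces the text-only path, which is the behavior a contrastive distillation signal is meant to penalize.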

representative citing papers

CAAD: Contrastive Audio-Aware Distillation for Efficient Speech Language Models

eess.AS · 2026-06-22 · unverdicted · novelty 5.0

CAAD internalizes contrastive audio-aware decoding into student SLM weights via synchronized teacher-forcing, delivering an 8% relative gain over standard knowledge distillation on Dynamic-SUPERB while reducing linguistic bias on MCR-BENCH.

