
CAAD: Contrastive Audio-Aware Distillation for Efficient Speech Language Models

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Speech Language Models exhibit reasoning capabilities, but are often hindered by massive parameter counts and a tendency to prioritize linguistic priors over acoustic features. Contrastive decoding improves grounding by contrasting audio-aware and text-only logits, but it increases inference latency. We propose Contrastive Audio-Aware Distillation (CAAD), a framework that internalizes the teacher's contrastive reasoning into the student model's weights. To reduce the high training cost of dual-path, token-by-token contrastive distillation, we introduce a synchronized teacher-forcing strategy. Anchored by unified Pseudo-Ground Truths, this mechanism enables simultaneous full-sequence generation of the teacher's contrastive distributions, allowing the student to distill the audio-aware signal efficiently. Overall, CAAD yields a ~8% relative gain over standard knowledge distillation on Dynamic-SUPERB and reduces linguistic bias on MCR-BENCH.
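The abstract does not give the contrastive formula or the loss, so the following is only a minimal sketch under stated assumptions. It assumes the common contrastive-decoding form (1 + alpha) * audio_logits - alpha * text_logits and a per-position KL divergence from the teacher's contrastive distribution to the student. The function names, the `alpha` parameter, and the plain-list representation of logits are hypothetical and not taken from the paper. The per-position lists stand in for logits obtained in one teacher-forced pass over a shared pseudo-ground-truth sequence.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_logits(audio_logits, text_logits, alpha=1.0):
    """Assumed contrastive form: amplify what audio adds over text-only."""
    return [(1 + alpha) * a - alpha * t for a, t in zip(audio_logits, text_logits)]


def kl_divergence(teacher_logp, student_logp):
    """KL(teacher || student) from two log-probability lists."""
    return sum(math.exp(tp) * (tp - sp) for tp, sp in zip(teacher_logp, student_logp))


def caad_style_loss(teacher_audio, teacher_text, student, alpha=1.0):
    """Mean per-position KL between the teacher's contrastive distribution
    and the student's distribution.

    Each argument is a list of per-position logit lists, all aligned to the
    same (hypothetical) pseudo-ground-truth token sequence.
    """
    total = 0.0
    for a, t, s in zip(teacher_audio, teacher_text, student):
        target = log_softmax(contrastive_logits(a, t, alpha))
        total += kl_divergence(target, log_softmax(s))
    return total / len(student)
```

In this sketch the student needs only one forward pass at inference time, since the contrast between the audio-aware and text-only paths is computed on the teacher side during training.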



citing papers explorer


  • CAAD: Contrastive Audio-Aware Distillation for Efficient Speech Language Models · eess.AS · 2026-06-22 · unverdicted · none · ref 3 · internal anchor

    CAAD internalizes contrastive audio-aware decoding into the student SLM's weights via synchronized teacher-forcing, delivering a roughly 8% relative gain over standard knowledge distillation on Dynamic-SUPERB while reducing linguistic bias on MCR-BENCH.