ContextCodec: Content-Focused Context Guidance for Ultra-Low Bitrate Speech Coding

2026 · cs.SD · arXiv 2606.10591

1 Pith paper cites this work.

abstract

Neural speech codecs enable low-bitrate speech communication, yet at ultra-low bitrates (below 1000 bps), preserving perceptual quality and intelligibility remains challenging. Existing designs often prioritize acoustic details, leaving limited capacity for the core linguistic message under tight bitrate constraints. To address this, we propose ContextCodec, a codec that transmits content-focused context features to explicitly guide reconstruction. ContextCodec adopts a dual-branch encoder that decouples acoustic details from content-focused context. The context branch is trained with a CLIP-style contrastive loss that aligns context features with phoneme indices, reducing paralinguistic leakage. During decoding, these features are injected at every decoder stage to provide explicit guidance. In addition, we introduce a lightweight autoregressive latent refinement module. Experiments show a strong quality-intelligibility trade-off down to 500 bps, with a real-time factor (RTF) of 0.4886 on a typical mobile CPU.
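
The context-branch objective can be illustrated with a small, dependency-free sketch. The abstract does not give the exact loss, so the following is an assumption: each frame's context feature is compared by cosine similarity against a learnable phoneme embedding table, divided by a temperature, and trained with cross-entropy toward its aligned phoneme index. This is the frame-to-phoneme direction of an InfoNCE/CLIP-style objective; CLIP itself also adds the reverse direction over a batch of paired items. The function name, tensor layouts, and the 0.07 temperature are hypothetical.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against all-zero vectors
    return [x / norm for x in v]


def context_phoneme_contrastive_loss(
    context: Sequence[Sequence[float]],        # T x D context features (hypothetical layout)
    phoneme_table: Sequence[Sequence[float]],  # P x D learnable phoneme embeddings (hypothetical)
    phoneme_ids: Sequence[int],                # T frame-aligned phoneme indices
    temperature: float = 0.07,                 # assumed value, borrowed from CLIP's initialization
) -> float:
    """Mean frame-to-phoneme cross-entropy over cosine-similarity logits.

    The positive for each frame is its aligned phoneme's embedding; every
    other phoneme embedding in the table serves as a negative.
    """
    if len(context) != len(phoneme_ids):
        raise ValueError("context and phoneme_ids must have the same length")
    prototypes = [_normalize(e) for e in phoneme_table]
    total = 0.0
    for frame, target in zip(context, phoneme_ids):
        z = _normalize(frame)
        logits = [sum(a * b for a, b in zip(z, p)) / temperature for p in prototypes]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[target]  # equals -log softmax(logits)[target]
    return total / len(phoneme_ids)


# Toy usage: two 2-D phoneme embeddings and two frames.
table = [[1.0, 0.0], [0.0, 1.0]]
frames = [[2.0, 0.1], [0.1, 3.0]]
aligned = context_phoneme_contrastive_loss(frames, table, [0, 1])
swapped = context_phoneme_contrastive_loss(frames, table, [1, 0])
print(aligned < swapped)  # True: matching labels give a much lower loss
```

Because each frame is only rewarded for identifying its phoneme, one plausible reading is that such a loss leaves little incentive for the context branch to carry speaker or prosody cues, which matches the stated goal of reducing paralinguistic leakage.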

representative citing papers

ContextCodec: Content-Focused Context Guidance for Ultra-Low Bitrate Speech Coding

cs.SD · 2026-06-09 · unverdicted · novelty 5.0

ContextCodec uses a dual-branch encoder with CLIP-style contrastive training on phoneme-aligned context features plus autoregressive refinement to improve quality-intelligibility at bitrates down to 500 bps.
