
arxiv: 2606.10591 · v1 · pith:A53NJUP5new · submitted 2026-06-09 · 💻 cs.SD

ContextCodec: Content-Focused Context Guidance for Ultra-Low Bitrate Speech Coding

classification 💻 cs.SD
keywords context · content-focused · contextcodec · features · speech · acoustic · bitrate · decoding

Neural speech codecs enable low-bitrate speech communication, yet at ultra-low bitrates (< 1000 bps) preserving perceptual quality and intelligibility is challenging. Existing designs often prioritize acoustic details, leaving limited capacity for the core linguistic message under tight bitrate constraints. To address this, we propose ContextCodec, a codec that transmits content-focused context features to explicitly guide reconstruction. ContextCodec adopts a dual-branch encoder that decouples acoustic details from content-focused context. The context branch is trained with a CLIP-style contrastive loss that aligns context features with phoneme indices, reducing paralinguistic leakage. During decoding, these features are injected at each decoding stage for explicit guidance. In addition, we introduce a lightweight autoregressive latent refinement module. Experiments show a strong quality-intelligibility trade-off down to 500 bps, with a real-time factor (RTF) of 0.4886 on a typical mobile CPU.
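
The abstract does not give the form of the CLIP-style contrastive loss, so the following is only a minimal sketch of one common reading: a symmetric InfoNCE objective between frame-level context features and embeddings looked up from phoneme indices. The function names, the learnable `phoneme_table`, the temperature of 0.07, and the use of NumPy are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def clip_style_loss(context, phoneme_ids, phoneme_table, temperature=0.07):
    """Symmetric InfoNCE between context features and phoneme embeddings.

    context:       (N, D) context-branch features, one row per frame (assumed shape)
    phoneme_ids:   (N,) integer phoneme index for each frame
    phoneme_table: (P, D) hypothetical embedding table indexed by phoneme id
    temperature:   assumed value; the paper's setting is not stated in the abstract
    """
    # L2-normalize both sides so the logits are scaled cosine similarities.
    z = context / np.linalg.norm(context, axis=1, keepdims=True)
    e = phoneme_table[phoneme_ids]
    e = e / np.linalg.norm(e, axis=1, keepdims=True)

    logits = z @ e.T / temperature  # (N, N); matching pairs sit on the diagonal
    idx = np.arange(len(z))

    # Cross-entropy in both directions with the diagonal as the target.
    loss_c2p = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_p2c = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_c2p + loss_p2c)
```

In this sketch, two frames in a batch that share the same phoneme index are treated as negatives of each other, which a practical implementation would likely need to mask or otherwise handle; how ContextCodec deals with that case is not described in the abstract.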

