VoCodec achieves better performance than baselines at 1.1 kbps on LibriTTS by embedding voicing-driven quantization that reduces bitrate by ~27% versus uniform allocation.
VoCodec: A Low-bitrate Streamable Neural Speech Codec with Voicing-driven Quantization
abstract
Neural speech codecs are key to speech transmission and storage, but most use uniform quantization across frames, allocating the same bitrate regardless of content and wasting bits. We propose VoCodec, a low-bitrate streamable neural speech codec with voicing-driven quantization that assigns higher bitrate to voiced frames and lower bitrate to unvoiced frames according to perceptual sensitivity. VoCodec embeds a voicing detector in a fully causal encoder-quantizer-decoder neural coding framework, using residual scalar-vector quantization for voiced frames and simple scalar quantization for unvoiced ones. Experiments show that on the LibriTTS dataset at a 16 kHz sampling rate, VoCodec outperforms baseline neural speech codecs even at a bitrate as low as 1.1 kbps. Further experiments confirm that introducing voicing-driven quantization reduces the bitrate by approximately 27% compared with a uniform quantization strategy.
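The bitrate effect of voicing-driven allocation can be illustrated with a little arithmetic. The sketch below is not VoCodec's implementation: the function names, the per-frame bit budgets, the frame rate, and the choice of "every frame gets the voiced budget" as the uniform reference are all assumptions made purely for illustration, and the numbers are not taken from the paper.

```python
def average_bitrate_bps(voiced_flags, bits_voiced, bits_unvoiced, frame_rate_hz):
    """Mean bitrate (bits/s) when each frame's bit budget depends on its voicing flag."""
    if not voiced_flags:
        raise ValueError("need at least one frame")
    total_bits = sum(bits_voiced if v else bits_unvoiced for v in voiced_flags)
    return total_bits * frame_rate_hz / len(voiced_flags)


def saving_vs_uniform(voiced_flags, bits_voiced, bits_unvoiced):
    """Fraction of bits saved relative to spending the voiced budget on every frame.

    Using the voiced budget as the uniform reference is an assumption of this sketch.
    """
    if not voiced_flags:
        raise ValueError("need at least one frame")
    adaptive = sum(bits_voiced if v else bits_unvoiced for v in voiced_flags)
    uniform = bits_voiced * len(voiced_flags)
    return (uniform - adaptive) / uniform


# Hypothetical values: 5 frames (3 voiced, 2 unvoiced), 40 vs 10 bits per frame, 50 frames/s.
flags = [True, True, False, True, False]
rate = average_bitrate_bps(flags, bits_voiced=40, bits_unvoiced=10, frame_rate_hz=50)
saving = saving_vs_uniform(flags, bits_voiced=40, bits_unvoiced=10)
print(f"average bitrate: {rate:.0f} bps, saving vs uniform: {saving:.0%}")
```

The saving grows with the share of unvoiced frames and with the gap between the two budgets, which is the intuition behind allocating fewer bits to unvoiced frames.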
fields: eess.AS (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)