pith. sign in

VoCodec: A Low-bitrate Streamable Neural Speech Codec with Voicing-driven Quantization

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it
abstract

Neural speech codecs are key to speech transmission and storage, but most use uniform quantization across frames, allocating the same bitrate regardless of content and wasting bits. We propose VoCodec, a low-bitrate streamable neural speech codec with voicing-driven quantization that assigns higher bitrate to voiced frames and lower bitrate to unvoiced frames according to perceptual sensitivity. VoCodec embeds a voicing detector in a fully causal encoder-quantizer-decoder neural coding framework, using residual scalar-vector quantization for voiced frames and simple scalar quantization for unvoiced ones. Experiments show that on the LibriTTS dataset at a 16 kHz sampling rate, VoCodec outperforms baseline neural speech codecs even at a bitrate as low as 1.1 kbps. Our further experiments also confirm that introducing voicing-driven quantization can effectively reduce the bitrate by approximately 27% compared with uniform quantization strategy.

fields

eess.AS 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

citing papers explorer

Showing 1 of 1 citing paper.