AugCodec: A Low-Bitrate Disentangled Neural Speech Codec via Data Augmentation

2026 · cs.SD · arXiv 2606.21893

1 Pith paper cites this work.


abstract

We propose AugCodec, a low-bitrate disentangled neural speech codec that leverages data augmentation to decompose speech into three distinct components: semantic, speaker, and prosody tokens. Specifically, we employ tailored augmentation strategies to transform speech into distinct variants, each of which serves as the input for extracting tokens that preserve the target attribute while suppressing the others. This disentanglement strategy enables a substantial reduction in token rate. Furthermore, we introduce an augmentation loss that aligns the semantic encoder outputs for source and voice-converted speech, encouraging speaker-agnostic embeddings while mitigating the acoustic mismatch induced by voice conversion. Experiments on LibriSpeech test-clean demonstrate that AugCodec significantly outperforms state-of-the-art methods in both reconstruction quality and disentanglement, while operating at only 12.5 Hz with three token streams.
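The abstract says the augmentation loss aligns semantic encoder outputs for source speech and its voice-converted counterpart, but it does not name the distance used. The following is a minimal sketch assuming a frame-wise mean squared error over time-aligned embeddings. The function name `augmentation_loss`, the MSE choice, and the assumption that both utterances have the same number of frames are all hypothetical illustrations, not the authors' implementation; plain Python lists stand in for encoder outputs.

```python
from typing import Sequence

# One encoder output frame: a vector of floats (hypothetical stand-in).
Frame = Sequence[float]


def augmentation_loss(source_emb: Sequence[Frame],
                      converted_emb: Sequence[Frame]) -> float:
    """Mean squared error between semantic-encoder outputs for the source
    utterance and its voice-converted version.

    ASSUMPTION: MSE and one-to-one frame alignment are illustrative choices;
    the abstract only states that the loss "aligns" the two outputs.
    """
    if len(source_emb) != len(converted_emb):
        raise ValueError("embeddings must have the same number of frames")
    total, count = 0.0, 0
    for src_frame, vc_frame in zip(source_emb, converted_emb):
        if len(src_frame) != len(vc_frame):
            raise ValueError("frames must have the same dimension")
        # Accumulate squared differences over every embedding dimension.
        for a, b in zip(src_frame, vc_frame):
            total += (a - b) ** 2
            count += 1
    # Average over all frame/dimension pairs; empty input yields zero loss.
    return total / count if count else 0.0
```

With `source = [[1.0, 0.0], [0.5, 0.5]]`, identical converted embeddings give a loss of `0.0`, while changing only the first value of the first frame to `0.0` gives `0.25` (one squared difference of 1.0 averaged over four values). A loss of this kind pushes the semantic encoder to produce the same output regardless of the speaker identity introduced by voice conversion, which is the speaker-agnostic property the abstract describes.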

representative citing papers

AugCodec: A Low-Bitrate Disentangled Neural Speech Codec via Data Augmentation

cs.SD · 2026-06-20 · unverdicted · novelty 5.0

AugCodec disentangles speech into semantic, speaker, and prosody tokens via tailored data augmentations, achieving 12.5 Hz operation with three streams and outperforming prior codecs on LibriSpeech reconstruction and disentanglement metrics.

