AugCodec disentangles speech into semantic, speaker, and prosody tokens via tailored data augmentations, achieving 12.5 Hz operation with three streams and outperforming prior codecs on LibriSpeech reconstruction and disentanglement metrics.
DisCo-Speech: Controllable Zero-Shot Speech Generation with A Disentangled Speech Codec
1 Pith paper cites this work.
Fields: cs.SD (1). Years: 2026 (1). Verdicts: UNVERDICTED (1).
Representative citing papers:
AugCodec: A Low-Bitrate Disentangled Neural Speech Codec via Data Augmentation