SARA is a dual-stream VAE that integrates semantic and acoustic streams to achieve high-fidelity reconstruction and natural zero-shot TTS without complex regularizers.
Ds-codec: Dual-stage training with mirror-to-nonmirror architecture switching for speech codec
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.SD (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
SARA: A Dual-Stream VAE for High-Fidelity Speech Generation via Integrating Semantic and Acoustic Representations