Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
3 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Reliable Neural-Codec Text-to-Speech by ASR Self-Verification and Distillation: Near-Zero Catastrophic Failures Across Models and Codecs
ASR self-verification via best-of-N sampling eliminates observed catastrophic failures in multiple neural-codec TTS models, with distillation transferring most of the robustness to single-shot decoding.
- DDPO-VC: Speaker De-Identification via Diffusion Denoising Policy Optimization
DDPO-VC applies diffusion denoising policy optimization with dual-teacher rewards to improve speaker de-identification while preserving cognitive utility on dementia speech benchmarks.