ASR self-verification via best-of-N sampling eliminates observed catastrophic failures in multiple neural-codec TTS models, with distillation transferring most of the robustness to single-shot decoding.
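The selection step summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited paper's implementation. `synthesize` and `transcribe` are hypothetical stand-ins for a stochastic TTS sampler and an ASR model. Character error rate (CER) after a simple lowercase/whitespace normalization is assumed as the verification score. The distillation step is not shown.

```python
import re
from typing import Callable, Sequence, Tuple, TypeVar

A = TypeVar("A")  # whatever the (hypothetical) TTS sampler returns: waveform, codec tokens, ...


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (single-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            )
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Assumed normalizer: lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of an ASR transcript against the input text."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def best_of_n(
    text: str,
    synthesize: Callable[[str], A],   # hypothetical stochastic TTS sampler
    transcribe: Callable[[A], str],   # hypothetical ASR model
    n: int = 8,
) -> Tuple[A, float]:
    """Draw n samples, transcribe each, and keep the one with the lowest CER."""
    if n < 1:
        raise ValueError("n must be >= 1")
    best_audio, best_score = None, float("inf")
    for _ in range(n):
        audio = synthesize(text)
        score = cer(text, transcribe(audio))  # ASR self-verification score
        if score < best_score:                # strict '<' keeps the earliest sample on ties
            best_audio, best_score = audio, score
    return best_audio, best_score


if __name__ == "__main__":
    # Toy stand-ins: the "audio" is just a string and ASR is the identity.
    samples = iter(["helo world", "hello world", "hxllo wxrld"])
    audio, score = best_of_n("Hello world", lambda t: next(samples), lambda a: a, n=3)
    print(audio, score)  # hello world 0.0
```

Scoring against the input text means a candidate with a skipped, repeated, or hallucinated segment receives a high CER and loses the selection. The cost is N synthesis passes plus N ASR passes per utterance, which is the overhead a distilled single-shot decoder would avoid.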
Koel-TTS: Enhancing LLM-based speech generation with preference alignment and classifier-free guidance
3 Pith papers cite this work.
Representative citing papers
DDPO-VC applies diffusion denoising policy optimization with dual-teacher rewards to improve speaker de-identification while preserving cognitive utility on dementia speech benchmarks.