Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance

· 2025 · arXiv 2502.05236

3 Pith papers cite this work.

representative citing papers

Reliable Neural-Codec Text-to-Speech by ASR Self-Verification and Distillation: Near-Zero Catastrophic Failures Across Models and Codecs

cs.SD · 2026-06-16 · unverdicted · novelty 6.0 · ref 8

ASR self-verification via best-of-N sampling eliminates observed catastrophic failures in multiple neural-codec TTS models, with distillation transferring most of the robustness to single-shot decoding.

DDPO-VC: Speaker De-Identification via Diffusion Denoising Policy Optimization

eess.AS · 2026-06-13 · unverdicted · novelty 6.0 · ref 37

DDPO-VC applies diffusion denoising policy optimization with dual-teacher rewards to improve speaker de-identification while preserving cognitive utility on dementia speech benchmarks.

Cross-modal Consistency Guidance for Robust Emotion Control in Auto-Regressive TTS Models

cs.CL · 2025-10-15 · 2 refs
