A training-free framework for intra-utterance emotion and duration control in pretrained zero-shot TTS via segment-aware conditioning and steering strategies.
To evaluate the smoothness of transitions in both emotion and speaking rate, we adopt the DNSMOS Pro10 (Cumlin et al., 2024), referred as DNSM
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.SD 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
TED-TTS: Training-Free Intra-Utterance Emotion and Duration Control for Text-to-Speech Synthesis
A training-free framework for intra-utterance emotion and duration control in pretrained zero-shot TTS via segment-aware conditioning and steering strategies.