EmoShift: Lightweight activation steering for enhanced emotion-aware speech synthesis. arXiv preprint arXiv:2601.22873, 2026.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.SD (1). Years: 2026 (1). Verdicts: unverdicted (1).
Representative citing papers:
DUET: Unified Dual-Space Emotion Control for Diffusion and Flow-Matching Driven Text-to-Speech
DUET enables fine-grained emotion control in pretrained diffusion and flow-matching TTS models via unified hidden-space steering and mel-space guidance, outperforming supervised baselines on multiple backbones.