PromptStyle: Controllable style transfer for text-to-speech with natural language descriptions
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
fields: cs.SD 2
years: 2026 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
FineCombo-TTS learns a unified acoustic representation with a CFM-based Speech Variance Predictor for flexible precise TTS control from reference audio and text descriptions, supported by the new FineEdit paired dataset.
citing papers explorer
- CapTalk: Unified Voice Design for Single-Utterance and Dialogue Speech Generation
CapTalk unifies single-utterance and dialogue voice design via utterance- and speaker-level captions plus a hierarchical variational module for stable timbre with adaptive expression.
- FineCombo-TTS: Collaborative and Precise Controllable Speech Synthesis Using Text Descriptions and Reference Speech
FineCombo-TTS learns a unified acoustic representation with a CFM-based Speech Variance Predictor for flexible precise TTS control from reference audio and text descriptions, supported by the new FineEdit paired dataset.