LibriTTS-P: A corpus with speaking style and speaker identity prompts for text-to-speech and style captioning
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.SD (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers
FineCombo-TTS: Collaborative and Precise Controllable Speech Synthesis Using Text Descriptions and Reference Speech
FineCombo-TTS learns a unified acoustic representation with a CFM-based Speech Variance Predictor for flexible precise TTS control from reference audio and text descriptions, supported by the new FineEdit paired dataset.