F5-TTS generates natural speech from text via flow matching on DiT with simple text padding, ConvNeXt refinement, and sway sampling, trained on 100K hours multilingual data.
Advances in Neural Information Processing Systems , volume=
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
citation-role summary
method 1
citation-polarity summary
fields
eess.AS 1years
2024 1verdicts
UNVERDICTED 1roles
method 1polarities
use method 1representative citing papers
citing papers explorer
-
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
F5-TTS generates natural speech from text via flow matching on DiT with simple text padding, ConvNeXt refinement, and sway sampling, trained on 100K hours multilingual data.