pith:XW2OGWZB
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
F5-TTS generates natural zero-shot speech by padding text with filler tokens and refining it with ConvNeXt inside a flow-matching DiT model.
arxiv:2410.06885 v3 · 2024-10-09 · eess.AS · cs.SD
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{XW2OGWZBDLENM2K5RCFFSFTMPR}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Our design allows faster training and achieves an inference RTF of 0.15, which is greatly improved compared to state-of-the-art diffusion-based TTS models. Trained on a public 100K hours multilingual dataset, our F5-TTS exhibits highly natural and expressive zero-shot ability, seamless code-switching capability, and speed control efficiency.
Simply padding text with filler tokens and refining it with ConvNeXt is sufficient to achieve robust alignment and fast convergence without duration models or phoneme alignment, building on the feasibility shown by E2 TTS.
F5-TTS generates natural speech from text via flow matching on DiT with simple text padding, ConvNeXt refinement, and sway sampling, trained on 100K hours multilingual data.
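The padding step named in the claims can be illustrated with a minimal sketch. It assumes character-level tokens and assumes the text is padded up to the number of acoustic frames; the filler token string `<F>` and the function name `pad_text_to_frames` are hypothetical, and the ConvNeXt refinement and flow-matching model are not shown.

```python
# Hypothetical filler token; the record does not specify the actual symbol.
FILLER = "<F>"


def pad_text_to_frames(text: str, n_frames: int, filler: str = FILLER) -> list[str]:
    """Pad a character sequence with filler tokens up to n_frames entries.

    Assumption: the text sequence is shorter than the target frame count,
    so no duration model or phoneme alignment is needed to match lengths.
    """
    tokens = list(text)  # character-level tokenization (an assumption)
    if len(tokens) > n_frames:
        raise ValueError("text is longer than the target frame count")
    return tokens + [filler] * (n_frames - len(tokens))


if __name__ == "__main__":
    print(pad_text_to_frames("hi", 5))
```

The padded sequence has the same length as the target frame sequence, which is what lets a downstream model learn the alignment implicitly rather than from explicit durations.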
Receipt and verification
| First computed | 2026-05-17T23:38:48.894074Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
bdb4e35b211ac8d6695d888a59166c7c7b860f8bbe48a77ac3a4d56cb19a7f16
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/XW2OGWZBDLENM2K5RCFFSFTMPR \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: bdb4e35b211ac8d6695d888a59166c7c7b860f8bbe48a77ac3a4d56cb19a7f16
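The same check can be written as a standalone Python sketch. It mirrors the serialization used in the one-liner above (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) and hashes the result with SHA-256. The function names `canonical_sha256` and `verify` are illustrative, not part of any Pith tooling, and the record is assumed to be already loaded as a Python dict rather than fetched over the network.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record deterministically and return its SHA-256 hex digest."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def verify(record: dict, expected_hex: str) -> bool:
    """Compare the computed digest with a published one (case-insensitive)."""
    return canonical_sha256(record) == expected_hex.lower()


if __name__ == "__main__":
    # Illustrative fragment only; substitute the full canonical record.
    sample = {"schema_version": "1.0", "source": {"kind": "arxiv", "id": "2410.06885", "version": 3}}
    print(canonical_sha256(sample))
```

Because keys are sorted before hashing, two dicts with the same content but different insertion order produce the same digest, which is the property the verification relies on.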
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "b3a7045cc5062db49e67d6f377ddfcde6cf72ad49da3746e8f4cf3af1b7b1d89",
"cross_cats_sorted": [
"cs.SD"
],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "eess.AS",
"submitted_at": "2024-10-09T13:46:34Z",
"title_canon_sha256": "15be299e681d350e3ac1f0251de0a246b339839e069bf3787ea03c96f457655e"
},
"schema_version": "1.0",
"source": {
"id": "2410.06885",
"kind": "arxiv",
"version": 3
}
}