Pith Number

pith:XW2OGWZB

pith:2024:XW2OGWZBDLENM2K5RCFFSFTMPR
not attested · not anchored · not stored · refs resolved

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Chunhui Wang, Jian Zhao, Kai Yu, Keqi Deng, Xie Chen, Yushen Chen, Zhikang Niu, Ziyang Ma

F5-TTS generates natural zero-shot speech by padding text with filler tokens and refining it with ConvNeXt inside a flow-matching DiT model.

arxiv:2410.06885 v3 · 2024-10-09 · eess.AS · cs.SD

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{XW2OGWZBDLENM2K5RCFFSFTMPR}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Our design allows faster training and achieves an inference RTF of 0.15, which is greatly improved compared to state-of-the-art diffusion-based TTS models. Trained on a public 100K hours multilingual dataset, our F5-TTS exhibits highly natural and expressive zero-shot ability, seamless code-switching capability, and speed control efficiency.
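For readers unfamiliar with the metric, real-time factor (RTF) is conventionally the ratio of synthesis time to the duration of the generated audio, so an RTF of 0.15 corresponds to roughly 1.5 seconds of compute per 10 seconds of speech. The sketch below only illustrates that arithmetic; the function names are hypothetical and nothing here is taken from the paper's code or its measurement setup.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(rtf: float, audio_seconds: float) -> float:
    """Return the compute time implied by a given RTF for a clip of this length."""
    return rtf * audio_seconds


if __name__ == "__main__":
    # Using the RTF quoted in the claim above (0.15) on a hypothetical 10 s clip.
    print(synthesis_time(0.15, 10.0))
    print(real_time_factor(1.5, 10.0))
```

An RTF below 1 means synthesis runs faster than playback; the absolute figure depends on the hardware and sampling settings used for the measurement, which this page does not state.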

C2 weakest assumption

That simply padding text with filler tokens and refining with ConvNeXt is sufficient to achieve robust alignment and fast convergence without duration models or phoneme alignment, building on the feasibility shown by E2 TTS.

C3 one line summary

F5-TTS generates natural speech from text via flow matching on DiT with simple text padding, ConvNeXt refinement, and sway sampling, trained on 100K hours multilingual data.

References

128 extracted · 128 resolved · 11 Pith anchors


Cited by

26 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:48.894074Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

bdb4e35b211ac8d6695d888a59166c7c7b860f8bbe48a77ac3a4d56cb19a7f16

Aliases

arxiv: 2410.06885 · arxiv_version: 2410.06885v3 · doi: 10.48550/arxiv.2410.06885 · pith_short_12: XW2OGWZBDLEN · pith_short_16: XW2OGWZBDLENM2K5 · pith_short_8: XW2OGWZB
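The identifiers on this page appear to be related: the 26-character Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and the `pith_short_*` aliases are prefixes of it. This is an observation inferred from the values shown here, not documented Pith behavior, and the function name below is hypothetical. A minimal sketch:

```python
import base64

# Canonical hash as printed on this page (hex-encoded SHA-256 digest).
CANONICAL_HASH = "bdb4e35b211ac8d6695d888a59166c7c7b860f8bbe48a77ac3a4d56cb19a7f16"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: this mirrors how the identifier on this page lines up with
    its hash; it is not taken from any published specification.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


full = pith_number_from_hash(CANONICAL_HASH)
short_aliases = {n: full[:n] for n in (8, 12, 16)}

if __name__ == "__main__":
    print(full)
    for n, alias in short_aliases.items():
        print(f"pith_short_{n}: {alias}")
```

If the relationship holds in general, a mirror could check that a Pith Number and its canonical hash belong together without contacting the server.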
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/XW2OGWZBDLENM2K5RCFFSFTMPR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: bdb4e35b211ac8d6695d888a59166c7c7b860f8bbe48a77ac3a4d56cb19a7f16
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "b3a7045cc5062db49e67d6f377ddfcde6cf72ad49da3746e8f4cf3af1b7b1d89",
    "cross_cats_sorted": [
      "cs.SD"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "eess.AS",
    "submitted_at": "2024-10-09T13:46:34Z",
    "title_canon_sha256": "15be299e681d350e3ac1f0251de0a246b339839e069bf3787ea03c96f457655e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2410.06885",
    "kind": "arxiv",
    "version": 3
  }
}
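The shell pipeline under "Verify this Pith Number yourself" can also be unpacked into a small offline Python function that applies the same serialization (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) to the record shown above. This is a sketch under the assumption that the JSON printed on this page is byte-for-byte what the API returns in `.canonical_record`; the names `CANONICAL_RECORD` and `canonical_sha256` are illustrative.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" section of this page.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "b3a7045cc5062db49e67d6f377ddfcde6cf72ad49da3746e8f4cf3af1b7b1d89",
        "cross_cats_sorted": ["cs.SD"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "eess.AS",
        "submitted_at": "2024-10-09T13:46:34Z",
        "title_canon_sha256": "15be299e681d350e3ac1f0251de0a246b339839e069bf3787ea03c96f457655e",
    },
    "schema_version": "1.0",
    "source": {"id": "2410.06885", "kind": "arxiv", "version": 3},
}

# Digest the page says to expect.
EXPECTED_HASH = "bdb4e35b211ac8d6695d888a59166c7c7b860f8bbe48a77ac3a4d56cb19a7f16"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(CANONICAL_RECORD)
    print(digest)
    print("matches page:", digest == EXPECTED_HASH)
```

Because keys are sorted before hashing, the digest does not depend on the order in which fields appear in the JSON, which is what lets independent mirrors recompute the same value.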