Pith Number

pith:UZSHTDQ6

pith:2024:UZSHTDQ6ER4ZN7L5FZRDYGKWN5
not attested · not anchored · not stored · refs resolved

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Chao Yao, Chuang Ding, Chumin Li, Dejian Zhong, Dongya Jia, Feiya Li, Hui Li, Jian Cong, Jian Wu, Jiawei Chen, Jiaxin Li, Jitong Chen, Junjie Pan, Junteng Zhang, Lelai Deng, Lin Liu, Lu Gao, Lu Lu, Mingqing Gong, Peisong Huang, Philip Anastassiou, Qidi Zhang, Qingqing Huang, Shouda Liu, Shuo Zhang, Sichao Liu, Wenjie Zhang, Xiaobin Zhuang, Xiaoyang Li, Xingxing Li, Xin Wang, Xudong Liu, Yang Zhang, Yifeng Yang, Yuanhao Yi, Yuanyuan Huo, Yuanzhe Chen, Yuchen Liu, Yuping Wang, Yuxuan Wang, Zhengxi Liu, Zhen Wei, Zhiying Huang, Zhuo Chen, Zilin Zhao, Ziyi Chen

Seed-TTS generates speech that matches human recordings in speaker similarity and naturalness according to objective metrics and listener tests.

arxiv:2406.02430 v1 · 2024-06-04 · eess.AS · cs.SD

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{UZSHTDQ6ER4ZN7L5FZRDYGKWN5}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open · sign in to claim
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Seed-TTS achieves performance in speaker similarity and naturalness that matches ground truth human speech in both objective and subjective evaluations.

C2 · weakest assumption

The paper assumes that subjective listener evaluations and the chosen objective metrics reliably indicate real-world indistinguishability, and that the models generalize to unseen speakers and conditions without overfitting to the training distribution.

C3 · one-line summary

Seed-TTS models produce speech matching human naturalness and speaker similarity, with added controllability via self-distillation and reinforcement learning.

References

45 extracted · 45 resolved · 13 Pith anchors

[1] Streaming voice conversion via intermediate bottleneck features and non-streaming teacher guidance · 2023
[2] StreamVoice: Streamable context-aware language modeling for real-time zero-shot voice conversion
[3] BASE TTS: lessons from building a billion-parameter text-to-speech model on 100k hours of data
[4] Mega-TTS: Zero-shot text-to-speech at scale with intrinsic inductive bias
[5] Deep reinforcement learning: An overview · arXiv:1701.07274

Formal links

3 machine-checked theorem links

Cited by

36 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:52.556038Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

a664798e1e247996fd7d2e623c19566f7c7d5765e46c5909b3f8f491fc7c573a

Aliases

arxiv: 2406.02430 · arxiv_version: 2406.02430v1 · doi: 10.48550/arxiv.2406.02430 · pith_short_12: UZSHTDQ6ER4Z · pith_short_16: UZSHTDQ6ER4ZN7L5 · pith_short_8: UZSHTDQ6
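For this record, the Pith Number appears to be the canonical SHA-256 digest encoded in Base32 (the standard RFC 4648 alphabet) and cut to 26 characters, with the short aliases being plain prefixes of it. The page does not state this derivation, so the sketch below is an observation about this one record, not the documented scheme. The function names are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "a664798e1e247996fd7d2e623c19566f7c7d5765e46c5909b3f8f491fc7c573a"
PITH_NUMBER = "UZSHTDQ6ER4ZN7L5FZRDYGKWN5"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: inferred from this record only, not from a published spec.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the pith_short_N aliases as prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    derived = pith_number_from_hash(CANONICAL_HASH)
    print(derived == PITH_NUMBER)
    print(short_aliases(derived))
```

If the relationship holds more generally, a client could check that a short alias and a canonical hash belong to the same record without a network call.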
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/UZSHTDQ6ER4ZN7L5FZRDYGKWN5 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a664798e1e247996fd7d2e623c19566f7c7d5765e46c5909b3f8f491fc7c573a
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "d89b536815c0e92765ca6b32060a72535fa57099d15026dcd6562ed465c7c460",
    "cross_cats_sorted": [
      "cs.SD"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "eess.AS",
    "submitted_at": "2024-06-04T15:48:29Z",
    "title_canon_sha256": "89d4e6461411ffd6c11540a5e97e810bc4fe655a32c0d9a800fb9876c6018686"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2406.02430",
    "kind": "arxiv",
    "version": 1
  }
}
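The verification one-liner under "Verify this Pith Number yourself" can also be reproduced offline. The sketch below hard-codes the canonical record as printed on this page and applies the same serialization as that one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8) before hashing. `canonical_sha256` is a hypothetical helper name, and the digest only matches the canonical hash if the JSON shown here is faithful to the record the server returns.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "d89b536815c0e92765ca6b32060a72535fa57099d15026dcd6562ed465c7c460",
        "cross_cats_sorted": ["cs.SD"],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "eess.AS",
        "submitted_at": "2024-06-04T15:48:29Z",
        "title_canon_sha256": "89d4e6461411ffd6c11540a5e97e810bc4fe655a32c0d9a800fb9876c6018686",
    },
    "schema_version": "1.0",
    "source": {"id": "2406.02430", "kind": "arxiv", "version": 1},
}


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash as UTF-8."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Compare the printed digest with the "Canonical hash" value on this page.
    print(canonical_sha256(RECORD))
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the record's fields were written, which is what lets a mirror recompute it independently.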