Pith Number

pith:HCXY62YP

pith:2024:HCXY62YPHCZQYYPT3MZO5JI6SM
not attested · not anchored · not stored · refs pending

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

Hangrui Hu, Heng Lu, Kai Hu, Qian Chen, Shiliang Zhang, Siqi Zheng, Yexin Yang, Yue Gu, Zhifu Gao, Zhihao Du, Zhijie Yan, Ziyang Ma

Supervised semantic tokens from a multilingual ASR model give CosyVoice better content consistency and higher speaker similarity in zero-shot voice cloning than unsupervised tokens.

arxiv:2407.05407 v2 · 2024-07-07 · cs.SD · cs.AI · eess.AS

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{HCXY62YPHCZQYYPT3MZO5JI6SM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Supervised semantic tokens significantly outperform existing unsupervised tokens in content consistency and speaker similarity for zero-shot voice cloning.

C2 weakest assumption

Inserting vector quantization into the multilingual ASR encoder produces tokens that retain sufficient semantic, acoustic, and prosodic information for high-quality reconstruction by the conditional flow matching model without major loss.

C3 one line summary

Supervised semantic tokens from ASR enable CosyVoice to outperform unsupervised tokens in zero-shot multilingual TTS via LLM text-to-token and flow-matching token-to-speech synthesis.

Formal links

1 machine-checked theorem link

Cited by

42 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:50.817252Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

38af8f6b0f38b30c61f3db32eea51e9318757ac25b5f5667b854c69832896505

Aliases

arxiv: 2407.05407 · arxiv_version: 2407.05407v2 · doi: 10.48550/arxiv.2407.05407 · pith_short_12: HCXY62YPHCZQ · pith_short_16: HCXY62YPHCZQYYPT · pith_short_8: HCXY62YP
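The identifiers on this page are consistent with a simple derivation: the 26-character Pith Number matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and each pith_short_N alias is its first N characters. This is an observation drawn from the values shown here, not a documented rule of the service, and the helper name pith_number_from_hash is hypothetical. A minimal sketch under that assumption:

```python
import base64


def pith_number_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a SHA-256 hex digest, dropping '=' padding.

    Assumption: this mirrors how the identifier on this page relates to its
    canonical hash; it is inferred from the values shown, not from a spec.
    """
    raw = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


canonical_hash = "38af8f6b0f38b30c61f3db32eea51e9318757ac25b5f5667b854c69832896505"
full = pith_number_from_hash(canonical_hash)
print(full)  # HCXY62YPHCZQYYPT3MZO5JI6SM

# The short aliases listed above are plain prefixes of the full identifier.
for n in (8, 12, 16):
    print(f"pith_short_{n}: {full[:n]}")
```

Because the short forms are prefixes, a shorter alias carries fewer bits and may be ambiguous across records; the full identifier or the canonical hash is the safer key for lookups.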
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/HCXY62YPHCZQYYPT3MZO5JI6SM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 38af8f6b0f38b30c61f3db32eea51e9318757ac25b5f5667b854c69832896505
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c81bb0c9503acba49fc092e8e606bd94db76a2fc79ae9ff5f31ad8ba58b9d6ec",
    "cross_cats_sorted": [
      "cs.AI",
      "eess.AS"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.SD",
    "submitted_at": "2024-07-07T15:16:19Z",
    "title_canon_sha256": "c94cc54b5b0e4e4aacad550fe4d4f44213a7a0bed116877c57b077199304a3b1"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2407.05407",
    "kind": "arxiv",
    "version": 2
  }
}
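The one-line check under "Verify this Pith Number yourself" can also be written as a small offline script. The sketch below applies the same canonicalization as that command (sorted keys, compact separators, non-ASCII left unescaped, UTF-8 bytes) to the record copied from this page. The function name canonical_sha256 is hypothetical, and whether the digest equals the published canonical hash depends on the served record matching the JSON shown here exactly.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Record copied from the "Canonical record JSON" shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "c81bb0c9503acba49fc092e8e606bd94db76a2fc79ae9ff5f31ad8ba58b9d6ec",
        "cross_cats_sorted": ["cs.AI", "eess.AS"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.SD",
        "submitted_at": "2024-07-07T15:16:19Z",
        "title_canon_sha256": "c94cc54b5b0e4e4aacad550fe4d4f44213a7a0bed116877c57b077199304a3b1",
    },
    "schema_version": "1.0",
    "source": {"id": "2407.05407", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)  # compare against the canonical hash listed on this page
```

Sorting keys before hashing makes the digest independent of the order in which a mirror happens to serialize the record, which is what lets independent parties recompute the same value.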