pith:D4TUMAVG
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
A single-stream speech codec decouples content from speaker traits to let an LLM deliver both zero-shot cloning and fine voice control.
arxiv:2503.01710 v1 · 2025-03-03 · cs.SD · cs.AI · eess.AS
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{D4TUMAVGLHEMVPYXJ3GLZSKBVT}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Spark-TTS not only achieves state-of-the-art zero-shot voice cloning but also generates highly customizable voices that surpass the limitations of reference-based synthesis.
BiCodec's decomposition into semantic and global tokens provides clean, independent control over linguistic content and speaker attributes, without quality loss or unwanted interactions between the two token types.
Spark-TTS combines BiCodec's single-stream decoupled tokens with the Qwen2.5 LLM and chain-of-thought (CoT) generation to deliver efficient, state-of-the-art zero-shot voice cloning and fine-grained voice control.
Receipt and verification
| First computed | 2026-05-17T23:38:14.473782Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
1f274602a659c8cabf174eccbcc941acd025ec017bd5fd9ef70fe12649d84e9a
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/D4TUMAVGLHEMVPYXJ3GLZSKBVT \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1f274602a659c8cabf174eccbcc941acd025ec017bd5fd9ef70fe12649d84e9a
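The same check can be run offline once the record is in hand. The following is a minimal Python sketch, assuming the serialization settings used in the one-liner above (sorted keys, compact separators, non-ASCII characters left unescaped, UTF-8 encoding) are the whole canonicalization; the function name `canonical_hash` is hypothetical, and the record literal is copied from the "Canonical record JSON" section of this page.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of a record in compact, key-sorted JSON form.

    Mirrors the settings of the shell one-liner; this is an assumption about
    the canonical form, not a published specification.
    """
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(canonical).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "b4bc77c613949cbe690d43096fb51fdfe685e9c1aa7a5ef3b595224bdee14e36",
        "cross_cats_sorted": ["cs.AI", "eess.AS"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.SD",
        "submitted_at": "2025-03-03T16:23:10Z",
        "title_canon_sha256": "52966132713a950926ab1d240d92326b9bcb9bdd2cca5cafe7fe5f468f605054",
    },
    "schema_version": "1.0",
    "source": {"id": "2503.01710", "kind": "arxiv", "version": 1},
}

# Canonical hash as listed on this page.
EXPECTED = "1f274602a659c8cabf174eccbcc941acd025ec017bd5fd9ef70fe12649d84e9a"

digest = canonical_hash(record)
matches = digest == EXPECTED
print(digest)
print("matches listed hash:", matches)
```

Because the keys are sorted before hashing, the indentation and key order of the JSON you start from do not affect the digest; only the values do.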
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "b4bc77c613949cbe690d43096fb51fdfe685e9c1aa7a5ef3b595224bdee14e36",
    "cross_cats_sorted": [
      "cs.AI",
      "eess.AS"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SD",
    "submitted_at": "2025-03-03T16:23:10Z",
    "title_canon_sha256": "52966132713a950926ab1d240d92326b9bcb9bdd2cca5cafe7fe5f468f605054"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2503.01710",
    "kind": "arxiv",
    "version": 1
  }
}
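The fields of the canonical record map directly onto the header line near the top of this page (arXiv identifier, version, submission date, categories). The sketch below shows one way to read them back out; the helper names `arxiv_label`, `submitted_date`, and `categories` are hypothetical, and the two hash fields are omitted from the literal for brevity.

```python
from datetime import datetime, timezone

# Subset of the canonical record shown above (hash fields omitted for brevity).
record = {
    "metadata": {
        "cross_cats_sorted": ["cs.AI", "eess.AS"],
        "primary_cat": "cs.SD",
        "submitted_at": "2025-03-03T16:23:10Z",
    },
    "schema_version": "1.0",
    "source": {"id": "2503.01710", "kind": "arxiv", "version": 1},
}


def arxiv_label(source: dict) -> str:
    """Build a versioned arXiv identifier such as '2503.01710v1'."""
    if source["kind"] != "arxiv":
        raise ValueError(f"unsupported source kind: {source['kind']!r}")
    return f"{source['id']}v{source['version']}"


def submitted_date(metadata: dict) -> str:
    """Return the UTC submission date as YYYY-MM-DD."""
    # strptime is used because datetime.fromisoformat only accepts a
    # trailing 'Z' from Python 3.11 onwards.
    ts = datetime.strptime(metadata["submitted_at"], "%Y-%m-%dT%H:%M:%SZ")
    return ts.replace(tzinfo=timezone.utc).date().isoformat()


def categories(metadata: dict) -> list[str]:
    """Primary category first, followed by the sorted cross-listings."""
    return [metadata["primary_cat"], *metadata["cross_cats_sorted"]]


print(arxiv_label(record["source"]))        # 2503.01710v1
print(submitted_date(record["metadata"]))   # 2025-03-03
print(" · ".join(categories(record["metadata"])))  # cs.SD · cs.AI · eess.AS
```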