Pith Number

pith:MXBMD6Y2

pith:2026:MXBMD6Y22XYGBJPHRFNFY7KVTH

Status: not attested · not anchored · not stored · refs resolved

Qwen3-TTS Technical Report

Baosong Yang, Bin Zhang, Dake Guo, Hangrui Hu, Hongkun Hao, Jingren Zhou, Jin Xu, Junyang Lin, Pei Zhang, Ting He, Xinfa Zhu, Xinyu Zhang, Xiong Wang, Zhifang Guo, Zishan Guo, Ziyue Jiang

Qwen3-TTS achieves state-of-the-art multilingual text-to-speech with 3-second voice cloning and low-latency streaming.

arxiv:2601.15621 v1 · 2026-01-22 · cs.SD · cs.CL · eess.AS


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{MXBMD6Y22XYGBJPHRFNFY7KVTH}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
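The page does not spell out the merge rules. As a minimal sketch of why independent mirrors can reach the same state, assume (hypothetically) that each signed event carries a `timestamp`, a unique `event_id`, and a single `field`/`value` update. Sorting by a total order and applying a last-writer-wins rule then yields the same result no matter in which order the events arrived. The event fields, the `merge` helper, and the last-writer-wins rule are illustrative assumptions, not Pith's actual algorithm, and signature verification is omitted.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the real Pith event schema is not shown on this page.
    timestamp: str  # ISO 8601 UTC in one fixed format, so text order matches time order
    event_id: str   # unique tie-breaker that makes the ordering total
    field: str
    value: Any


def merge(canonical: dict, events: list[Event]) -> dict:
    """Fold events into a copy of the canonical record in a fixed total order.

    Illustrative last-writer-wins rule: for each field, the event that sorts
    last by (timestamp, event_id) decides the value. Because the sort key is
    total, every mirror computes the same state from the same set of events.
    """
    state = dict(canonical)  # never mutate the canonical record itself
    for ev in sorted(events, key=lambda e: (e.timestamp, e.event_id)):
        state[ev.field] = ev.value
    return state


if __name__ == "__main__":
    base = {"title": "Qwen3-TTS Technical Report"}
    # Toy events with a hypothetical field name.
    a = Event("2026-05-17T23:38:46Z", "e1", "example_field", "old")
    b = Event("2026-05-18T08:00:00Z", "e2", "example_field", "new")
    # Arrival order differs; the merged state does not.
    assert merge(base, [a, b]) == merge(base, [b, a])
    print(merge(base, [b, a]))  # {'title': 'Qwen3-TTS Technical Report', 'example_field': 'new'}
```

Any rule works for this purpose as long as it depends only on the set of events and a total ordering over them, never on delivery order.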

Claims

C1 strongest claim

Extensive experiments indicate state-of-the-art performance across diverse objective and subjective benchmarks (e.g., TTS multilingual test set, InstructTTSEval, and our long speech test set).

C2 weakest assumption

That the chosen benchmarks and subjective tests accurately reflect real-world multilingual use cases and that the 5 million hours of training data contain no systematic quality or bias issues that would degrade performance outside the reported evaluations.

C3 one-line summary

Qwen3-TTS delivers state-of-the-art multilingual TTS performance with 3-second voice cloning, description control, and ultra-low-latency streaming via dual tokenizers and a dual-track LM architecture trained on over 5 million hours of data.

References

26 extracted · 26 resolved · 8 Pith anchors

[1] Seed-TTS: A Family of High-Quality Versatile Speech Generation Models · arXiv:2406.02430

[2] F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching · arXiv:2410.06885

[3] High Fidelity Neural Audio Compression · arXiv:2210.13438

[4] Moshi: a speech-text foundation model for real-time dialogue · arXiv:2410.00037

[5] CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens · arXiv:2407.05407

Formal links

1 machine-checked theorem link

Cited by

21 papers in Pith

Raon-OpenTTS: Open Models and Data for Robust Text-to-Speech

RADAR Challenge 2026: Robust Audio Deepfake Recognition under Media Transformations

Omni-Customizer: End-to-End MultiModal Customization for Joint Audio-Video Generation

OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models

JaiTTS: A Thai Voice Cloning Model

Receipt and verification

First computed	2026-05-17T23:38:46.851157Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

65c2c1fb1ad5f060a5e7895a5c7d5599cb7cd4c8c452f2ddd5afe4cc33bc702a
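For this record, the values shown are consistent with the Pith Number being the first 26 characters of the RFC 4648 base32 encoding (alphabet A-Z, 2-7) of the 32-byte SHA-256 canonical hash, i.e. its first 130 bits. That this holds for every Pith Number is an assumption, and `pith_number_from_hash` is a hypothetical helper name; the sketch only reproduces the relationship visible on this page.

```python
import base64


def pith_number_from_hash(canonical_hash_hex: str, length: int = 26) -> str:
    """Derive a Pith-style identifier from a hex SHA-256 digest (hypothetical helper).

    Assumption, consistent with this record but not documented here: the Pith
    Number is the base32 (RFC 4648) encoding of the digest truncated to `length`
    characters. 26 characters x 5 bits = the first 130 bits of the hash.
    """
    digest = bytes.fromhex(canonical_hash_hex)
    if len(digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    return base64.b32encode(digest).decode("ascii")[:length]


h = "65c2c1fb1ad5f060a5e7895a5c7d5599cb7cd4c8c452f2ddd5afe4cc33bc702a"
print(pith_number_from_hash(h))  # MXBMD6Y22XYGBJPHRFNFY7KVTH
```

Truncating to 8 characters gives `MXBMD6Y2`, the short form used in `pith:MXBMD6Y2`.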

Aliases

arxiv: 2601.15621 · arxiv_version: 2601.15621v1 · doi: 10.48550/arxiv.2601.15621 · pith_short_12: MXBMD6Y22XYG · pith_short_16: MXBMD6Y22XYGBJPH · pith_short_8: MXBMD6Y2
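The short aliases are plain prefixes of the full Pith Number, and the arXiv aliases follow arXiv's own conventions (a `v1` version suffix and the `10.48550` DOI prefix). A small sketch that rebuilds this alias list from the full number and the arXiv identifier; `build_aliases` is a hypothetical helper, and the lowercase `arxiv` in the DOI simply mirrors this page (DOIs are case-insensitive).

```python
def build_aliases(pith_number: str, arxiv_id: str, arxiv_version: int) -> dict[str, str]:
    """Rebuild the alias list shown on this record page (hypothetical helper)."""
    aliases = {
        "arxiv": arxiv_id,
        "arxiv_version": f"{arxiv_id}v{arxiv_version}",
        "doi": f"10.48550/arxiv.{arxiv_id}",
    }
    # The short forms are prefixes of the full 26-character Pith Number.
    for n in (8, 12, 16):
        aliases[f"pith_short_{n}"] = pith_number[:n]
    return aliases


# Lexicographic key order reproduces the page's ordering (pith_short_8 sorts last).
for key, value in sorted(build_aliases("MXBMD6Y22XYGBJPHRFNFY7KVTH", "2601.15621", 1).items()):
    print(f"{key}: {value}")
```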

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

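# Fetch the JSON-LD record, keep only canonical_record, re-serialize it canonically
# (sorted keys, compact separators, UTF-8) and print its SHA-256 digest.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MXBMD6Y22XYGBJPHRFNFY7KVTH \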
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 65c2c1fb1ad5f060a5e7895a5c7d5599cb7cd4c8c452f2ddd5afe4cc33bc702a

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "43e7a156c135547a462f7739d8e3537a2f0d147f2c0f63d766e748add5dabc39",
    "cross_cats_sorted": [
      "cs.CL",
      "eess.AS"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SD",
    "submitted_at": "2026-01-22T03:51:43Z",
    "title_canon_sha256": "d99480f1d84569a33dfe616970481a3a7b0dd54aba79c8dbaf4086a6d7aa9619"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2601.15621",
    "kind": "arxiv",
    "version": 1
  }
}
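The curl pipeline above hashes the record as served. An offline sketch of the same canonicalization, applied to the JSON shown here: keys sorted, no insignificant whitespace, non-ASCII kept as UTF-8, then SHA-256, using the same `json.dumps` settings as the one-liner. Whether the displayed JSON is identical to the served `canonical_record` is an assumption, so the script reports a match or mismatch instead of asserting one; the helper names are hypothetical.

```python
import hashlib
import json

EXPECTED = "65c2c1fb1ad5f060a5e7895a5c7d5599cb7cd4c8c452f2ddd5afe4cc33bc702a"

# Canonical record as displayed on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "43e7a156c135547a462f7739d8e3537a2f0d147f2c0f63d766e748add5dabc39",
        "cross_cats_sorted": ["cs.CL", "eess.AS"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.SD",
        "submitted_at": "2026-01-22T03:51:43Z",
        "title_canon_sha256": "d99480f1d84569a33dfe616970481a3a7b0dd54aba79c8dbaf4086a6d7aa9619",
    },
    "schema_version": "1.0",
    "source": {"id": "2601.15621", "kind": "arxiv", "version": 1},
}


def canonical_bytes(obj) -> bytes:
    # Same settings as the verification one-liner: sorted keys, compact separators, UTF-8.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def canonical_sha256(obj) -> str:
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted before hashing, the digest does not depend on how a server or mirror happens to order the fields.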