BASE TTS: Lessons from building a billion-parameter text-to-speech model on 100K hours of data

Mateusz Łajszczak, Guillermo Cámbara, Yang Li, Fatih Beyhan, Arent Van Korlaar, Fan Yang, Arnaud Joly, Álvaro Martín-Cortinas, Ammar Abbas, Adam Michalski, et al. · 2024 · arXiv 2402.08093

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

SemaVoice: Semantic-Aware Continuous Autoregressive Speech Synthesis

eess.AS · 2026-05-16 · unverdicted · novelty 6.0

SemaVoice adds SFM-guided alignment to refine continuous speech representations in autoregressive TTS, reporting 1.71% English WER on Seed-TTS and competitiveness with open-source SOTA.

X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning

cs.SD · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

X-Voice achieves zero-shot cross-lingual voice cloning across 30 languages by using IPA as a unified phonetic representation and a two-stage training process that first generates its own audio prompts then fine-tunes without text.

A Novel Automatic Framework for Speaker Drift Detection in Synthesized Speech

cs.SD · 2026-04-07 · unverdicted · novelty 6.0

A framework detects speaker drift in TTS outputs by computing cosine similarities across speech segments and using LLMs for binary classification, supported by a human-validated synthetic benchmark.

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

eess.AS · 2024-06-04 · unverdicted · novelty 6.0

Seed-TTS models produce speech matching human naturalness and speaker similarity, with added controllability via self-distillation and reinforcement learning.

CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models

cs.SD · 2024-12-13 · unverdicted · novelty 5.0

CosyVoice 2 delivers human-parity naturalness and near-lossless streaming speech synthesis by combining finite-scalar quantization, a streamlined pre-trained LLM, and chunk-aware causal flow matching on large multilingual data.

citing papers explorer

Showing 5 of 5 citing papers.

SemaVoice: Semantic-Aware Continuous Autoregressive Speech Synthesis · eess.AS · 2026-05-16 · unverdicted · none · ref 42
X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning · cs.SD · 2026-05-07 · unverdicted · none · ref 52 · 2 links
A Novel Automatic Framework for Speaker Drift Detection in Synthesized Speech · cs.SD · 2026-04-07 · unverdicted · none · ref 20
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models · eess.AS · 2024-06-04 · unverdicted · none · ref 3
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models · cs.SD · 2024-12-13 · unverdicted · none · ref 40
