TTSDS2: resources and benchmark for evaluating human-quality text to speech systems. arXiv preprint arXiv:2506.19441, 2025
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: eess.AS (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios
SwanBench-Speech is a new benchmark that decomposes long-form speech quality into seven disentangled metrics across 17 scenarios to evaluate generation models.
- SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue
SwanVoice is a zero-shot TTS system for 1-4 speakers that reports higher richness and hierarchy scores than open-source baselines on monologue and dialogue tasks via mixed training and DiffusionNFT post-training.