WavTTS is the first raw-waveform diffusion TTS model using DiT flow matching and multi-scale mel supervision that approaches SOTA latent zero-shot performance while beating prior end-to-end models.
Longcat-audiodit: High-fidelity diffusion text-to-speech in the waveform latent space
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
dots.tts reports SOTA benchmark results on Seed-TTS-Eval and other tests via continuous latent-space autoregressive modeling with three listed innovations and code release.
VoxCPM2 scales hierarchical continuous-latent speech modeling to 2B parameters and over 2M hours of multilingual data, unifying voice cloning, style control, and continuation in one backbone with open release.
citing papers explorer
-
dots.tts Technical Report
dots.tts reports SOTA benchmark results on Seed-TTS-Eval and other tests via continuous latent-space autoregressive modeling with three listed innovations and code release.
-
VoxCPM2 Technical Report
VoxCPM2 scales hierarchical continuous-latent speech modeling to 2B parameters and over 2M hours of multilingual data, unifying voice cloning, style control, and continuation in one backbone with open release.