RobustSpeechFlow improves TTS alignment robustness by extending contrastive flow matching with length-preserving repeat and skip latent augmentations, lowering WER from 1.44 to 1.38 on Seed-TTS-eval and CER on ZERO500.
Natural tts synthesis by condi- tioning wavenet on mel spectrogram predictions,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.SD 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
RobustSpeechFlow: Learning Robust Text-to-Speech Trajectories via Augmentation-based Contrastive Flow Matching
RobustSpeechFlow improves TTS alignment robustness by extending contrastive flow matching with length-preserving repeat and skip latent augmentations, lowering WER from 1.44 to 1.38 on Seed-TTS-eval and CER on ZERO500.