Text-to-speech data augmentation for low resource speech recognition
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.SD (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Improving Code-Switching ASR with Code-Mixing Guided Synthetic Speech
  A code-mixing guided preference-learning method for TTS produces synthetic data that lowers mixed error rate when fine-tuning Whisper on the SEAME Mandarin-English corpus.
- Deepfake audio as a data augmentation technique for training automatic speech to text transcription models
  The authors propose and test a data augmentation framework based on deepfake audio to improve training of speech-to-text transcription models.