pith. machine review for the scientific record.

arxiv: 1904.07944 · v2 · submitted 2019-04-16 · 💻 cs.SD · cs.LG · eess.AS

Recognition: unknown

Expediting TTS Synthesis with Adversarial Vocoding

classification 💻 cs.SD · cs.LG · eess.AS
keywords: synthesis, vocoding, adversarial, approach, network, neural, perceptually-informed, spectrograms
read the original abstract

Recent approaches in text-to-speech (TTS) synthesis employ neural network strategies to vocode perceptually-informed spectrogram representations directly into listenable waveforms. Such vocoding procedures create a computational bottleneck in modern TTS pipelines. We propose an alternative approach which utilizes generative adversarial networks (GANs) to learn mappings from perceptually-informed spectrograms to simple magnitude spectrograms which can be heuristically vocoded. Through a user study, we show that our approach significantly outperforms naïve vocoding strategies while being hundreds of times faster than neural network vocoders used in state-of-the-art TTS systems. We also show that our method can be used to achieve state-of-the-art results in unsupervised synthesis of individual words of speech.
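For context, the abstract contrasts a learned mel-to-magnitude mapping with naïve vocoding. The Python/NumPy sketch below shows what one such naïve baseline might look like. It is an illustration under stated assumptions and does not reproduce the authors' code. The mel spectrogram is mapped back to a linear magnitude spectrogram with the pseudo-inverse of a mel filterbank. The resulting magnitudes are then vocoded with Griffin-Lim phase reconstruction. Several details are our own assumptions and are not given in the abstract:

- the HTK-style mel scale;
- the Hann window, `n_fft`, and `hop` values;
- the use of Griffin-Lim as the "heuristic" vocoder;
- the function names (`mel_filterbank`, `stft`, `istft`, `griffin_lim`, `naive_vocode`), which are hypothetical.

In the adversarial approach, a trained generator would replace the pseudo-inverse step. That generator is not shown.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters with shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, center, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def stft(x, n_fft, hop):
    """Hann-windowed STFT; returns a complex array of shape (n_bins, n_frames)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop:t * hop + n_fft] * win for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


def istft(X, n_fft, hop, length):
    """Least-squares overlap-add inverse of stft(); edge samples with little or
    no window coverage are not reconstructed accurately."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(X.T, n=n_fft, axis=1) * win
    y = np.zeros(length)
    norm = np.zeros(length)
    for t, frame in enumerate(frames):
        y[t * hop:t * hop + n_fft] += frame
        norm[t * hop:t * hop + n_fft] += win ** 2
    return y / np.maximum(norm, 1e-8)


def griffin_lim(mag, n_fft, hop, length, n_iter=32, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `mag`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        y = istft(mag * phase, n_fft, hop, length)
        # Keep the phase of the re-analysed signal and discard its magnitude.
        phase = np.exp(1j * np.angle(stft(y, n_fft, hop)))
    return istft(mag * phase, n_fft, hop, length)


def naive_vocode(mel, sr, n_fft, hop, length, n_iter=32):
    """Hypothetical naive baseline: pseudo-inverse mel projection, then Griffin-Lim."""
    fb = mel_filterbank(sr, n_fft, mel.shape[0])
    # Least-squares estimate of the linear magnitude spectrogram, clipped to be
    # non-negative. A learned generator would replace this step.
    mag = np.maximum(np.linalg.pinv(fb) @ mel, 0.0)
    return griffin_lim(mag, n_fft, hop, length, n_iter)


if __name__ == "__main__":
    sr, n_fft, hop = 16000, 512, 128
    t = np.arange(sr) / sr
    x = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 1 s test tone
    mel = mel_filterbank(sr, n_fft, 40) @ np.abs(stft(x, n_fft, hop))
    y = naive_vocode(mel, sr, n_fft, hop, len(x))
    print(y.shape)
```

The filterbank has far fewer rows (here 40) than there are STFT bins (257), so its pseudo-inverse has rank at most 40. That limits how much spectral detail the naïve inversion can recover. This gap is what a learned mel-to-magnitude mapping aims to close before the cheap heuristic phase-reconstruction step.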

This paper has not been read by Pith yet.
