A High Quality Text-To-Speech System Composed of Multiple Neural Networks

Andrew Mackie; Corey Miller; Gerald Corrigan; Noel Massey; Orhan Karaali; Otto Schnurr

arxiv: cs/9812006 · v1 · pith:IZBZJX4Snew · submitted 1998-12-05 · 💻 cs.NE · cs.HC

Orhan Karaali, Gerald Corrigan, Noel Massey, Corey Miller, Otto Schnurr, Andrew Mackie

keywords neural · linguistic · module · network · acoustic · networks · mapping · representation

While neural networks have been employed to handle several different text-to-speech tasks, ours is the first system to use neural networks throughout, for both linguistic and acoustic processing. We divide the text-to-speech task into three subtasks: a linguistic module mapping from text to a linguistic representation, an acoustic module mapping from the linguistic representation to speech, and a video module mapping from the linguistic representation to animated images. The linguistic module employs a letter-to-sound neural network and a postlexical neural network. The acoustic module employs a duration neural network and a phonetic neural network. The visual neural network is employed in parallel to the acoustic module to drive a talking head. The use of neural networks that can be retrained on the characteristics of different voices and languages affords our system a degree of adaptability and naturalness heretofore unavailable.
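
The module layout described in the abstract can be sketched roughly as follows. This is only an illustration of how the stages compose, not the authors' implementation: every class name, function name, and data type here is hypothetical, and each neural network is replaced by a trivial stub so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in for the "linguistic representation";
# the abstract does not specify its actual format.
Phones = List[str]


@dataclass
class LinguisticModule:
    """Text -> linguistic representation, via two networks in sequence."""
    letter_to_sound: Callable[[str], Phones]
    postlexical: Callable[[Phones], Phones]

    def __call__(self, text: str) -> Phones:
        return self.postlexical(self.letter_to_sound(text))


@dataclass
class AcousticModule:
    """Linguistic representation -> speech, via duration then phonetic networks."""
    duration: Callable[[Phones], List[float]]
    phonetic: Callable[[Phones, List[float]], List[float]]

    def __call__(self, phones: Phones) -> List[float]:
        return self.phonetic(phones, self.duration(phones))


def synthesize(
    text: str,
    linguistic: LinguisticModule,
    acoustic: AcousticModule,
    visual: Callable[[Phones], List[str]],
) -> Tuple[List[float], List[str]]:
    """Run the linguistic module once, then feed its output to both
    the acoustic module and the visual network."""
    representation = linguistic(text)
    return acoustic(representation), visual(representation)


def build_stub_system():
    """Assemble the pipeline with placeholder functions (assumptions,
    not trained networks)."""
    linguistic = LinguisticModule(
        # Stub: treat each letter as one "phone".
        letter_to_sound=lambda text: [c for c in text.lower() if c.isalpha()],
        # Stub: no postlexical adjustment.
        postlexical=lambda phones: phones,
    )
    acoustic = AcousticModule(
        # Stub: a fixed 0.1 s duration per phone.
        duration=lambda phones: [0.1] * len(phones),
        # Stub: emit one placeholder value per phone instead of audio.
        phonetic=lambda phones, durations: list(durations),
    )
    # Stub: one labelled "frame" per phone for the talking head.
    visual = lambda phones: ["frame:" + p for p in phones]
    return linguistic, acoustic, visual
```

Because each stage is just a callable, swapping in a differently trained network for another voice or language would, under this sketch's assumptions, only mean passing a different function to the corresponding module.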
