Uncovering Latent Style Factors for Expressive Speech Synthesis

2017 · cs.CL · arXiv 1711.00520

1 Pith paper cites this work. Polarity classification is still being indexed.

abstract

Prosodic modeling is a core problem in speech synthesis. The key challenge is producing desirable prosody from textual input containing only phonetic information. In this preliminary study, we introduce the concept of "style tokens" in Tacotron, a recently proposed end-to-end neural speech synthesis model. Using style tokens, we aim to extract independent prosodic styles from training data. We show that without annotation data or an explicit supervision signal, our approach can automatically learn a variety of prosodic variations in a purely data-driven way. Importantly, each style token corresponds to a fixed style factor regardless of the given text sequence. As a result, we can control the prosodic style of synthetic speech in a somewhat predictable and globally consistent way.

representative citing papers

Forward-Backward Decoding for Regularizing End-to-End TTS

eess.AS · 2019-07-18 · unverdicted · novelty 6.0

Forward-backward decoding with divergence regularization and bidirectional decoder improves end-to-end TTS robustness and naturalness by addressing exposure bias via joint L2R/R2L training.
