Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis

Fengna Wang; Kevin El Haddad; Noé Tits; Thierry Dutoit; Vincent Pagel

arxiv 1903.11570 v1 pith:6QOQYGBO submitted 2019-03-27 cs.CL cs.AI cs.LG cs.SD eess.AS

Noé Tits, Fengna Wang, Kevin El Haddad, Vincent Pagel, Thierry Dutoit

classification cs.CL cs.AI cs.LG cs.SD eess.AS

keywords speech control latent different expressive interpretation spaces synthesis

The field of Text-to-Speech has improved greatly in recent years thanks to deep learning techniques, and producing realistic speech is now possible. As a consequence, research on controlling expressiveness, which allows speech to be generated in different styles or manners, has attracted increasing attention. Systems able to control style have been developed and show impressive results. However, the control parameters often consist of latent variables and remain difficult to interpret. In this paper, we analyze and compare different latent spaces and obtain an interpretation of their influence on expressive speech. This will make it possible to build controllable speech synthesis systems with understandable behaviour.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach
eess.AS 2019-07 unverdicted novelty 3.0

A methodology is proposed for emotional text-to-speech using emotional data collection, transfer-learning-based annotation of expressiveness features, and fine-tuning of a neutral TTS model.