CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer

Alexis Moinet; Ammar Abbas; Arent van Korlaar; Mateusz Lajszczak; Penny Karanasou; Peter Makarov; Ray Li; Simon Slangen; Sri Karlapati; Thomas Drugman

arxiv: 2206.13443 · v1 · pith:NZPC6QIO · submitted 2022-06-27 · eess.AS · cs.SD

Sri Karlapati, Penny Karanasou, Mateusz Lajszczak, Ammar Abbas, Alexis Moinet, Peter Makarov, Ray Li, Arent van Korlaar, Simon Slangen, Thomas Drugman

keywords: prosody, fine-grained, model, speech, transfer, appropriate, contextually, copycat2

Abstract

In this paper, we present CopyCat2 (CC2), a novel model capable of: a) synthesizing speech with different speaker identities, b) generating speech with expressive and contextually appropriate prosody, and c) transferring prosody at a fine-grained level between any pair of seen speakers. We do this by activating distinct parts of the network for different tasks. We train our model using a novel two-stage training approach. In Stage I, the model learns speaker-independent word-level prosody representations from speech, which it uses for many-to-many fine-grained prosody transfer. In Stage II, we learn to predict these prosody representations from the contextual information available in text, thereby enabling multi-speaker TTS with contextually appropriate prosody. We compare CC2 to two strong baselines: one for TTS with contextually appropriate prosody and one for fine-grained prosody transfer. CC2 reduces the gap in naturalness between our baseline and copy-synthesized speech by 22.79%. In fine-grained prosody transfer evaluations, it obtains a relative improvement of 33.15% in target speaker similarity over the baseline.
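The abstract reports two relative figures without stating their formulas. Assuming the usual definitions (gap reduction as the share of the baseline-to-copy-synthesis gap that the new system recovers, and relative improvement as the change over the baseline score), a minimal Python sketch of both calculations might look like this. The function names and example scores are hypothetical illustrations, not code or results from the paper.

```python
def gap_reduction(baseline: float, system: float, reference: float) -> float:
    """Percentage of the baseline-to-reference gap closed by `system`.

    Assumption: higher scores are better (e.g. naturalness ratings) and
    `reference` is the upper anchor, such as copy-synthesized speech.
    """
    gap = reference - baseline
    if gap == 0:
        raise ValueError("baseline and reference scores are identical")
    return 100.0 * (system - baseline) / gap


def relative_improvement(baseline: float, system: float) -> float:
    """Relative change of `system` over `baseline`, in percent.

    Assumption: this is one common reading of "relative improvement";
    the abstract does not define the exact formula.
    """
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return 100.0 * (system - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical scores chosen only to illustrate the arithmetic.
    print(gap_reduction(baseline=60.0, system=65.0, reference=80.0))  # 25.0
    print(relative_improvement(baseline=2.0, system=2.5))  # 25.0
```

Under the gap-reduction definition, a system that moves halfway from the baseline to the copy-synthesis anchor closes 50% of the gap regardless of the rating scale, which is why this form suits comparisons against an upper-bound reference.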