The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems
In this paper, we present a database of emotional speech intended to be open-sourced and used for synthesis and generation purposes. It contains data from male and female actors in English and a male actor in French. The database covers five emotion classes, making it suitable for building synthesis and voice-transformation systems with the potential to control the emotional dimension in a continuous way. We demonstrate the data's usefulness by building a simple MLP system that converts a neutral speaking style to an angry one, and we evaluate it with a CMOS perception test. Even though the system is very simple, the test shows the effectiveness of the data, which is promising for future work.
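The abstract's baseline, a simple MLP mapping neutral speech features to angry ones, can be sketched as frame-wise regression. The sketch below is a minimal illustration, not the authors' implementation: the feature dimensions, network size, and the toy "angry" target are all hypothetical stand-ins for time-aligned acoustic features (e.g. MFCCs) extracted from paired neutral/angry recordings.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(d_in, d_hidden, d_out):
    # One-hidden-layer MLP: frame-wise mapping from neutral to angry features.
    return {
        "W1": rng.normal(0, 0.1, (d_in, d_hidden)), "b1": np.zeros(d_hidden),
        "W2": rng.normal(0, 0.1, (d_hidden, d_out)), "b2": np.zeros(d_out),
    }

def forward(p, X):
    h = np.tanh(X @ p["W1"] + p["b1"])
    return h @ p["W2"] + p["b2"]

def train_step(p, X, Y, lr=0.05):
    # One gradient-descent step on mean squared error between
    # predicted and target "angry" frames; returns the loss.
    h = np.tanh(X @ p["W1"] + p["b1"])
    err = (h @ p["W2"] + p["b2"]) - Y
    n = X.shape[0]
    dh = (err @ p["W2"].T) * (1 - h ** 2)   # backprop through tanh
    p["W2"] -= lr * h.T @ err / n
    p["b2"] -= lr * err.mean(axis=0)
    p["W1"] -= lr * X.T @ dh / n
    p["b1"] -= lr * dh.mean(axis=0)
    return float((err ** 2).mean())

# Toy stand-in for time-aligned neutral (X) and angry (Y) feature frames.
X = rng.normal(size=(256, 20))
Y = X * 1.5 + 0.3   # pretend the "angry" style is an affine shift of the features
p = init_mlp(20, 64, 20)
losses = [train_step(p, X, Y) for _ in range(500)]
```

In a real system, the converted frames would then be passed to a vocoder to synthesize the transformed waveform; the CMOS test mentioned in the abstract compares such converted utterances against references.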
Forward citations
Cited by 2 Pith papers
- Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers: VALL-E is a neural codec language model trained on 60K hours of speech that performs zero-shot TTS, synthesizing natural speech that matches an unseen speaker's voice, emotion, and environment from a 3-second prompt.
- SEDTalker: Emotion-Aware 3D Facial Animation Using Frame-Level Speech Emotion Diarization: SEDTalker uses frame-level speech emotion diarization to condition a hybrid Transformer-Mamba model for fine-grained, temporally continuous emotion control in 3D facial animation.