The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems
In this paper, we present a database of emotional speech intended to be open-sourced and used for synthesis and generation purposes. It contains data from male and female actors in English and a male actor in French. The database covers five emotion classes, making it suitable for building synthesis and voice-transformation systems with the potential to control the emotional dimension in a continuous way. We demonstrate the data's usefulness by building a simple MLP system that converts a neutral speaking style to an angry one, and we evaluate it with a CMOS perception test. Even though the system is very simple, the test shows the effectiveness of the data, which is promising for future work.
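The abstract's baseline, a simple MLP mapping neutral speech features to angry ones, can be sketched as frame-wise regression. The sketch below is a minimal illustration, not the authors' implementation: the feature dimensions, network size, and the toy "angry" target are all hypothetical stand-ins for time-aligned acoustic features (e.g. MFCCs) extracted from paired neutral/angry recordings.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(d_in, d_hidden, d_out):
    # One-hidden-layer MLP: frame-wise mapping from neutral to angry features.
    return {
        "W1": rng.normal(0, 0.1, (d_in, d_hidden)), "b1": np.zeros(d_hidden),
        "W2": rng.normal(0, 0.1, (d_hidden, d_out)), "b2": np.zeros(d_out),
    }

def forward(p, X):
    h = np.tanh(X @ p["W1"] + p["b1"])
    return h @ p["W2"] + p["b2"]

def train_step(p, X, Y, lr=0.05):
    # One gradient-descent step on mean squared error between
    # predicted and target "angry" frames; returns the loss.
    h = np.tanh(X @ p["W1"] + p["b1"])
    err = (h @ p["W2"] + p["b2"]) - Y
    n = X.shape[0]
    dh = (err @ p["W2"].T) * (1 - h ** 2)   # backprop through tanh
    p["W2"] -= lr * h.T @ err / n
    p["b2"] -= lr * err.mean(axis=0)
    p["W1"] -= lr * X.T @ dh / n
    p["b1"] -= lr * dh.mean(axis=0)
    return float((err ** 2).mean())

# Toy stand-in for time-aligned neutral (X) and angry (Y) feature frames.
X = rng.normal(size=(256, 20))
Y = X * 1.5 + 0.3   # pretend the "angry" style is an affine shift of the features
p = init_mlp(20, 64, 20)
losses = [train_step(p, X, Y) for _ in range(500)]
```

In a real system, the converted frames would then be passed to a vocoder to synthesize the transformed waveform; the CMOS test mentioned in the abstract compares such converted utterances against references.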
Forward citations
Cited by 2 Pith papers
- Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers: VALL-E is a neural codec language model trained on 60K hours of speech that performs zero-shot TTS, synthesizing natural speech that matches an unseen speaker's voice, emotion, and environment from a 3-second prompt.
- SEDTalker: Emotion-Aware 3D Facial Animation Using Frame-Level Speech Emotion Diarization: SEDTalker uses frame-level speech emotion diarization to condition a hybrid Transformer-Mamba model for fine-grained, temporally continuous emotion control in 3D facial animation.