EMOPIA: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation

Hsiao-Tzu Hung; Joann Ching; Juhan Nam; Nabin Kim; Seungheon Doh; Yi-Hsuan Yang

arxiv: 2108.01374 · v1 · pith:HMXF7PUKnew · submitted 2021-08-03 · 💻 cs.SD · cs.MM · eess.AS

classification 💻 cs.SD cs.MM eess.AS

keywords music emotion dataset emopia generation piano used analysis

While there are many music datasets with emotion labels in the literature, they cannot be used for research on symbolic-domain music analysis or generation, because they usually provide audio files only. In this paper, we present the EMOPIA (pronounced "yee-mò-pi-uh") dataset, a shared multi-modal (audio and MIDI) database focusing on perceived emotion in pop piano music, to facilitate research on various tasks related to music emotion. The dataset contains 1,087 music clips from 387 songs, with clip-level emotion labels annotated by four dedicated annotators. Since the clips are not restricted to one clip per song, they can also be used for song-level analysis. We present the methodology for building the dataset, covering the song list curation, clip selection, and emotion annotation processes. Moreover, we prototype use cases on clip-level music emotion classification and emotion-based symbolic music generation by training and evaluating corresponding models on the dataset. The results demonstrate the potential of EMOPIA for future exploration of emotion-related MIR tasks on piano music.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Exploring How Audio Effects Alter Emotion with Foundation Models
cs.SD 2025-09 unverdicted novelty 5.0

Foundation models are used to examine nonlinear links between audio effects and estimated emotions via embedding probing methods.

A Hybrid Framework for Song Lyric Annotation Based on Human-LLM Alignment
cs.CL 2026-06 unverdicted novelty 4.0

Introduces a new lyrics dataset and hybrid human-LLM framework for emotion annotation that predicts misalignment to improve efficiency.