
arxiv: 2309.02106 · v1 · pith:2KTCCGJTnew · submitted 2023-09-05 · 💻 cs.CL · cs.AI · cs.LG · eess.AS

Leveraging Label Information for Multimodal Emotion Recognition

classification 💻 cs.CL · cs.AI · cs.LG · eess.AS
keywords emotion · information · label · speech · text · approach · finally · leveraging

Multimodal emotion recognition (MER) aims to detect the emotional status of a given expression by combining the speech and text information. Intuitively, label information should be capable of helping the model locate the salient tokens/frames relevant to the specific emotion, which finally facilitates the MER task. Inspired by this, we propose a novel approach for MER by leveraging label information. Specifically, we first obtain the representative label embeddings for both text and speech modalities, then learn the label-enhanced text/speech representations for each utterance via label-token and label-frame interactions. Finally, we devise a novel label-guided attentive fusion module to fuse the label-aware text and speech representations for emotion classification. Extensive experiments were conducted on the public IEMOCAP dataset, and experimental results demonstrate that our proposed approach outperforms existing baselines and achieves new state-of-the-art performance.
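The label-token and label-frame interactions mentioned in the abstract can be pictured as attention from each emotion label over the tokens (or frames) of an utterance. The following is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the scoring function, so scaled dot-product attention is assumed here, and the function name `label_aware_representation` and the toy vectors are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def label_aware_representation(label_embs, unit_embs):
    """For each label, attend over tokens (or frames) and pool them.

    label_embs: list of K label vectors, each of dimension d
    unit_embs:  list of T token/frame vectors, each of dimension d
    Returns (pooled, weights): K pooled vectors of dimension d and
    K attention rows of length T.

    Assumption: scaled dot-product scoring; the source abstract does
    not say how label-token similarity is computed.
    """
    d = len(label_embs[0])
    scale = math.sqrt(d)
    pooled, weights = [], []
    for label in label_embs:
        scores = [dot(label, u) / scale for u in unit_embs]
        attn = softmax(scores)
        weights.append(attn)
        # Attention-weighted sum of the unit vectors, one dimension at a time.
        pooled.append(
            [sum(w * u[i] for w, u in zip(attn, unit_embs)) for i in range(d)]
        )
    return pooled, weights


# Toy example (hypothetical values): 2 labels, 3 tokens, dimension 2.
labels = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
pooled, weights = label_aware_representation(labels, tokens)
```

In this toy setup the first label puts most of its attention on the first token and the second label on the second token, which is the "locate the salient tokens/frames" intuition from the abstract. The same function could be applied to speech frames in place of tokens; how the two modalities are then fused is not detailed in the abstract and is left out of this sketch.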



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Recent Advances in Multimodal Affective Computing: An NLP Perspective

    cs.CL · 2024-09 · unverdicted · novelty 3.0

    Survey organizing multimodal affective computing research around four NLP tasks, method paradigms, datasets, evaluation protocols, and future directions while releasing a resource repository.