Transforming the Embeddings: A Lightweight Technique for Speech Emotion Recognition Tasks

Arun Balaji Buduru; Orchid Chetia Phukan; Rajesh Sharma

arxiv: 2305.18640 · v1 · pith:XH43SOJMnew · submitted 2023-05-29 · 📡 eess.AS



classification 📡 eess.AS

keywords embeddings, recognition, speaker, speech, approach, emotion, features, input



Speech emotion recognition (SER) has drawn considerable attention because of its applications in diverse fields. A current trend in SER is to use embeddings from pre-trained models (PTMs) as input features to downstream models. However, embeddings from speaker recognition PTMs have received less attention than other PTM embeddings. To fill this gap and assess the efficacy of speaker recognition PTM embeddings, we perform a comparative analysis of five PTM embeddings. Among them, x-vector embeddings performed best, possibly because training for speaker recognition leads the model to capture components of speech such as tone and pitch. Our modeling approach, which uses x-vector embeddings and mel-frequency cepstral coefficients (MFCC) as input features, is the most lightweight approach while achieving accuracy comparable to previous state-of-the-art (SOTA) methods on the CREMA-D benchmark.
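
As a rough illustration of using an utterance-level embedding together with MFCCs as input features, the sketch below mean-pools frame-level MFCCs over time and concatenates the result with an x-vector. The abstract does not say how the two feature sets are combined, so the pooling and concatenation are assumptions, as are the function names `mean_pool` and `fuse_features` and the dimensions (512 for the embedding, 40 MFCC coefficients). The values are placeholders, not real audio features.

```python
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level features over time: (num_frames x dims) -> (dims)."""
    if not frames:
        raise ValueError("frames must contain at least one frame")
    num_frames = len(frames)
    dims = len(frames[0])
    return [sum(frame[d] for frame in frames) / num_frames for d in range(dims)]


def fuse_features(
    xvector: Sequence[float], mfcc_frames: Sequence[Sequence[float]]
) -> List[float]:
    """Concatenate an utterance-level embedding with time-averaged MFCCs.

    Assumption: simple concatenation; the paper may combine them differently.
    """
    return list(xvector) + mean_pool(mfcc_frames)


# Placeholder inputs (hypothetical sizes, not taken from the paper).
xvector = [0.1] * 512                                # stand-in x-vector embedding
mfcc_frames = [[float(t)] * 40 for t in range(200)]  # 200 frames x 40 coefficients

features = fuse_features(xvector, mfcc_frames)
print(len(features))  # 552
```

The fused vector would then be passed to a downstream classifier; in practice the x-vector would come from a speaker recognition PTM and the MFCCs from an audio feature extractor.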



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Impact Analysis of Speech Representation Learning Models for Acoustic Side-Channel Attack
cs.CR 2026-06 unverdicted novelty 5.0

KEYAC dataset created; KAN fine-tuning achieves SOTA on acoustic side-channel keystroke recognition from speech representations under zero-shot and partial fine-tuning.
Impact Analysis of Speech Representation Learning Models for Acoustic Side-Channel Attack
cs.CR 2026-06 unverdicted novelty 5.0

KEYAC dataset benchmarks speech models for keyboard acoustic side-channel attacks, with KAN fine-tuning setting new SOTA by addressing nonlinear feature interactions.