A fine-tuned wav2vec 2.0/HuBERT benchmark for speech emotion recognition, speaker verification and spoken language understanding

2021 · arXiv 2111.02735

3 Pith papers cite this work.

representative citing papers

Acoustic scattering AI for non-invasive object classifications: A case study on hair assessment

cs.SD · 2025-06-17 · unverdicted · novelty 6.0

Acoustic scattering signals fed into fine-tuned self-supervised deep learning models classify hair type and moisture at nearly 90% accuracy as a non-invasive alternative to visual methods.

Joint Learning using Mixture-of-Expert-Based Representation for Speech Enhancement and Robust Emotion Recognition

eess.AS · 2025-09-10 · unverdicted · novelty 5.0

Sparse MERIT uses frame-wise sparse mixture-of-experts with task-specific gating on self-supervised speech features to jointly optimize enhancement and emotion recognition, reporting gains over baselines on MSP-Podcast at low SNR.

EMO-BOOST: Emotion-Augmented Audio-Visual Features for Improved Generalization in Deepfake Detection

cs.AI · 2026-05-19 · unverdicted · novelty 4.0

Emo-Boost augments low-level deepfake detectors with intra- and inter-modal emotion consistency checks to raise cross-manipulation generalization AUC by 2.1% on FakeAVCeleb.
