ICASSP 2023 Deep Noise Suppression Challenge
2 Pith papers cite this work.
Fields: eess.AS (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Unmixing the Crowd: Learning Mixture-to-Set Speaker Embeddings for Enrollment-Free Target Speech Extraction
  A neural model predicts a set of speaker embeddings from noisy mixtures to enable enrollment-free target speech extraction, outperforming baselines on LibriMix and generalizing to real recordings.
- Joint Learning using Mixture-of-Expert-Based Representation for Speech Enhancement and Robust Emotion Recognition
  Sparse MERIT uses frame-wise sparse mixture-of-experts with task-specific gating on self-supervised speech features to jointly optimize enhancement and emotion recognition, reporting gains over baselines on MSP-Podcast at low SNR.