wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
1 Pith paper cites this work.
Fields: eess.AS (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
- Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder
A single multi-speaker encoder jointly optimizes diarization, separation, and ASR, outperforming single-task baselines on LibriMix with diarization error rates of 1.37% and 2.29%.
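The diarization error rates quoted above follow the conventional DER definition: the time attributed to false-alarm speech, missed speech, and speaker confusion, divided by the total reference speech time. The sketch below computes only that standard ratio. It does not model the paper's scoring setup, such as forgiveness collars or overlap handling, which this summary does not state. The function name and the durations in the usage line are hypothetical and are not taken from the paper.

```python
def diarization_error_rate(false_alarm: float, missed: float,
                           confusion: float, total_ref: float) -> float:
    """Standard DER: (false alarm + missed speech + speaker confusion) / total reference speech.

    All arguments are durations in seconds. This is a hypothetical helper
    for illustration, not the cited paper's scoring code.
    """
    if total_ref <= 0:
        raise ValueError("total reference speech time must be positive")
    return (false_alarm + missed + confusion) / total_ref


# Hypothetical durations: 1 s false alarm, 2 s missed, 1 s confusion over 200 s of speech.
der = diarization_error_rate(1.0, 2.0, 1.0, 200.0)
print(f"DER = {der:.2%}")
```

With these made-up durations, the error time is 4 s out of 200 s, so the printed DER is 2.00%. Figures like 1.37% and 2.29% are read on the same percentage scale.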