Empowering Whisper as a joint multi-talker and target-talker speech recognition system
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: eess.AS (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder
A single multi-speaker encoder jointly optimizes diarization, separation, and ASR, outperforming single-task baselines on LibriMix with diarization error rates of 1.37% and 2.29%.