A single multi-speaker encoder jointly optimizes diarization, separation, and ASR, outperforming single-task baselines on LibriMix with diarization error rates of 1.37% and 2.29%.
OWSM-CTC: An open encoder-only speech foundation model for speech recognition, translation, and language identification
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: eess.AS (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder