Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation

Dong Yu; Jesper Jensen; Morten Kolbæk; Zheng-Hua Tan

arXiv: 1607.00325 · v2 · pith:OFIEODV3new · submitted 2016-07-01 · cs.CL · cs.LG · cs.SD · eess.AS

classification cs.CL · cs.LG · cs.SD · eess.AS

keywords problem · separation · speech · deep · permutation · clustering · cocktail-party · invariant

We propose a novel deep learning model, which supports permutation invariant training (PIT), for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem. Unlike most prior work, which treats speech separation as a multi-class regression problem, and unlike the deep clustering technique, which treats it as a segmentation (or clustering) problem, our model optimizes the separation regression error while ignoring the order of the mixing sources. This strategy cleverly solves the long-standing label permutation problem that has prevented progress on deep-learning-based techniques for speech separation. Experiments on the equal-energy mixing setup of a Danish corpus confirm the effectiveness of PIT. We believe improvements built upon PIT can eventually solve the cocktail-party problem and enable real-world adoption of, e.g., automatic meeting transcription and multi-party human-computer interaction, where overlapping speech is common.
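
The idea of optimizing the regression error while ignoring source order can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a mean-squared-error criterion over spectrogram-like arrays of shape (sources, frames, bins), and the function name `pit_mse` and the toy data are hypothetical. The loss is evaluated for every assignment of network outputs to reference sources, and the smallest value is kept.

```python
from itertools import permutations

import numpy as np


def pit_mse(estimates, targets):
    """Permutation-invariant MSE (illustrative sketch, not the paper's code).

    estimates, targets: arrays of shape (num_sources, frames, bins).
    Returns (lowest_loss, best_perm), where best_perm[i] is the index of
    the target that output i is matched to.
    """
    estimates = np.asarray(estimates, dtype=float)
    targets = np.asarray(targets, dtype=float)
    num_sources = targets.shape[0]

    best_loss, best_perm = np.inf, None
    # Try every output-to-target assignment and keep the cheapest one.
    for perm in permutations(range(num_sources)):
        loss = np.mean((estimates - targets[list(perm)]) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example (hypothetical data): the "network" emits the two sources
# in swapped order, which a fixed-order loss would penalize.
rng = np.random.default_rng(0)
targets = rng.standard_normal((2, 5, 4))
estimates = targets[::-1].copy()

loss, perm = pit_mse(estimates, targets)
print(loss, perm)
```

In the toy example the outputs are exact copies of the references in swapped order, so a fixed-order loss would be large while the permutation-invariant loss is zero. The exhaustive search grows factorially with the number of sources, which is acceptable only for the small source counts typical of this task.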
