
arxiv: 2005.11612 · v2 · pith:5FI77U6Gnew · submitted 2020-05-23 · 📡 eess.AS · cs.SD

Efficient Integration of Multi-channel Information for Speaker-independent Speech Separation

classification 📡 eess.AS cs.SD
keywords methods, separation, multi-channel, speech, network, performance, information, integrate

Although deep-learning-based methods have markedly improved speech separation performance over the past few years, how to integrate multi-channel signals for speech separation remains an open question. We propose two methods, early fusion and late fusion, for integrating multi-channel information. Both are based on the time-domain audio separation network, which has proven effective for single-channel speech separation. We also propose channel-sequential-transfer learning, a transfer learning framework that uses the parameters trained for a lower-channel network as the initial values of a higher-channel network. For a fair comparison, we evaluated the proposed methods on a spatialized version of the wsj0-2mix dataset, which is open-sourced. The proposed methods outperformed multi-channel deep clustering, and their performance improved in proportion to the number of microphones. The late-fusion method also consistently outperformed the single-channel method regardless of the angle difference between the speakers.
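
The channel-sequential-transfer idea can be illustrated with a toy sketch. The abstract only states that parameters trained for a lower-channel network initialize a higher-channel network; it does not say how parameters tied to the added channels are initialized. The sketch below assumes zero initialization for those, assumes a single linear input layer with one feature per channel, and uses hypothetical function names (`init_layer`, `transfer_to_more_channels`, `apply_layer`). It is not the authors' implementation.

```python
import random


def init_layer(n_out, n_in, seed=0):
    """Return an n_out x n_in weight matrix of small random values.

    Stands in for parameters already trained on the lower-channel task.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def transfer_to_more_channels(w_low, n_new_channels):
    """Widen an input layer from C to C + n_new_channels input channels.

    Columns for the existing channels are copied from the lower-channel
    network. Columns for the new channels are set to zero (an assumption,
    not stated in the abstract), so the widened layer initially gives the
    same output as the lower-channel layer.
    """
    return [row[:] + [0.0] * n_new_channels for row in w_low]


def apply_layer(w, x):
    """Compute the matrix-vector product w @ x using plain lists."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


# Hypothetical 2-channel parameters reused to initialize a 3-channel layer.
w2 = init_layer(n_out=4, n_in=2)
w3 = transfer_to_more_channels(w2, n_new_channels=1)

x3 = [0.5, -0.2, 0.9]  # one frame with three microphone channels
same_start = apply_layer(w3, x3) == apply_layer(w2, x3[:2])
```

Under this assumption, training of the higher-channel network starts from the behavior of the lower-channel one, and the new channel's weights are then learned from there.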
