Deep Learning Based Phase Reconstruction for Speaker Separation: A Trigonometric Perspective

Deliang Wang; Ke Tan; Zhong-Qiu Wang

arxiv: 1811.09010 · v1 · pith:WSMETPTVnew · submitted 2018-11-22 · cs.SD · cs.CL · eess.AS

keywords phase · reconstruction · deep · learning · mixture · separation · source · speaker

This study investigates phase reconstruction for deep learning based monaural talker-independent speaker separation in the short-time Fourier transform (STFT) domain. The key observation is that, for a mixture of two sources, with their magnitudes accurately estimated and under a geometric constraint, the absolute phase difference between each source and the mixture can be uniquely determined; in addition, the source phases at each time-frequency (T-F) unit can be narrowed down to only two candidates. To pick the right candidate, we propose three algorithms based on iterative phase reconstruction, group delay estimation, and phase-difference sign prediction. State-of-the-art results are obtained on the publicly available wsj0-2mix and wsj0-3mix corpora.
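
The trigonometric observation in the abstract can be illustrated at a single T-F unit. The following is a minimal sketch, not the authors' implementation: it assumes the geometric constraint is the triangle inequality on the three magnitudes and applies the law of cosines to the triangle formed by the mixture and the two source coefficients. The function name `phase_candidates` and the tolerance `tol` are hypothetical, chosen for illustration only.

```python
import cmath
import math


def phase_candidates(mix, mag1, mag2, tol=1e-9):
    """Return the two (s1, s2) pairs consistent with one mixture STFT
    coefficient `mix` and the source magnitudes `mag1`, `mag2`.

    Illustrative sketch: assumes exact magnitudes and mix = s1 + s2.
    """
    mag_y = abs(mix)
    theta_y = cmath.phase(mix)
    if min(mag_y, mag1, mag2) <= 0.0:
        raise ValueError("all three magnitudes must be positive")
    # Assumed geometric constraint: the magnitudes must form a triangle.
    if not (abs(mag1 - mag2) - tol <= mag_y <= mag1 + mag2 + tol):
        raise ValueError("magnitudes violate the triangle inequality")

    # Law of cosines gives the absolute phase difference between each
    # source and the mixture.
    cos1 = (mag_y**2 + mag1**2 - mag2**2) / (2.0 * mag_y * mag1)
    cos2 = (mag_y**2 + mag2**2 - mag1**2) / (2.0 * mag_y * mag2)
    delta1 = math.acos(max(-1.0, min(1.0, cos1)))  # clamp rounding error
    delta2 = math.acos(max(-1.0, min(1.0, cos2)))

    # The sources lie on opposite sides of the mixture, so only the sign
    # of the phase difference is ambiguous: two candidates in total.
    candidates = []
    for sign in (+1, -1):
        s1 = cmath.rect(mag1, theta_y + sign * delta1)
        s2 = cmath.rect(mag2, theta_y - sign * delta2)
        candidates.append((s1, s2))
    return candidates


# Usage: both candidates sum to the mixture; one equals the true pair.
true_s1, true_s2 = 1 + 2j, 3 - 1j
pairs = phase_candidates(true_s1 + true_s2, abs(true_s1), abs(true_s2))
```

Both candidates reproduce the mixture exactly, so the magnitudes alone cannot decide between them. That remaining sign ambiguity is what the three algorithms named in the abstract are meant to resolve.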
