Monaural Singing Voice Separation with Skip-Filtering Connections and Recurrent Inference of Time-Frequency Mask

Gerald Schuller; João F. Santos; Konstantinos Drossos; Stylianos Ioannis Mimilakis; Tuomas Virtanen; Yoshua Bengio

arxiv: 1711.01437 · v2 · pith:FL2ENFHTnew · submitted 2017-11-04 · 💻 cs.SD · eess.AS


Stylianos Ioannis Mimilakis, Konstantinos Drossos, João F. Santos, Gerald Schuller, Tuomas Virtanen, Yoshua Bengio


keywords: mask, separation, singing, step, voice, deep, inference, learning


Singing voice separation based on deep learning relies on time-frequency masking. In many cases the masking process is not a learnable function or is not encapsulated into the deep learning optimization. Consequently, most existing methods rely on a post-processing step using generalized Wiener filtering. This work proposes a method that learns and optimizes (during training) a source-dependent mask and does not need the aforementioned post-processing step. We introduce a recurrent inference algorithm, a sparse transformation step to improve the mask generation process, and a learned denoising filter. The results show an increase of 0.49 dB in signal-to-distortion ratio and 0.30 dB in signal-to-interference ratio, compared to previous state-of-the-art approaches for monaural singing voice separation.
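For readers unfamiliar with the post-processing step the abstract refers to, the following is a minimal sketch of generic time-frequency masking with a generalized Wiener-style soft mask. It is not the method proposed in the paper, which learns the mask during training. The function names, the nested-list spectrogram layout, the exponent `alpha`, and the toy magnitudes are all assumptions made for illustration.

```python
def wiener_masks(source_mags, alpha=2.0, eps=1e-12):
    """Soft masks m_j = |S_j|^alpha / sum_k |S_k|^alpha (illustrative only).

    source_mags: list of per-source magnitude estimates, each laid out as
    frames x frequency bins (a hypothetical toy layout, not the paper's).
    """
    n_frames = len(source_mags[0])
    n_bins = len(source_mags[0][0])
    masks = [[[0.0] * n_bins for _ in range(n_frames)] for _ in source_mags]
    for t in range(n_frames):
        for f in range(n_bins):
            powers = [src[t][f] ** alpha for src in source_mags]
            total = sum(powers) + eps  # eps avoids division by zero
            for j, p in enumerate(powers):
                masks[j][t][f] = p / total
    return masks


def apply_mask(mask, mixture_mag):
    """Element-wise product of a mask with the mixture magnitude."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture_mag)]


# Toy magnitude estimates (made-up numbers): 2 frames x 2 bins.
voice_est = [[3.0, 0.0], [1.0, 1.0]]
accomp_est = [[1.0, 2.0], [1.0, 3.0]]
mixture = [[4.0, 2.0], [2.0, 4.0]]

voice_mask, accomp_mask = wiener_masks([voice_est, accomp_est])
voice_out = apply_mask(voice_mask, mixture)
```

In this sketch the masks are computed after the source estimates exist, so the masking itself is not part of any training objective; that separation is the limitation the abstract says the proposed method removes.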
