
arxiv: 1802.00300 · v1 · pith:TKD56NG6new · submitted 2018-02-01 · 💻 cs.SD · eess.AS

MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation

keywords monaural · separation · singing · voice · architecture · deep · learning · masker-denoiser

The monaural singing voice separation task focuses on predicting the singing voice from a single-channel music mixture signal. Current state-of-the-art (SOTA) results in monaural singing voice separation are obtained with deep-learning-based methods. In this work, we present a novel deep-learning-based method that learns long-term temporal patterns and structures of a musical piece. We build upon the recently proposed Masker-Denoiser (MaD) architecture and enhance it with Twin Networks, a technique that regularizes a recurrent generative network using a backward-running copy of the network. We evaluate our method on the Demixing Secret Dataset and obtain an improvement of 0.37 dB in signal-to-distortion ratio (SDR) and of 0.23 dB in signal-to-interference ratio (SIR) over previous SOTA results.
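The Twin Networks idea mentioned in the abstract can be illustrated with a small sketch. The abstract gives no formula, so everything below is an assumption made for illustration: the penalty is taken to be the mean L2 distance between an affine transform of the forward hidden states and the time-aligned hidden states of a copy run backward over the sequence. The toy tanh RNN and the names `run_rnn`, `twin_penalty`, `w_aff`, and `b_aff` are hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def run_rnn(x, w_in, w_rec):
    """Toy tanh RNN (illustrative only); returns hidden states, shape (T, d)."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(x_t @ w_in + h @ w_rec)
        states.append(h)
    return np.stack(states)


def twin_penalty(h_fwd, h_bwd, w_aff, b_aff):
    """Assumed regularizer: mean L2 distance between an affine map of the
    forward states and the time-aligned backward states."""
    diff = h_fwd @ w_aff + b_aff - h_bwd
    return float(np.mean(np.linalg.norm(diff, axis=1)))


rng = np.random.default_rng(0)
T, n_in, d = 6, 4, 3  # hypothetical sizes: frames, input features, hidden units
x = rng.standard_normal((T, n_in))

# Forward network reads the sequence left to right.
fwd_states = run_rnn(x, rng.standard_normal((n_in, d)), rng.standard_normal((d, d)))

# Backward copy has its own weights and reads the reversed sequence;
# flipping its output re-aligns the states with the forward time axis.
bwd_states = run_rnn(x[::-1], rng.standard_normal((n_in, d)), rng.standard_normal((d, d)))[::-1]

# Identity affine map as a placeholder; in training it would be learned.
penalty = twin_penalty(fwd_states, bwd_states, np.eye(d), np.zeros(d))
```

In a training setup of this kind, the penalty would be added to the main separation loss with a weighting factor, and the backward copy would be needed only during training.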

