Spectrogram Feature Losses for Music Source Separation

Abhimanyu Sahai; Brian McWilliams; Romann Weber

arXiv: 1901.05061 · v3 · pith:EEEFKE6Mnew · submitted 2019-01-15 · cs.SD · cs.LG · eess.AS · stat.ML

keywords: loss, music, separation, deep, feature, learning-based, model, pixel-level

In this paper we study deep learning-based music source separation and explore an alternative to the standard spectrogram pixel-level L2 loss for model training. Our main contribution is to demonstrate that adding a high-level feature loss term, computed from the spectrograms using a VGG net, can improve separation quality compared with a purely pixel-level loss. We show this improvement for MMDenseNet, a state-of-the-art deep learning model for this task, on the extraction of drums and vocals from songs in the musdb18 database, which covers a broad range of Western music genres. We believe that this finding can be generalized and applied to broader machine learning-based systems in the audio domain.
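
As a rough illustration of the idea in the abstract, the sketch below combines a pixel-level L2 term with a feature-level L2 term. It is not the authors' implementation: `toy_features` is a hypothetical stand-in (2x2 average pooling) for the VGG feature extractor, and both the weighted-sum form and the `feature_weight` parameter are assumptions, since the abstract says only that a feature loss term is added.

```python
# Sketch only. `toy_features` is a hypothetical stand-in for features taken
# from a pretrained VGG net; `feature_weight` is an assumed hyperparameter.

def mse(a, b):
    """Mean squared error between two equally sized 2-D lists."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def toy_features(spec):
    """Stand-in feature map: non-overlapping 2x2 average pooling."""
    rows, cols = len(spec), len(spec[0])
    return [
        [
            (spec[r][c] + spec[r][c + 1] + spec[r + 1][c] + spec[r + 1][c + 1]) / 4.0
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def combined_loss(predicted, target, feature_weight=0.5):
    """Pixel-level L2 loss plus a weighted feature-level L2 loss (assumed form)."""
    pixel_term = mse(predicted, target)
    feature_term = mse(toy_features(predicted), toy_features(target))
    return pixel_term + feature_weight * feature_term


# Tiny 2x2 "spectrograms" used purely for illustration.
target = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(combined_loss(swapped, target))
```

In this toy case the two inputs differ at every pixel but have the same pooled feature, so only the pixel term contributes. A real feature extractor would respond to structure that a plain average cannot capture.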