Automatic tagging using deep convolutional neural networks
We present a content-based automatic music tagging algorithm using fully convolutional neural networks (FCNs). We evaluate different architectures consisting of 2D convolutional layers and subsampling layers only. In the experiments, we measure the AUC-ROC scores of the architectures with different complexities and input types using the MagnaTagATune dataset, where a 4-layer architecture shows state-of-the-art performance with mel-spectrogram input. Furthermore, we evaluated the performances of the architectures with varying the number of layers on a larger dataset (Million Song Dataset), and found that deeper models outperformed the 4-layer architecture. The experiments show that mel-spectrogram is an effective time-frequency representation for automatic tagging and that more complex models benefit from more training data.
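As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of AUC-ROC for multi-label tagging, written in plain Python. It is not the authors' evaluation code. The function names `auc_roc` and `mean_tag_auc` are hypothetical, and averaging the per-tag AUC over all tags is an assumption, since the abstract does not say how scores are aggregated.

```python
def auc_roc(labels, scores):
    """AUC-ROC for one tag, computed as the probability that a random
    positive example is scored above a random negative one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_tag_auc(y_true, y_score):
    """Average the per-tag AUC-ROC over all tags (assumed aggregation).

    y_true:  list of clips, each a list of 0/1 tag labels
    y_score: list of clips, each a list of predicted tag scores
    """
    n_tags = len(y_true[0])
    aucs = []
    for t in range(n_tags):
        tag_labels = [clip[t] for clip in y_true]
        tag_scores = [clip[t] for clip in y_score]
        aucs.append(auc_roc(tag_labels, tag_scores))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Toy data: four clips, two tags. Values are made up for illustration.
    y_true = [[0, 1], [0, 0], [1, 1], [1, 0]]
    y_score = [[0.1, 0.9], [0.4, 0.2], [0.35, 0.8], [0.8, 0.3]]
    print(mean_tag_auc(y_true, y_score))
```

The pairwise loop is quadratic in the number of clips, which is fine for a toy example; a rank-based implementation or a library routine would be the usual choice on a dataset the size of MagnaTagATune or the Million Song Dataset.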
Forward citations
Cited by 1 Pith paper
Adopting State-of-the-Art Pretrained Audio Representations for Music Recommender Systems
Pretrained audio models show large performance gaps between standard MIR tasks and music recommendation in both hot and cold-start settings.