
arxiv: 1606.00298 · v1 · pith:YHA5JIPTnew · submitted 2016-06-01 · 💻 cs.SD · cs.LG

Automatic tagging using deep convolutional neural networks

keywords architectures automatic convolutional dataset layers tagging architecture different

We present a content-based automatic music tagging algorithm using fully convolutional neural networks (FCNs). We evaluate different architectures consisting only of 2D convolutional layers and subsampling layers. In the experiments, we measure the AUC-ROC scores of architectures with different complexities and input types on the MagnaTagATune dataset, where a 4-layer architecture shows state-of-the-art performance with mel-spectrogram input. Furthermore, we evaluate the performance of architectures with varying numbers of layers on a larger dataset (the Million Song Dataset) and find that deeper models outperform the 4-layer architecture. The experiments show that the mel-spectrogram is an effective time-frequency representation for automatic tagging and that more complex models benefit from more training data.
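For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of how an AUC-ROC score could be computed for a multi-label tagger. It is not the authors' evaluation code. The function names `roc_auc` and `mean_tag_auc` are hypothetical, and averaging the per-tag scores is an assumption, since the abstract does not say how scores are aggregated across tags.

```python
import numpy as np


def roc_auc(labels, scores):
    """AUC-ROC for a single tag, using the rank-sum (Mann-Whitney U) form.

    labels: 0/1 ground truth per clip; scores: predicted tag scores per clip.
    Returns NaN when a tag has no positives or no negatives.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")

    # Assign 1-based ranks, giving tied scores their average rank so that
    # a tie between a positive and a negative counts as half a win.
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size, dtype=float)
    i = 0
    while i < scores.size:
        j = i
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1

    # U statistic of the positive class, normalised by the number of pairs.
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def mean_tag_auc(y_true, y_score):
    """Average per-tag AUC over the columns of (clips x tags) arrays.

    Assumption: tags are weighted equally; undefined tags are skipped.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    aucs = [roc_auc(y_true[:, k], y_score[:, k]) for k in range(y_true.shape[1])]
    return float(np.nanmean(aucs))
```

A score of 0.5 corresponds to random ranking and 1.0 to a tagger that ranks every positive clip above every negative clip for the tag in question.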



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Adopting State-of-the-Art Pretrained Audio Representations for Music Recommender Systems

    cs.IR 2026-04 unverdicted novelty 5.0

    Pretrained audio models show large performance gaps between standard MIR tasks and music recommendation in both hot and cold-start settings.