arXiv: 1801.05504 · v2 · pith:JJTDFLWBnew · submitted 2018-01-16 · cs.SD · cs.LG · eess.AS · stat.ML

Automatic Classification of Music Genre using Masked Conditional Neural Networks

classification: cs.SD · cs.LG · eess.AS · stat.ML
keywords: neural, network, networks, recognition, conditional, mclnn, clnn, masked

Neural-network architectures used for sound recognition are usually adapted from other application domains such as image recognition, and so may not exploit the time-frequency representation of a signal. The ConditionaL Neural Networks (CLNN) and its extension, the Masked ConditionaL Neural Networks (MCLNN), are designed for multidimensional temporal signal recognition. The CLNN is trained over a window of frames to preserve the inter-frame relation, and the MCLNN enforces a systematic sparseness over the network's links that mimics filterbank-like behavior. The masking operation induces the network to learn in frequency bands, which decreases the network's susceptibility to frequency shifts in time-frequency representations. Additionally, the mask allows a range of feature combinations to be explored concurrently, analogous to manually handcrafting the optimum collection of features for a recognition task. The MCLNN has achieved competitive performance on the Ballroom music dataset compared to several hand-crafted attempts and has outperformed models based on state-of-the-art Convolutional Neural Networks.
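
The two ideas in the abstract, conditioning on a window of frames and masking the weights into frequency bands, can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the paper's exact construction: the function names `band_mask` and `masked_window_layer`, the `bandwidth` and `overlap` parameters, the wrap-around placement of the bands, and the ReLU activation are all hypothetical choices made for this example.

```python
import numpy as np


def band_mask(n_features, n_hidden, bandwidth, overlap):
    """Build a binary (n_features, n_hidden) mask with one band of ones per column.

    Assumption for illustration: each column holds `bandwidth` consecutive ones,
    and successive columns shift the band by (bandwidth - overlap) rows,
    wrapping around the feature axis.
    """
    step = bandwidth - overlap
    if step <= 0 or bandwidth > n_features:
        raise ValueError("need overlap < bandwidth <= n_features")
    mask = np.zeros((n_features, n_hidden))
    for j in range(n_hidden):
        start = (j * step) % n_features
        rows = (start + np.arange(bandwidth)) % n_features
        mask[rows, j] = 1.0
    return mask


def masked_window_layer(frames, weights, bias, mask):
    """Hidden activation for the middle frame of a window of frames.

    frames:  (window, n_features) spectrogram frames, window = 2n + 1
    weights: (window, n_features, n_hidden), one weight matrix per frame offset
    bias:    (n_hidden,)
    mask:    (n_features, n_hidden) binary mask, shared across the window
    """
    masked = weights * mask  # broadcasting zeroes the links outside each band
    pre = bias + np.einsum("uf,ufh->h", frames, masked)
    return np.maximum(pre, 0.0)  # ReLU, chosen only for this sketch


# Toy usage: 6 frequency bins, 4 hidden units, window of 3 frames.
mask = band_mask(n_features=6, n_hidden=4, bandwidth=2, overlap=1)
frames = np.ones((3, 6))
weights = np.ones((3, 6, 4))
bias = np.zeros(4)
hidden = masked_window_layer(frames, weights, bias, mask)
```

In this toy setup each hidden unit sees only two adjacent frequency bins in every frame of the window, so a small shift of energy along the frequency axis still falls inside the band of a neighboring unit. That is the intuition behind the filterbank-like behavior the abstract describes.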
