Utilizing Domain Knowledge in End-to-End Audio Processing

Hendrik Purwins; Jose Luis Diez Antich; Lars Maaløe; Tycho Max Sylvester Tax

arXiv: 1712.00254 · v1 · submitted 2017-12-01 · cs.SD · eess.AS · stat.ML

keywords: end-to-end, audio, first layers, log-scaled, mel-spectrogram, model, network

Abstract

End-to-end, neural-network-based approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper, we first present preliminary work showing that it is feasible to train the first layers of a deep convolutional neural network (CNN) to learn the commonly used log-scaled mel-spectrogram transformation. Second, we demonstrate that when the first layers of an end-to-end CNN classifier are initialized with the learned transformation, its convergence and performance on the ESC-50 environmental sound classification dataset are similar to those of a CNN-based model trained on the highly pre-processed log-scaled mel-spectrogram features.
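To see why the first layers of a CNN can represent the log-scaled mel-spectrogram at all, it helps to write the transform as a stack of layer-like operations: a strided convolution with windowed cosine and sine kernels (one pair per FFT bin), a squaring nonlinearity, a 1x1 projection onto triangular mel filters, and a log. The NumPy sketch below is not the authors' code. In the paper these layers are learned by training, whereas here one set of weights is written down by hand. The HTK mel formula, Hann window, n_fft=512, hop=256, and 40 mel bands are illustrative assumptions, and every function name is hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping n_fft//2+1 FFT bins to n_mels bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def frame(x, n_fft, hop):
    # Sliding windows with stride `hop`: the receptive fields of a strided conv
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]  # shape (n_frames, n_fft)

def log_mel_fft(x, sr, n_fft=512, hop=256, n_mels=40, eps=1e-10):
    """Reference log-mel features computed with a windowed FFT."""
    frames = frame(x, n_fft, hop) * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + eps)

def conv_frontend_weights(sr, n_fft=512, n_mels=40):
    """Hand-set weights for a conv-style front end (hypothetical helper)."""
    n = np.arange(n_fft)
    k = np.arange(n_fft // 2 + 1)
    window = np.hanning(n_fft)
    # Layer 1 kernels: windowed DFT basis, real (cos) and imaginary (-sin) parts
    cos_k = np.cos(2 * np.pi * np.outer(k, n) / n_fft) * window
    sin_k = -np.sin(2 * np.pi * np.outer(k, n) / n_fft) * window
    # Layer 2: 1x1 conv, i.e. a linear map from FFT bins to mel bands
    mel_k = mel_filterbank(sr, n_fft, n_mels)
    return cos_k, sin_k, mel_k

def log_mel_conv(x, weights, n_fft=512, hop=256, eps=1e-10):
    """Same features computed as conv -> square -> 1x1 conv -> log."""
    cos_k, sin_k, mel_k = weights
    frames = frame(x, n_fft, hop)
    real, imag = frames @ cos_k.T, frames @ sin_k.T  # strided conv outputs
    power = real ** 2 + imag ** 2                     # squaring nonlinearity
    return np.log(power @ mel_k.T + eps)              # mel projection + log
```

On a random one-second signal at 16 kHz, `np.allclose(log_mel_fft(x, 16000), log_mel_conv(x, conv_frontend_weights(16000)))` should hold, which shows that these fixed weights reproduce the FFT-based features. One way to learn such a front end, under the same assumptions, is to start from random kernels and train them to regress onto the `log_mel_fft` output. The trained layers could then initialize an end-to-end classifier.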
