Learning Environmental Sounds with Multi-scale Convolutional Neural Network

Boqing Zhu; Changjian Wang; Feng Liu; Jin Lei; Yuxing Peng; Zengquan Lu

arxiv: 1803.10219 · v1 · pith:JFYNSVQNnew · submitted 2018-03-25 · 💻 cs.SD · eess.AS



keywords: features, learning, multi-scale, sounds, classification, convolution, convolutional, environmental


Abstract

Deep learning has dramatically improved the performance of sound recognition. However, learning acoustic models directly from the raw waveform is still challenging. Current waveform-based models generally use time-domain convolutional layers to extract features. The features extracted by filters of a single size are insufficient for building discriminative representations of audio. In this paper, we propose a multi-scale convolution operation, which yields a better audio representation by improving the frequency resolution and learning filters across all frequency ranges. To leverage waveform-based and spectrogram-based features in a single model, we introduce a two-phase method to fuse the different features. Finally, we propose a novel end-to-end network called WaveMsNet, based on the multi-scale convolution operation and the two-phase method. On the environmental sound classification datasets ESC-10 and ESC-50, WaveMsNet achieves classification accuracies of 93.75% and 79.10% respectively, a significant improvement over previous methods.
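The multi-scale convolution idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the filter lengths (11, 51, 101), strides (1, 5, 10), random filter weights, pooled length, and all function names below are assumptions chosen for illustration, since the abstract does not specify them. The sketch only shows the general pattern of running filters of several sizes over the same raw waveform and pooling each branch to a common length so the branches can be stacked.

```python
import math
import random


def conv1d(signal, kernel, stride):
    """Valid (no padding) 1-D cross-correlation of `signal` with `kernel`."""
    k = len(kernel)
    return [
        sum(signal[start + i] * kernel[i] for i in range(k))
        for start in range(0, len(signal) - k + 1, stride)
    ]


def max_pool_to_length(feature, target_len):
    """Max-pool `feature` into exactly `target_len` contiguous chunks.

    Requires len(feature) >= target_len so that no chunk is empty.
    """
    n = len(feature)
    bounds = [j * n // target_len for j in range(target_len + 1)]
    return [max(feature[bounds[j]:bounds[j + 1]]) for j in range(target_len)]


def multi_scale_features(waveform, kernels_and_strides, target_len):
    """Apply each (kernel, stride) branch, rectify, and pool to a shared length."""
    branches = []
    for kernel, stride in kernels_and_strides:
        response = conv1d(waveform, kernel, stride)
        rectified = [max(0.0, v) for v in response]  # ReLU
        branches.append(max_pool_to_length(rectified, target_len))
    return branches


# Toy input: 400 samples of a sine wave standing in for raw audio.
waveform = [math.sin(2 * math.pi * 5 * t / 400) for t in range(400)]

# Hypothetical branch settings (assumed, not taken from the abstract):
# short filters give fine time resolution, long filters resolve lower frequencies.
rng = random.Random(0)
branch_specs = [
    ([rng.uniform(-1, 1) for _ in range(size)], stride)
    for size, stride in [(11, 1), (51, 5), (101, 10)]
]

features = multi_scale_features(waveform, branch_specs, target_len=16)
print([len(branch) for branch in features])
```

In a trained network the filter weights would be learned and each scale would have many filters; here one random filter per scale keeps the example short. Pooling every branch to the same length is what allows the differently sized filters to produce a single stacked feature map.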



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Attention based Convolutional Recurrent Neural Network for Environmental Sound Classification
cs.SD · 2019-07 · unverdicted · novelty 4.0

A CRNN model with frame-level attention achieves state-of-the-art accuracy on ESC-10 and ESC-50 environmental sound classification datasets.