arxiv: 1604.07160 · v2 · pith:FWPH3ONGnew · submitted 2016-04-25 · 💻 cs.SD · cs.MM

Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Detection

classification 💻 cs.SD cs.MM
keywords acoustic, data, detection, event, audio, augmentation, contrast, convolutional

We propose a novel method for Acoustic Event Detection (AED). In contrast to speech, sounds from acoustic events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time period because there is no clear sub-word unit. To incorporate the long-time frequency structure for AED, we introduce a convolutional neural network (CNN) with a large input field. In contrast to previous work, this allows audio event detection to be trained end-to-end. Our architecture is inspired by the success of VGGNet and uses small 3x3 convolutions but greater depth than previous AED methods. To prevent over-fitting and to take full advantage of the modeling capabilities of our network, we further propose a novel data augmentation method to introduce data variation. Experimental results show that our CNN significantly outperforms state-of-the-art methods, including Bag of Audio Words (BoAW) and classical CNNs, achieving a 16% absolute improvement.
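To make the link between depth, small 3x3 convolutions, and a large input field concrete, the following is a minimal sketch of the standard receptive-field recurrence for a stack of convolution and pooling layers. The layer list `vgg_like` and the function name `receptive_field` are hypothetical and chosen for illustration only; the abstract does not state the paper's actual layer configuration.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Return the receptive field (along one axis) of a layer stack.

    Each layer is given as (kernel_size, stride). The recurrence is:
        rf   <- rf + (kernel_size - 1) * jump
        jump <- jump * stride
    where `jump` is the spacing, in input samples, between adjacent
    outputs of the current layer.
    """
    rf, jump = 1, 1
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Hypothetical VGG-style stack (NOT the paper's configuration):
# two 3x3 convolutions, a 2x2 max-pool with stride 2, repeated twice.
vgg_like = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)]

print(receptive_field([(3, 1), (3, 1)]))  # two stacked 3x3 convs -> 5
print(receptive_field(vgg_like))          # -> 16
```

Two stacked 3x3 convolutions cover the same 5-sample span as a single 5x5 convolution, and interleaved pooling makes the covered span grow faster with depth. This is one way a deep network built from small kernels can reach an input field spanning an extended stretch of audio.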



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Attention based Convolutional Recurrent Neural Network for Environmental Sound Classification

    cs.SD 2019-07 unverdicted novelty 4.0

    A CRNN model with frame-level attention achieves state-of-the-art accuracy on ESC-10 and ESC-50 environmental sound classification datasets.