Convolutional Neural Networks and x-vector Embedding for DCASE2018 Acoustic Scene Classification Challenge
This paper describes the Brno University of Technology (BUT) team's submissions for Task 1 (Acoustic Scene Classification, ASC) of the DCASE-2018 challenge and analyzes different methods on the leaderboard set. The proposed approach is a fusion of two different Convolutional Neural Network (CNN) topologies. The first is the common two-dimensional CNN, which is mainly used in image classification. The second is a one-dimensional CNN for extracting fixed-length audio segment embeddings, so-called x-vectors, which have also been used in speech processing, especially for speaker recognition. In addition to the different topologies, two types of features were tested: log mel-spectrogram and CQT features. Finally, in the best-performing system, the outputs of the different systems are fused by simple output averaging. Our submissions ranked third among 24 teams in the ASC sub-task A (task1a).
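The output-averaging fusion mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each system emits one vector of per-class posteriors per recording, and the function names (`average_fusion`, `predict`), the three-class label subset, and the score values are all hypothetical.

```python
def average_fusion(system_outputs):
    """Average per-class scores across systems for a single recording.

    system_outputs: list of score vectors, one per system, all the same length.
    Assumption: scores are comparable across systems (e.g. softmax posteriors).
    """
    n_systems = len(system_outputs)
    n_classes = len(system_outputs[0])
    if any(len(scores) != n_classes for scores in system_outputs):
        raise ValueError("all systems must score the same set of classes")
    return [
        sum(scores[c] for scores in system_outputs) / n_systems
        for c in range(n_classes)
    ]


def predict(system_outputs, labels):
    """Return the label with the highest fused score, plus the fused scores."""
    fused = average_fusion(system_outputs)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused


# Hypothetical scores for one recording (illustrative values only).
labels = ["airport", "park", "metro"]
cnn2d_logmel = [0.6, 0.3, 0.1]    # 2D CNN on log mel-spectrogram
cnn2d_cqt = [0.2, 0.5, 0.3]       # 2D CNN on CQT features
xvector_cnn1d = [0.5, 0.25, 0.25]  # 1D CNN x-vector system

label, fused = predict([cnn2d_logmel, cnn2d_cqt, xvector_cnn1d], labels)
print(label, fused)
```

Because every system receives equal weight, this kind of fusion needs no extra training data, which is one plausible reason to prefer it when held-out data is scarce.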
Forward citations
Cited by 1 Pith paper
Acoustic Scene Classification Using Fusion of Attentive Convolutional Neural Networks for DCASE2019 Challenge
Fusion of VGG-like 2D CNN, Light-CNN, and x-vector 1D CNN with self-attention pooling on 256-dim log Mel-spectrograms, trained on 4-fold splits and combined with multiple fusion strategies for DCASE2019 Task 1.
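As a rough sketch of what pooling with attention weights means for a variable-length sequence of frame-level features: each frame gets a scalar score, the scores are normalized with a softmax, and the fixed-length embedding is the weighted mean of the frames. The summary above does not specify the cited paper's attention layer, so this example assumes the per-frame scores are already available from some learned layer; `attentive_pool`, `softmax`, and the toy values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(frames, scores):
    """Weighted mean of frame vectors, with weights given by softmax(scores).

    frames: list of T feature vectors, each of the same dimension.
    scores: list of T scalars (assumed to come from a learned attention layer).
    """
    if len(frames) != len(scores):
        raise ValueError("need exactly one score per frame")
    weights = softmax(scores)
    dim = len(frames[0])
    return [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]


# Toy input: three 2-dimensional frames (illustrative values only).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Equal scores reduce attentive pooling to a plain mean over time.
uniform = attentive_pool(frames, [0.0, 0.0, 0.0])

# A dominant score makes the embedding follow that frame.
peaked = attentive_pool(frames, [0.0, 50.0, 0.0])
print(uniform, peaked)
```

The output dimension is fixed regardless of how many frames go in, which is what allows a 1D CNN over time to produce a fixed-length segment embedding.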