
arxiv: 1810.04273 · v1 · pith:PAYSFG56new · submitted 2018-10-01 · 📡 eess.AS · cs.SD

Convolutional Neural Networks and x-vector Embedding for DCASE2018 Acoustic Scene Classification Challenge

keywords different · classification · acoustic · challenge · convolutional · features · neural · scene

In this paper, the Brno University of Technology (BUT) team submissions for Task 1 (Acoustic Scene Classification, ASC) of the DCASE-2018 challenge are described, and an analysis of different methods on the leaderboard set is provided. The proposed approach is a fusion of two different Convolutional Neural Network (CNN) topologies. The first is the common two-dimensional CNN, which is mainly used in image classification. The second is a one-dimensional CNN for extracting fixed-length audio segment embeddings, so-called x-vectors, which have also been used in speech processing, especially for speaker recognition. In addition to the different topologies, two types of features were tested: log mel-spectrogram and constant-Q transform (CQT) features. In the best-performing system, the outputs of the different systems are fused by simple output averaging. Our submissions ranked third among 24 teams in ASC sub-task A (task1a).
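
The output-averaging fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes each system emits one per-class score vector per audio clip, and that the fused decision is the class with the highest averaged score. The function names, the three-class setup, and the score values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_fusion(system_scores: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average per-class scores across systems.

    system_scores[s][n][c] is the score that system s assigns to class c
    for clip n. All systems must score the same clips and classes.
    """
    if not system_scores:
        raise ValueError("need at least one system")
    n_systems = len(system_scores)
    fused = []
    # zip(*system_scores) yields, for each clip, one score vector per system.
    for clip_scores in zip(*system_scores):
        # zip(*clip_scores) yields, for each class, one score per system.
        fused.append([sum(vals) / n_systems for vals in zip(*clip_scores)])
    return fused


def predict(fused: Sequence[Sequence[float]]) -> List[int]:
    """Pick the index of the highest fused score for each clip."""
    return [max(range(len(row)), key=row.__getitem__) for row in fused]


# Hypothetical posteriors for 2 clips and 3 scene classes (illustrative only).
cnn2d_scores = [[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]]  # stand-in for a 2D CNN system
xvec_scores = [[0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]   # stand-in for an x-vector system

fused = average_fusion([cnn2d_scores, xvec_scores])
labels = predict(fused)  # [0, 2]
```

In this toy example the two systems disagree on the second clip (classes 1 and 2 respectively), and the averaged scores favor class 2. Whether the submitted systems averaged posteriors, log-posteriors, or some other output is not stated in the abstract.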



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Acoustic Scene Classification Using Fusion of Attentive Convolutional Neural Networks for DCASE2019 Challenge

    eess.AS 2019-07 unverdicted novelty 2.0

    Fusion of VGG-like 2D CNN, Light-CNN, and x-vector 1D CNN with self-attention pooling on 256-dim log Mel-spectrograms, trained on 4-fold splits and combined with multiple fusion strategies for DCASE2019 Task 1.