
arXiv: 1711.11565 · v3 · submitted 2017-11-30 · cs.SD · cs.AI · cs.MM · cs.RO · eess.AS

Deep Neural Networks for Multiple Speaker Detection and Localization

Keywords: localization, detection, neural, sound, methods, multiple, sources, different
Abstract

We propose to use neural networks for simultaneous detection and localization of multiple sound sources in human-robot interaction. In contrast to conventional signal processing techniques, neural network-based sound source localization methods require fewer strong assumptions about the environment. Previous neural network-based methods have focused on localizing a single sound source and do not extend to the detection and localization of multiple sources. In this paper, we therefore propose a likelihood-based encoding of the network output, which naturally allows the detection of an arbitrary number of sources. In addition, we investigate the use of sub-band cross-correlation information as features for better localization in sound mixtures, as well as three network architectures, each based on a different motivation. Experiments on real data recorded from a robot show that our proposed methods significantly outperform popular spatial spectrum-based approaches.
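The idea of a likelihood-based output encoding can be illustrated with a minimal sketch. It assumes the network predicts one value per candidate azimuth on a 1° grid, and that the training target for each direction is a Gaussian-shaped bump centred on every true source, combined by taking the maximum over sources. The grid size, the width `sigma` and the detection `threshold` are illustrative assumptions, not values taken from the paper. Decoding keeps local maxima above the threshold, so the number of detected sources is not fixed in advance.

```python
import math
from typing import List


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths in degrees (wraps at 360)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def encode(doas: List[float], n_dirs: int = 360, sigma: float = 8.0) -> List[float]:
    """Target likelihood for each candidate direction (assumed Gaussian-shaped form).

    Each grid direction gets the maximum over sources of exp(-d^2 / sigma^2),
    where d is the angular distance to that source. No sources -> all zeros.
    """
    grid = [i * 360.0 / n_dirs for i in range(n_dirs)]
    return [
        max((math.exp(-angular_distance(g, s) ** 2 / sigma ** 2) for s in doas), default=0.0)
        for g in grid
    ]


def decode(likelihood: List[float], threshold: float = 0.5) -> List[float]:
    """Return azimuths (degrees) of circular local maxima that exceed the threshold."""
    n = len(likelihood)
    step = 360.0 / n
    peaks = []
    for i, v in enumerate(likelihood):
        left = likelihood[(i - 1) % n]   # neighbour with wrap-around at 0 degrees
        right = likelihood[(i + 1) % n]  # neighbour with wrap-around at 359 degrees
        if v > threshold and v >= left and v > right:
            peaks.append(i * step)
    return peaks


# Two speakers in, two detections out; an empty scene yields no detections.
print(decode(encode([30.0, 200.0])))
print(decode(encode([])))
```

Because an empty scene encodes as all zeros, the same decoder returns no detections in that case, which is what lets this style of output cover zero, one or several speakers without a separate source-counting step.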
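Sub-band cross-correlation features can be sketched as GCC-PHAT computed separately in several frequency bands for one microphone pair. This is a simplified illustration, not the paper's feature extractor: it assumes equal-width linear frequency bands, and the band count, lag range and FFT size (`n_bands`, `max_lag`, `n_fft`) are hypothetical. Keeping one correlation curve per band, rather than summing over all frequencies, lets a source that dominates some bands still produce a visible peak when another source dominates the rest.

```python
import numpy as np


def subband_gcc_phat(x1, x2, n_bands=8, max_lag=25, n_fft=1024):
    """Per-band GCC-PHAT for one microphone pair and one frame.

    Returns an array of shape (n_bands, 2 * max_lag + 1); column j holds lag
    j - max_lag (in samples). A positive lag means x1 is delayed relative to x2.
    Band layout and parameter values are illustrative assumptions.
    """
    X1 = np.fft.rfft(x1, n_fft)
    X2 = np.fft.rfft(x2, n_fft)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep only the phase

    n_bins = cross.shape[0]
    edges = np.linspace(0, n_bins, n_bands + 1).astype(int)  # equal-width bands
    lags = np.arange(-max_lag, max_lag + 1)
    feats = np.zeros((n_bands, lags.size))
    for b in range(n_bands):
        masked = np.zeros_like(cross)
        masked[edges[b]:edges[b + 1]] = cross[edges[b]:edges[b + 1]]
        cc = np.fft.irfft(masked, n_fft)  # circular correlation, lag 0 at index 0
        feats[b] = cc[lags % n_fft]       # negative lags wrap to the end of cc
    return feats


# A 5-sample circular delay shows up as a peak at lag +5 in every band.
rng = np.random.default_rng(0)
x2 = rng.standard_normal(1024)
x1 = np.roll(x2, 5)
f = subband_gcc_phat(x1, x2)
print(f.shape, [int(np.argmax(row)) - 25 for row in f])
```

In a full pipeline such features would presumably be computed for every microphone pair and stacked as network input; that arrangement is also an assumption here.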
