Learning Multi-Target TDOA Features for Sound Event Localization and Detection

Axel Berg; Jens Gulin; Johanna Engman; Karl Åström; Magnus Oskarsson

arxiv: 2408.17166 · v1 · pith:K2753CEPnew · submitted 2024-08-30 · eess.AS · cs.LG

keywords: features, localization, sound, audio, tdoa, detection, event, events

Sound event localization and detection (SELD) systems using audio recordings from a microphone array rely on spatial cues for determining the location of sound events. As a consequence, the localization performance of such systems is to a large extent determined by the quality of the audio features that are used as inputs to the system. We propose a new feature, based on neural generalized cross-correlations with phase-transform (NGCC-PHAT), that learns audio representations suitable for localization. Using permutation invariant training for the time-difference of arrival (TDOA) estimation problem enables NGCC-PHAT to learn TDOA features for multiple overlapping sound events. These features can be used as a drop-in replacement for GCC-PHAT inputs to a SELD-network. We test our method on the STARSS23 dataset and demonstrate improved localization performance compared to using standard GCC-PHAT or SALSA-Lite input features.
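
For orientation, the two ideas the abstract builds on can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `gcc_phat` computes the classical GCC-PHAT feature that NGCC-PHAT is described as replacing, and `pit_tdoa_error` shows the permutation-invariant idea with a mean absolute error, which is an assumption made here for simplicity (the abstract does not state the training loss). Both function names and all parameters are hypothetical.

```python
from itertools import permutations

import numpy as np


def gcc_phat(x1, x2, max_delay):
    """Classical GCC-PHAT between two microphone signals.

    Returns integer lags in [-max_delay, max_delay] and the correlation at
    each lag. A peak at a positive lag means x1 is delayed relative to x2.
    Assumes 1 <= max_delay < (len(x1) + len(x2)) / 2.
    """
    n = len(x1) + len(x2)  # zero-pad so the correlation is linear, not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12  # phase transform: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so negative lags come first, then lag 0, then positive lags.
    cc = np.concatenate((cc[-max_delay:], cc[: max_delay + 1]))
    lags = np.arange(-max_delay, max_delay + 1)
    return lags, cc


def pit_tdoa_error(pred, target):
    """Permutation-invariant error between predicted and true TDOAs.

    Illustrative only: takes the minimum mean absolute error over all
    orderings of the targets, so the output order of sources is not penalized.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(
        min(
            np.abs(pred - target[list(p)]).mean()
            for p in permutations(range(len(target)))
        )
    )


# Usage: one source reaching microphone 1 five samples later than microphone 2.
rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
x2 = np.concatenate((s, np.zeros(5)))
x1 = np.concatenate((np.zeros(5), s))
lags, cc = gcc_phat(x1, x2, max_delay=16)
estimated_delay = int(lags[np.argmax(cc)])
```

With a single source, the peak of `cc` gives the TDOA directly. With several overlapping sources the correlation has multiple peaks in no fixed order, which is why an order-agnostic objective such as the one sketched in `pit_tdoa_error` is needed during training.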

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SelectTSL: Prompt-Guided Selective Target Sound Localization in Complex Scenarios
cs.SD · 2026-07 · unverdicted · novelty 6.0

SelectTSL is an end-to-end model using a Prompt-Guided Selective Attention Module and IPD enhancer to localize only prompt-specified target sounds and estimate their count and direction in complex acoustic scenes.