Sound Event Detection with Boundary-Aware Optimization and Inference
Temporal detection problems appear in many fields including time-series estimation, activity recognition and sound event detection (SED). In this work, we propose a new approach to temporal event modeling by explicitly modeling event onsets and offsets, and by introducing boundary-aware optimization and inference strategies that substantially enhance temporal event detection. The presented methodology incorporates new temporal modeling layers - Recurrent Event Detection (RED) and Event Proposal Network (EPN) - which, together with tailored loss functions, enable more effective and precise temporal event detection. We evaluate the proposed method in the SED domain using a subset of the temporally-strongly annotated portion of AudioSet. Experimental results show that our approach not only outperforms traditional frame-wise SED models with state-of-the-art post-processing, but also removes the need for post-processing hyperparameter tuning, and scales to achieve new state-of-the-art performance across all AudioSet Strong classes.
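To make the idea of explicitly modeling onsets and offsets concrete, the sketch below turns strong annotations for one class into frame-wise activity, onset, and offset targets. It illustrates the general notion of boundary targets only and is not the paper's RED or EPN layers or its loss functions. The function name `boundary_targets`, the `frame_hop` parameter, and the choice to mark exactly one onset frame and one offset frame per event are assumptions made for this example.

```python
import math


def boundary_targets(events, clip_seconds, frame_hop=0.1):
    """Build frame-wise targets for a single sound class.

    events: list of (onset_sec, offset_sec) pairs from a strong annotation.
    Returns three lists of 0/1 values, one entry per frame:
    activity (event is sounding), onset (first frame), offset (last frame).
    Hypothetical helper for illustration; not taken from the paper.
    """
    n_frames = math.ceil(clip_seconds / frame_hop)
    activity = [0] * n_frames
    onset = [0] * n_frames
    offset = [0] * n_frames

    for start, end in events:
        # First frame that contains the event start, clamped to the clip.
        first = min(int(start / frame_hop), n_frames - 1)
        # Last frame that contains part of the event, never before `first`.
        last = min(max(math.ceil(end / frame_hop) - 1, first), n_frames - 1)

        for frame in range(first, last + 1):
            activity[frame] = 1
        onset[first] = 1
        offset[last] = 1

    return activity, onset, offset


if __name__ == "__main__":
    act, on, off = boundary_targets([(1.0, 2.5)], clip_seconds=5.0, frame_hop=0.5)
    print(act)
    print(on)
    print(off)
```

A frame-wise model trained only on the activity vector has to recover boundaries afterwards through thresholding and smoothing, which is the post-processing the abstract says the proposed approach avoids. Separate onset and offset targets give a model a direct boundary signal instead.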
Forward citations
Cited by 1 Pith paper
- Executable Boundary Contracts for Sound Event Traces
Defines executable boundary contracts for sound event traces using an STL-embeddable Boolean fragment plus interval and duration clauses, then evaluates them on speech and soundscape data where they disagree with stan...