arxiv: 2303.05678 · v1 · pith:IYHHKIUY · submitted 2023-03-10 · cs.SD · cs.LG · eess.AS

Improving Weakly Supervised Sound Event Detection with Causal Intervention

keywords: causal, event, sound, clip-level, co-occurrence, confounders, context, detection

Existing work on weakly supervised sound event detection (WSSED) has not addressed two types of co-occurrence simultaneously: some sound events often co-occur with one another, and their occurrences are usually accompanied by specific background sounds. With only clip-level supervision, these sounds become inevitably entangled, causing misclassification and biased localization results. To tackle this issue, we first establish a structural causal model (SCM), which reveals that context is the main cause of co-occurrence confounders; these confounders mislead the model into learning spurious correlations between frames and clip-level labels. Based on this causal analysis, we propose a causal intervention (CI) method for WSSED that removes the negative impact of co-occurrence confounders by iteratively accumulating every possible context of each class and then re-projecting these contexts onto the frame-level features, making event boundaries clearer. Experiments show that our method effectively improves performance on multiple datasets and can generalize to various baseline models.
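The intervention step summarized in the abstract can be illustrated with a small backdoor-adjustment-style sketch. This is not the authors' implementation. The class name `ContextMemory`, the use of the clip-mean frame feature as a class's context, the momentum (running-average) accumulation, the count-based priors P(z), and the additive dot-product-attention re-projection are all hypothetical choices made here for illustration. The sketch approximates P(Y | do(X)) = Σ_z P(Y | X, z) P(z) by adding to each frame feature x the term Σ_z softmax(x·z)_z · P(z) · z over the stored per-class contexts z.

```python
import math

# Illustrative sketch only: ContextMemory, the momentum update, the clip-mean
# context, the count-based priors and the additive re-projection are
# hypothetical choices, not the method from the paper.


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ContextMemory:
    """Accumulates one context vector per event class (the confounder set Z)."""

    def __init__(self, num_classes, dim, momentum=0.9):
        self.contexts = [[0.0] * dim for _ in range(num_classes)]
        self.counts = [0] * num_classes
        self.momentum = momentum

    def update(self, frames, clip_labels):
        """Fold the clip-level context into every class tagged in the clip."""
        dim = len(frames[0])
        # Clip context: mean frame feature over the (non-empty) clip.
        clip_ctx = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
        for c, present in enumerate(clip_labels):
            if not present:
                continue
            if self.counts[c] == 0:
                self.contexts[c] = list(clip_ctx)  # first sighting: copy
            else:
                m = self.momentum  # exponential moving average
                self.contexts[c] = [
                    m * old + (1 - m) * new
                    for old, new in zip(self.contexts[c], clip_ctx)
                ]
            self.counts[c] += 1

    def priors(self):
        """P(z) estimated from how many clips contributed to each context."""
        total = sum(self.counts)
        if total == 0:
            return [0.0] * len(self.counts)  # empty memory: no adjustment
        return [n / total for n in self.counts]

    def reproject(self, frames):
        """Return x + sum_z softmax(x . z)_z * P(z) * z for every frame x."""
        prior = self.priors()
        adjusted = []
        for x in frames:
            attn = softmax([dot(x, z) for z in self.contexts])
            delta = [
                sum(a * p * z[d] for a, p, z in zip(attn, prior, self.contexts))
                for d in range(len(x))
            ]
            adjusted.append([xi + di for xi, di in zip(x, delta)])
        return adjusted


# Usage: a two-frame clip tagged with class 0 only.
memory = ContextMemory(num_classes=2, dim=2, momentum=0.5)
memory.update([[1.0, 0.0], [0.0, 1.0]], clip_labels=[1, 0])
adjusted = memory.reproject([[1.0, 0.0]])
print(adjusted)
```

In this sketch, every frame is adjusted using all accumulated class contexts, each weighted by its prior, rather than only the contexts that happen to co-occur in the current clip. That stratification over the confounder, instead of conditioning on it, is the intuition behind a backdoor adjustment.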
