Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

Dejing Dou; Di Hu; Errui Ding; Minyue Jiang; Rui Qian; Shilei Wen; Weiyao Lin; Xiao Tan

arxiv: 2010.05466 · v1 · pith:ZELO776Dnew · submitted 2020-10-12 · cs.CV · cs.LG · cs.MM · cs.SD · eess.AS

Di Hu, Rui Qian, Minyue Jiang, Xiao Tan, Shilei Wen, Errui Ding, Weiyao Lin, Dejing Dou

classification: cs.CV, cs.LG, cs.MM, cs.SD, eess.AS

keywords: object, objects, sounding, localization, cocktail-party, self-supervised, audiovisual, class-aware

Discriminatively localizing sounding objects in cocktail-party scenes, i.e., scenes with mixed sounds, is commonplace for humans but still challenging for machines. In this paper, we propose a two-stage learning framework to perform self-supervised class-aware sounding object localization. First, we propose to learn robust object representations by aggregating the candidate sound localization results in single-source scenes. Then, class-aware object localization maps are generated in the cocktail-party scenarios by referring to the pre-learned object knowledge, and the sounding objects are selected by matching the audio and visual object category distributions, where audiovisual consistency serves as the self-supervised signal. Experimental results on both realistic and synthesized cocktail-party videos demonstrate that our model is superior in filtering out silent objects and pointing out the locations of sounding objects of different classes. Code is available at https://github.com/DTaoo/Discriminative-Sounding-Objects-Localization.
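
The matching step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It assumes, purely for illustration, that the visual category distribution comes from average-pooling class-aware localization maps under a sounding-area mask, and that consistency with the audio distribution is measured by KL divergence. All function names, array shapes, and toy values here are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def visual_category_distribution(class_maps, sounding_mask):
    # class_maps: (K, H, W) class-aware localization maps (assumed shape).
    # sounding_mask: (H, W) soft mask of the sounding area in [0, 1].
    masked = class_maps * sounding_mask[None, :, :]
    scores = masked.mean(axis=(1, 2))  # one pooled score per category
    return softmax(scores)

def kl_divergence(p, q, eps=1e-8):
    # KL(p || q); an assumed choice of audiovisual consistency measure.
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Toy scene: category 0 occupies the left half, category 1 the right half,
# and only the left half is sounding.
class_maps = np.zeros((3, 4, 4))
class_maps[0, :, :2] = 5.0
class_maps[1, :, 2:] = 5.0
sounding_mask = np.zeros((4, 4))
sounding_mask[:, :2] = 1.0

p_visual = visual_category_distribution(class_maps, sounding_mask)
p_audio_match = np.array([0.90, 0.05, 0.05])     # audio says category 0
p_audio_mismatch = np.array([0.05, 0.90, 0.05])  # audio says category 1

loss_match = kl_divergence(p_visual, p_audio_match)
loss_mismatch = kl_divergence(p_visual, p_audio_mismatch)
```

In this toy case the silent object of category 1 contributes nothing to the pooled visual scores, so the visual distribution agrees with the audio only when the audio also indicates category 0, and the consistency loss is lower for the matched pair.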
