pith. machine review for the scientific record.

arxiv: 1803.08842 · v1 · submitted 2018-03-23 · 💻 cs.CV

Recognition: unknown

Audio-Visual Event Localization in Unconstrained Videos

classification: 💻 cs.CV
keywords: audio-visual, localization, event, cross-modality, modalities, attention, correlations, dmrn

In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos. We define an audio-visual event as an event that is both visible and audible in a video segment. We collect an Audio-Visual Event (AVE) dataset to systematically investigate three temporal localization tasks: supervised and weakly-supervised audio-visual event localization, and cross-modality localization. We develop an audio-guided visual attention mechanism to explore audio-visual correlations, propose a dual multimodal residual network (DMRN) to fuse information over the two modalities, and introduce an audio-visual distance learning network to handle cross-modality localization. Our experiments support the following findings: joint modeling of auditory and visual modalities outperforms independent modeling, the learned attention can capture the semantics of sounding objects, temporal alignment is important for audio-visual fusion, the proposed DMRN is effective in fusing audio-visual features, and strong correlations between the two modalities enable cross-modality localization.
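
The abstract names an audio-guided visual attention mechanism but does not give its form. The following is a minimal sketch, assuming a standard additive (tanh) attention in which one audio feature scores the spatial regions of one video segment and the attended visual vector is their weighted sum. The function and parameter names (audio_guided_attention, w_v, w_a, w_f) are hypothetical, the weights would be learned in practice, and this is not necessarily the paper's exact formulation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]  # row-major: each inner list is one row


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def audio_guided_attention(regions: Matrix, audio: Vector, w_v: Matrix,
                           w_a: Matrix, w_f: Vector) -> Tuple[Vector, Vector]:
    """Weight the spatial regions of one segment by their relevance to the audio.

    regions: k region feature vectors, each of length d_v
    audio:   one audio feature vector of length d_a
    w_v, w_a: hypothetical projections into a shared h-dimensional space
    w_f:     hypothetical scoring vector of length h
    Returns (attention weights over the regions, attended visual vector).
    """
    audio_proj = matvec(w_a, audio)  # same audio term is added to every region
    scores = []
    for v in regions:
        hidden = [math.tanh(p + q) for p, q in zip(matvec(w_v, v), audio_proj)]
        scores.append(sum(wf * hv for wf, hv in zip(w_f, hidden)))
    weights = softmax(scores)
    # Attended feature: convex combination of the region vectors.
    dim = len(regions[0])
    attended = [sum(wt * v[j] for wt, v in zip(weights, regions)) for j in range(dim)]
    return weights, attended


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]  # two toy regions
    eye = [[1.0, 0.0], [0.0, 1.0]]      # identity projections for illustration
    weights, attended = audio_guided_attention(regions, [1.0, 0.0], eye, eye, [1.0, -1.0])
    print(weights)  # region 0, which matches the audio direction, gets the larger weight
```

Because the weights come from a softmax, the attended vector always stays inside the convex hull of the region features; in a full model it would then be fused with the audio feature, for example by a fusion network such as DMRN.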

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MMEB-V3: Measuring the Performance Gaps of Omni-Modality Embedding Models

    cs.IR · 2026-04 · unverdicted · novelty 7.0

    MMEB-V3 benchmark shows omni-modality embedding models fail to enforce instruction-specified modality constraints and exhibit asymmetric, query-biased retrieval.