Vggsound: A large-scale audio-visual dataset. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 721–725, 2020

Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman · 2020

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

Video MLLMs show an audio-visual Clever Hans effect relying on visual-acoustic correlations rather than audio verification; Thud interventions diagnose it and a 10K-sample preference alignment improves intervention performance by 28 points.

citing papers explorer

When Vision Speaks for Sound · cs.CV · 2026-05-13 · unverdicted · none · ref 11
