When Seeing Is Not Believing -- A Benchmark for Search-Grounded Video Misinformation Detection

Haopeng Jin; Hao Wang; Hongzhu Yi; Jiabing Yang; Liang Wang; Minghui Zhang; Shenghua Chai; Tao Yu; Xinlong Chen; Xinming Wang

arxiv: 2606.04098 · v1 · pith:6NCP3ATVnew · submitted 2026-06-02 · 💻 cs.CV

Tao Yu, Yujia Yang, Shenghua Chai, Zhang Jinshuai, Haopeng Jin, Hao Wang, Minghui Zhang, Zhongtian Luo, Yuchen Long, Xinlong Chen, Jiabing Yang, Zhaolu Kang, Yuxuan Zhou, Zhengyu Man, Xinming Wang, Hongzhu Yi, Zheqi He, Xi Yang, Yan Huang, Liang Wang

classification 💻 cs.CV

keywords video · misinformation · models · accuracy · across · ai-generated · alone · benchmark

Video misinformation increasingly operates at the semantic and evidential level: authentic footage may be selectively edited, temporally reordered, spliced across sources, or augmented with AI-generated content to construct false narratives. Such evidence-dependent manipulations cannot be reliably verified from the input video alone, because the missing, reordered, replaced, or recontextualized evidence lies outside the video itself. We introduce EVID-Bench, a benchmark for search-grounded video misinformation detection, where a system must search the open web for related videos and identify what information is false through cross-video comparison. EVID-Bench comprises 222 videos spanning 9 manipulation types across 3 categories: AI generation, single-source editing, and multi-source editing. All samples are verified to be undetectable by frontier models through visual inspection alone. We evaluate nine frontier multimodal models using a retrieval-augmented verification baseline. The best system achieves only 61.43% point-level accuracy and 43.24% video-level accuracy, while AI-generated manipulations remain especially challenging. Error analysis reveals recurring challenges: models fixate on irrelevant anchors, misattribute synthetic content to editorial splicing, and terminate search prematurely before fully explaining the manipulation.
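The abstract reports point-level and video-level accuracy without defining them. The sketch below shows one plausible reading, stated here as an assumption and not as the authors' definition: each video carries several annotated manipulation points, point-level accuracy is the fraction of all points a system identifies correctly, and video-level accuracy is the fraction of videos in which every point is identified correctly. The `VideoResult` schema, the function names, and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VideoResult:
    """Hypothetical per-video record; not the benchmark's actual schema."""
    video_id: str
    # One flag per annotated manipulation point: True if the system got it right.
    point_correct: list[bool]


def point_level_accuracy(results: list[VideoResult]) -> float:
    """Fraction of all annotated points answered correctly (assumed definition)."""
    total = sum(len(r.point_correct) for r in results)
    if total == 0:
        return 0.0
    correct = sum(sum(r.point_correct) for r in results)
    return correct / total


def video_level_accuracy(results: list[VideoResult]) -> float:
    """Fraction of videos whose points are all correct (assumed definition)."""
    if not results:
        return 0.0
    solved = sum(1 for r in results if r.point_correct and all(r.point_correct))
    return solved / len(results)


# Toy data, invented purely for illustration.
results = [
    VideoResult("a", [True, True]),
    VideoResult("b", [True, False, True]),
    VideoResult("c", [False]),
]

print(f"point-level: {point_level_accuracy(results):.4f}")
print(f"video-level: {video_level_accuracy(results):.4f}")
```

Under this reading, a single missed point fails the whole video, so video-level accuracy can never exceed what the per-point hits allow and is typically the lower of the two numbers, which is at least consistent with the gap between the reported 61.43% and 43.24%.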
