MERIT: Modular Framework for Multimodal Misinformation Detection with Web-Grounded Reasoning
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
We present MERIT, an inference-time modular framework for multimodal misinformation detection that decomposes verification into four specialized modules: visual forensics, cross-modal alignment, retrieval-augmented claim verification, and calibrated judgment. On MMFakeBench, MERIT with GPT-4o-mini achieves 81.65% F1, outperforming all reported zero-shot baselines including GPT-4V with MMD-Agent (74.0% F1). A controlled same-model evaluation confirms gains stem from architectural design: MERIT achieves 6.14 points higher misinformation recall than MMD-Agent under identical model conditions, with per-class gains of +18.0 on visual distortion and +5.33 on textual distortion. Ablation studies reveal non-overlapping module specialization, where removing any module disproportionately degrades its target category while leaving others intact. Test set evaluation on 5,000 samples confirms generalization within 0.21 F1 points of validation results. The framework operates with any instruction-following vision-language model and produces citation-linked rationales for human review.
fields
cs.CV 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
When Seeing Is Not Believing -- A Benchmark for Search-Grounded Video Misinformation Detection
EVID-Bench supplies 222 videos across nine manipulation types in three categories and shows that frontier multimodal models reach at most 61.43% point-level accuracy when forced to use web search to identify false information.