Probing Multimodal Large Language Models on Cognitive Biases in Chinese Short-Video Misinformation

2026 · cs.CL · arXiv 2601.06600

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Short-video platforms have become major channels for misinformation, where deceptive claims frequently leverage visual experiments and social cues. While Multimodal Large Language Models (MLLMs) have demonstrated impressive reasoning capabilities, their robustness against misinformation entangled with cognitive biases remains under-explored. In this paper, we introduce a comprehensive evaluation framework using a high-quality, manually annotated dataset of 200 short videos spanning four health domains. This dataset provides fine-grained annotations for three deceptive patterns (experimental errors, logical fallacies, and fabricated claims), each verified by evidence such as national standards and academic literature. We evaluate eight frontier MLLMs across five modality settings. Experimental results demonstrate that Gemini-2.5-Pro achieves the highest performance in the multimodal setting with a belief score of 71.5/100, while o3 performs the worst at 35.2. Furthermore, we investigate social cues that induce false beliefs in videos and find that models are susceptible to biases like authoritative channel IDs.

representative citing papers

When Seeing Is Not Believing -- A Benchmark for Search-Grounded Video Misinformation Detection

cs.CV · 2026-06-02 · unverdicted · novelty 7.0

EVID-Bench supplies 222 videos across nine manipulation types in three categories and shows that frontier multimodal models reach at most 61.43% point-level accuracy when forced to use web search to identify false information.

