Researchers create a human-labeled dataset of obvious and elusive multimodal hallucinations and use learned activation-space probes to control their verifiability in MLLMs.
Mmbench: Is your multi-modal model an all-around player? InEuropean conference on computervision, pages 216–233
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
Video generation models demonstrate competitive multimodal reasoning on a new benchmark, matching or exceeding VLMs on visual puzzles and achieving 92% on MATH and 69.2% on MMMU.
citing papers explorer
-
Steering the Verifiability of Multimodal AI Hallucinations
Researchers create a human-labeled dataset of obvious and elusive multimodal hallucinations and use learned activation-space probes to control their verifiability in MLLMs.
-
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Video generation models demonstrate competitive multimodal reasoning on a new benchmark, matching or exceeding VLMs on visual puzzles and achieving 92% on MATH and 69.2% on MMMU.