VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
fields
cs.CV 2
years
2026 2
verdicts
UNVERDICTED 2
roles
background 1
polarities
background 1
representative citing papers
RemoteShield improves robustness of Earth observation MLLMs by training on semantic equivalence clusters of clean and perturbed inputs via preference learning to maintain consistent reasoning under noise.
citing papers explorer
- TempGlitch: Evaluating Vision-Language Models for Temporal Glitch Detection in Gameplay Videos
TempGlitch is a controlled benchmark showing that 12 evaluated VLMs perform near chance level on detecting five types of temporal glitches in gameplay videos, with denser sampling and larger models providing no reliable improvement.
- RemoteShield: Enable Robust Multimodal Large Language Models for Earth Observation
RemoteShield improves robustness of Earth observation MLLMs by training on semantic equivalence clusters of clean and perturbed inputs via preference learning to maintain consistent reasoning under noise.