Do VLMs perceive or recall? Probing visual perception vs
3 Pith papers cite this work.
fields: cs.CV (3)
years: 2026 (3)
representative citing papers
Degradation-Driven Prompting improves VQA by intentionally reducing image detail and using masks, lines, and examples to guide models toward essential structures.
SQI uses axiomatic constraints, hierarchical decomposition, and counterfactual verification to align linguistic reasoning with visual perception in frozen VLMs, achieving second place on the DataCV 2026 illusion challenge.
citing papers explorer
- Illusion-Aware Visual Preprocessing and Anti-Illusion Prompting for Classic Illusion Understanding in Vision-Language Models
A combination of illusion-specific image transformations, anti-illusion prompts, and majority voting lets VLMs reach 90.48% accuracy on a 630-image illusion benchmark without any model training.
- Less Detail, Better Answers: Degradation-Driven Prompting for VQA
Degradation-Driven Prompting improves VQA by intentionally reducing image detail and using masks, lines, and examples to guide models toward essential structures.
- Beyond Shortcuts: Mitigating Visual Illusions in Frozen VLMs via Qualitative Reasoning
SQI uses axiomatic constraints, hierarchical decomposition, and counterfactual verification to align linguistic reasoning with visual perception in frozen VLMs, achieving second place on the DataCV 2026 illusion challenge.