2 Pith papers cite this work.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: Unverdicted (2)

Representative citing papers
- The Cartesian Shortcut: Re-evaluate Vision Reasoning in Polar Coordinate Space
  Reformulating 53 visual reasoning tasks in polar coordinates causes frontier MLLMs to drop from 70-83% to 31-39% accuracy while preserving logical equivalence, revealing a Cartesian shortcut in current benchmarks.
- State Beyond Appearance: Diagnosing and Improving State Consistency in Dial-Based Measurement Reading
  MLLMs ignore the geometry of dial states and instead cluster them by appearance, which causes inconsistent readings under variations; TriSCA's state-distance alignment, metadata supervision, and objective alignment improve robustness on clock and gauge benchmarks.