SpatiaLab benchmark shows state-of-the-art VLMs achieve 54.93% accuracy on multiple-choice spatial reasoning in real scenes versus 87.57% for humans.
A proper redesign is possible but requires removing redundant lines, separating circuits (high-voltage vs
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild?
SpatiaLab benchmark shows state-of-the-art VLMs achieve 54.93% accuracy on multiple-choice spatial reasoning in real scenes versus 87.57% for humans.