An empirical analysis on spatial reasoning capabilities of large multimodal models

An empirical analysis on spatial reasoning capabilities of large multimodal models · 2025 · arXiv 2411.06048

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

SATURN: Symbolic Spatial Reasoning for Multi-Perspective Grounding

cs.CV · 2026-06-21 · unverdicted · novelty 7.0

SATURN reconstructs approximate 3D scenes, derives soft perspective-aware predicates, and executes them symbolically to achieve stable performance on complex multi-perspective spatial grounding tasks where VLMs degrade.

DRIVESPATIAL: A Benchmark for Spatiotemporal Intelligence in VLMs for Autonomous Driving

cs.CV · 2026-05-22 · unverdicted · novelty 7.0 · 2 refs

DriveSpatial benchmark shows the strongest of 15 VLMs trails humans by 28.4 points on spatiotemporal tasks, with cognitive scene construction as the primary weakness.

Causal Scaffolding for Physical Reasoning: A Benchmark for Causally-Informed Physical World Understanding in VLMs

cs.DB · 2026-06-04 · unverdicted · novelty 6.0

Introduces CausalPhys benchmark with causal graphs and CRFT fine-tuning to improve VLMs' causal physical reasoning accuracy and interpretability.

Lost in Space? Vision-Language Models Struggle with Relative Camera Pose Estimation

cs.CV · 2026-01-29 · unverdicted · novelty 6.0

VLMs reach only 0.66 accuracy on relative camera pose estimation while humans achieve 0.91 and specialized pipelines reach 0.99, exposing weaknesses in multi-view spatial reasoning.

citing papers explorer

SATURN: Symbolic Spatial Reasoning for Multi-Perspective Grounding · cs.CV · 2026-06-21 · unverdicted · none · ref 5
DRIVESPATIAL: A Benchmark for Spatiotemporal Intelligence in VLMs for Autonomous Driving · cs.CV · 2026-05-22 · unverdicted · none · ref 50 · 2 links
Causal Scaffolding for Physical Reasoning: A Benchmark for Causally-Informed Physical World Understanding in VLMs · cs.DB · 2026-06-04 · unverdicted · none · ref 51
Lost in Space? Vision-Language Models Struggle with Relative Camera Pose Estimation · cs.CV · 2026-01-29 · unverdicted · none · ref 5
