Do vision-language models represent space and how? evaluating spatial frame of reference under ambiguities

2024 · arXiv 2410.17385

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

ProSR: Process-Shaped Spatial Reasoning for Reliable Chain-of-Thought in VLMs

cs.CV · 2026-05-25 · unverdicted · novelty 6.0

ProSR adds a Counterfactual Invariance Penalty and a Tail Drift Penalty to shape VLM reasoning trajectories for better visual dependence and stability on spatial tasks.

When Do Diffusion Models learn to Generate Multiple Objects?

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

Using the mosaic controlled dataset framework, experiments show scene complexity dominates over concept imbalance in diffusion model failures for multi-object generation, with counting especially hard in low-data regimes and compositional generalization collapsing under held-out combinations.

OmniView-Space: Reinforcing Spatial Reasoning via Multi-Perspective Spatial Mapping

cs.CV · 2026-07-01 · unverdicted · novelty 5.0

The OmniView-Space framework, which combines MPSM, tool-guided reasoning, and distillation, achieves SOTA on spatial reasoning benchmarks for MLLMs while reducing external geometry dependencies.

IntentNav: Learning Spatial-Visual Object Navigation from Human Demonstrations

cs.RO · 2026-06-06 · unverdicted · novelty 5.0

IntentNav is a spatial-visual imitation framework that infers human search intent via frontier labeling to train VLM policies for object navigation, reporting SOTA on MP3D and HM3D benchmarks with zero-shot transfer to wheeled, quadruped, and humanoid robots.

AutoSpatial: Visual-Language Reasoning for Social Robot Navigation through Efficient Spatial Reasoning Learning

cs.RO · 2025-03-10 · unverdicted · novelty 5.0

AutoSpatial improves VLM spatial reasoning for social navigation by combining minimal manual supervision with auto-labeled VQA pairs and hierarchical training, showing gains up to 20.5% in action prediction over baselines.

citing papers explorer

ProSR: Process-Shaped Spatial Reasoning for Reliable Chain-of-Thought in VLMs cs.CV · 2026-05-25 · unverdicted · none · ref 5
When Do Diffusion Models learn to Generate Multiple Objects? cs.CV · 2026-04-30 · unverdicted · none · ref 30
OmniView-Space: Reinforcing Spatial Reasoning via Multi-Perspective Spatial Mapping cs.CV · 2026-07-01 · unverdicted · none · ref 18
IntentNav: Learning Spatial-Visual Object Navigation from Human Demonstrations cs.RO · 2026-06-06 · unverdicted · none · ref 10
AutoSpatial: Visual-Language Reasoning for Social Robot Navigation through Efficient Spatial Reasoning Learning cs.RO · 2025-03-10 · unverdicted · none · ref 7
