Internspatial: A comprehensive dataset for spatial reasoning in vision-language models

Nianchen Deng, Lixin Gu, Shenglong Ye, Yinan He, Zhe Chen, Songze Li, Haomin Wang, Xingguang Wei, Tianshuo Yang, Min Dou, et al · 2025 · arXiv 2506.18385

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 1 dataset 1

citation-polarity summary

background 1 baseline 1

representative citing papers

SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

cs.CV · 2026-05-21 · unverdicted · novelty 7.0 · 2 refs

SpaceDG is the first large-scale benchmark dataset (~1M QA pairs) simulating nine visual degradations in 3DGS-rendered scenes to measure and improve spatial intelligence robustness in MLLMs.

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

cs.LG · 2026-04-08 · unverdicted · novelty 7.0

This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.

SCP: Spatial Causal Prediction in Video

cs.CV · 2026-03-04 · unverdicted · novelty 7.0

SCP defines a new benchmark task for predicting spatial causal outcomes beyond direct observation and shows that 23 leading models lag far behind humans on it.

Do Vision-Language Models Understand 3D Scenes or Just Catalogue Objects?

cs.CV · 2026-05-19 · unverdicted · novelty 6.0 · 2 refs

VLMs achieve 53-97% on rearrangement planning but only 6-45% on occlusion and under 7% on reflections, with failures localized to visual token compression after the vision encoder.

SpatialForge: Bootstrapping 3D-Aware Spatial Reasoning from Open-World 2D Images

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

SpatialForge synthesizes 10 million spatial QA pairs from in-the-wild 2D images to train VLMs for better depth ordering, layout, and viewpoint-dependent reasoning.

AlloSpatial: Agentic Harness Framework for Spatial Reasoning in Foundation Models

cs.AI · 2026-06-08 · unverdicted · novelty 5.0

AlloSpatial adds structured allocentric priors and a harness for tool-use and arbitration to improve spatial reasoning in foundation models, with 5-18% gains on VSI-Bench and MindCube in training-free settings and further gains after RL internalization.

citing papers explorer

Showing 6 of 6 citing papers after filters.

SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation cs.CV · 2026-05-21 · unverdicted · none · ref 74 · 2 links
SpaceDG is the first large-scale benchmark dataset (~1M QA pairs) simulating nine visual degradations in 3DGS-rendered scenes to measure and improve spatial intelligence robustness in MLLMs.
Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning cs.LG · 2026-04-08 · unverdicted · none · ref 19
This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.
SCP: Spatial Causal Prediction in Video cs.CV · 2026-03-04 · unverdicted · none · ref 15
SCP defines a new benchmark task for predicting spatial causal outcomes beyond direct observation and shows that 23 leading models lag far behind humans on it.
Do Vision-Language Models Understand 3D Scenes or Just Catalogue Objects? cs.CV · 2026-05-19 · unverdicted · none · ref 12 · 2 links
VLMs achieve 53-97% on rearrangement planning but only 6-45% on occlusion and under 7% on reflections, with failures localized to visual token compression after the vision encoder.
SpatialForge: Bootstrapping 3D-Aware Spatial Reasoning from Open-World 2D Images cs.CV · 2026-05-12 · unverdicted · none · ref 16
SpatialForge synthesizes 10 million spatial QA pairs from in-the-wild 2D images to train VLMs for better depth ordering, layout, and viewpoint-dependent reasoning.
AlloSpatial: Agentic Harness Framework for Spatial Reasoning in Foundation Models cs.AI · 2026-06-08 · unverdicted · none · ref 12
AlloSpatial adds structured allocentric priors and a harness for tool-use and arbitration to improve spatial reasoning in foundation models, with 5-18% gains on VSI-Bench and MindCube in training-free settings and further gains after RL internalization.

Internspatial: A comprehensive dataset for spatial reasoning in vision-language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer