hub Canonical reference

Has gpt-5 achieved spatial intelligence? an empirical study

Cai, Z · 2025 · arXiv 2508.13142

Canonical reference. 75% of citing Pith papers cite this work as background.

14 Pith papers citing it

Background 75% of classified citations

read on arXiv browse 14 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 6 baseline 1 method 1

citation-polarity summary

background 6 baseline 1 use method 1

representative citing papers

MindEdit-Bench: Benchmarking Object-Level Counterfactual Spatial Reasoning in VLMs from In-the-Wild Photos

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

MindEdit-Bench introduces six spatial reasoning tasks from 120 private indoor photo triplets, with two new counterfactual editing tasks where VLMs score 8-31% against 81-97% human accuracy.

SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

cs.AI · 2026-06-08 · unverdicted · novelty 7.0

SpatialWorld is a new multi-simulator benchmark showing top multimodal agents achieve under 18% success on interactive spatial tasks requiring active exploration and long-horizon planning.

Dual-Pathway Geometry-Aware MLLM for Spatial Intelligence

cs.CV · 2026-05-25 · unverdicted · novelty 7.0

GAMSI is a dual-pathway Geometry-Aware MLLM using Metric-Structure Decoupled Queries and Expert-Guided Visual Grounding on RGB inputs alone, trained on a new 152k-sample MTS dataset to reach SOTA on seven spatial benchmarks.

VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

VGenST-Bench is a new video benchmark for MLLM spatio-temporal reasoning built via generative synthesis, a multi-agent pipeline with human oversight, a 3x2x2 taxonomy, and hierarchical tasks separating perception from reasoning.

SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

cs.CV · 2026-05-21 · unverdicted · novelty 7.0 · 2 refs

SpaceDG is the first large-scale benchmark dataset (~1M QA pairs) simulating nine visual degradations in 3DGS-rendered scenes to measure and improve spatial intelligence robustness in MLLMs.

Exploring Spatial Intelligence from a Generative Perspective

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

Fine-tuning multimodal models on a new synthetic spatial benchmark improves generative spatial compliance on real and synthetic tasks and transfers to better spatial understanding.

SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence

cs.CV · 2025-05-22 · conditional · novelty 7.0

Presents SpatialScore benchmark for MLLM spatial reasoning, evaluates 49 models showing large human gap, and supplies SpatialCorpus plus SpatialAgent to improve performance.

The Art of Interrogation: Consistency Amplifies Factuality in Spatial Reasoning

cs.AI · 2026-06-10 · unverdicted · novelty 6.0

A new consistency-verifier RL framework with OT-GRPO raises spatial reasoning accuracy in LRMs to near supervised levels using only internal geometric and semantic checks.

GeoWeaver: Grounding Visual Tokens with Geometric Evidence before Scene Reasoning

cs.CV · 2026-05-21 · unverdicted · novelty 6.0

GeoWeaver performs token-adaptive geometric grounding on visual tokens from a multi-level bank prior to language modeling to support better spatio-temporal reasoning.

Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

Proxy3D generates efficient 3D proxy representations via semantic clustering from video frames and aligns them to VLMs through multi-stage training on the new SpaceSpan dataset, achieving competitive performance on 3D VQA, grounding, and spatial benchmarks with shorter sequences.

MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?

cs.LG · 2026-02-20 · conditional · novelty 6.0 · 2 refs

MapTab is a new multimodal benchmark with 328 images and nearly 200k queries that shows current MLLMs have substantial difficulty with multi-criteria route planning when visual and tabular information must be combined.

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

cs.CV · 2026-05-12 · unverdicted · novelty 5.0

SenseNova-U1 presents native unified multimodal models that match top understanding VLMs while delivering strong performance in image generation, infographics, and interleaved tasks via the NEO-unify architecture.

SpatialImaginer: Towards Adaptive Visual Imagination for Spatial Reasoning

cs.CV · 2026-04-19 · unverdicted · novelty 5.0

SpatialImaginer integrates visual imagination with textual chain-of-thought to improve spatial reasoning robustness in multimodal large language models.

MCERF: Advancing Multimodal LLM Evaluation of Engineering Documentation with Enhanced Retrieval

cs.IR · 2026-01-31

citing papers explorer

Showing 14 of 14 citing papers.

MindEdit-Bench: Benchmarking Object-Level Counterfactual Spatial Reasoning in VLMs from In-the-Wild Photos cs.CV · 2026-07-01 · unverdicted · none · ref 34
MindEdit-Bench introduces six spatial reasoning tasks from 120 private indoor photo triplets, with two new counterfactual editing tasks where VLMs score 8-31% against 81-97% human accuracy.
SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks cs.AI · 2026-06-08 · unverdicted · none · ref 8
SpatialWorld is a new multi-simulator benchmark showing top multimodal agents achieve under 18% success on interactive spatial tasks requiring active exploration and long-horizon planning.
Dual-Pathway Geometry-Aware MLLM for Spatial Intelligence cs.CV · 2026-05-25 · unverdicted · none · ref 44
GAMSI is a dual-pathway Geometry-Aware MLLM using Metric-Structure Decoupled Queries and Expert-Guided Visual Grounding on RGB inputs alone, trained on a new 152k-sample MTS dataset to reach SOTA on seven spatial benchmarks.
VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis cs.CV · 2026-05-21 · unverdicted · none · ref 7
VGenST-Bench is a new video benchmark for MLLM spatio-temporal reasoning built via generative synthesis, a multi-agent pipeline with human oversight, a 3x2x2 taxonomy, and hierarchical tasks separating perception from reasoning.
SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation cs.CV · 2026-05-21 · unverdicted · none · ref 84 · 2 links
SpaceDG is the first large-scale benchmark dataset (~1M QA pairs) simulating nine visual degradations in 3DGS-rendered scenes to measure and improve spatial intelligence robustness in MLLMs.
Exploring Spatial Intelligence from a Generative Perspective cs.CV · 2026-04-22 · unverdicted · none · ref 2
Fine-tuning multimodal models on a new synthetic spatial benchmark improves generative spatial compliance on real and synthetic tasks and transfers to better spatial understanding.
SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence cs.CV · 2025-05-22 · conditional · none · ref 12
Presents SpatialScore benchmark for MLLM spatial reasoning, evaluates 49 models showing large human gap, and supplies SpatialCorpus plus SpatialAgent to improve performance.
The Art of Interrogation: Consistency Amplifies Factuality in Spatial Reasoning cs.AI · 2026-06-10 · unverdicted · none · ref 6
A new consistency-verifier RL framework with OT-GRPO raises spatial reasoning accuracy in LRMs to near supervised levels using only internal geometric and semantic checks.
GeoWeaver: Grounding Visual Tokens with Geometric Evidence before Scene Reasoning cs.CV · 2026-05-21 · unverdicted · none · ref 5
GeoWeaver performs token-adaptive geometric grounding on visual tokens from a multi-level bank prior to language modeling to support better spatio-temporal reasoning.
Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment cs.CV · 2026-05-08 · unverdicted · none · ref 4
Proxy3D generates efficient 3D proxy representations via semantic clustering from video frames and aligns them to VLMs through multi-stage training on the new SpaceSpan dataset, achieving competitive performance on 3D VQA, grounding, and spatial benchmarks with shorter sequences.
MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs? cs.LG · 2026-02-20 · conditional · none · ref 9 · 2 links
MapTab is a new multimodal benchmark with 328 images and nearly 200k queries that shows current MLLMs have substantial difficulty with multi-criteria route planning when visual and tabular information must be combined.
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture cs.CV · 2026-05-12 · unverdicted · none · ref 9
SenseNova-U1 presents native unified multimodal models that match top understanding VLMs while delivering strong performance in image generation, infographics, and interleaved tasks via the NEO-unify architecture.
SpatialImaginer: Towards Adaptive Visual Imagination for Spatial Reasoning cs.CV · 2026-04-19 · unverdicted · none · ref 6
SpatialImaginer integrates visual imagination with textual chain-of-thought to improve spatial reasoning robustness in multimodal large language models.
MCERF: Advancing Multimodal LLM Evaluation of Engineering Documentation with Enhanced Retrieval cs.IR · 2026-01-31 · unreviewed · ref 49

Has gpt-5 achieved spatial intelligence? an empirical study

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer