Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

Zhongang Cai, Yubo Wang, Qingping Sun, Ruisi Wang, Chenyang Gu, Wanqi Yin, Zhiqian Lin, Zhitao Yang, Chen Wei, Xuanke Shi, et al. · 2025 · arXiv 2508.13142

Canonical reference. 75% of citing Pith papers cite this work as background.

10 Pith papers citing it

Background 75% of classified citations

citation-role summary

background 6 · baseline 1 · method 1

citation-polarity summary

background 6 · baseline 1 · use method 1

representative citing papers

VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

VGenST-Bench is a new video benchmark for MLLM spatio-temporal reasoning built via generative synthesis, a multi-agent pipeline with human oversight, a 3x2x2 taxonomy, and hierarchical tasks separating perception from reasoning.

SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

SpaceDG introduces the first large-scale degradation-aware spatial reasoning dataset using 3D Gaussian Splatting synthesis, showing that visual degradations impair MLLM performance but finetuning on the data improves robustness and can exceed human levels under degradation.

Exploring Spatial Intelligence from a Generative Perspective

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

Fine-tuning multimodal models on a new synthetic spatial benchmark improves generative spatial compliance on real and synthetic tasks and transfers to better spatial understanding.

SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence

cs.CV · 2025-05-22 · conditional · novelty 7.0

Presents the SpatialScore benchmark for MLLM spatial reasoning, evaluates 49 models and finds a large gap to human performance, and supplies SpatialCorpus plus SpatialAgent to improve performance.

GeoWeaver: Grounding Visual Tokens with Geometric Evidence before Scene Reasoning

cs.CV · 2026-05-21 · unverdicted · novelty 6.0

GeoWeaver performs token-adaptive geometric grounding on visual tokens from a multi-level bank prior to language modeling to support better spatio-temporal reasoning.

Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

Proxy3D generates efficient 3D proxy representations via semantic clustering from video frames and aligns them to VLMs through multi-stage training on the new SpaceSpan dataset, achieving competitive performance on 3D VQA, grounding, and spatial benchmarks with shorter sequences.

MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?

cs.LG · 2026-02-20 · conditional · novelty 6.0 · 2 refs

MapTab is a new multimodal benchmark with 328 images and nearly 200k queries that shows current MLLMs have substantial difficulty with multi-criteria route planning when visual and tabular information must be combined.

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

cs.CV · 2026-05-12 · unverdicted · novelty 5.0

SenseNova-U1 presents native unified multimodal models that match top understanding VLMs while delivering strong performance in image generation, infographics, and interleaved tasks via the NEO-unify architecture.

SpatialImaginer: Towards Adaptive Visual Imagination for Spatial Reasoning

cs.CV · 2026-04-19 · unverdicted · novelty 5.0

SpatialImaginer integrates visual imagination with textual chain-of-thought to improve spatial reasoning robustness in multimodal large language models.

MCERF: Advancing Multimodal LLM Evaluation of Engineering Documentation with Enhanced Retrieval

cs.IR · 2026-01-31

citing papers explorer

Showing 10 of 10 citing papers.

VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis

cs.CV · 2026-05-21 · unverdicted · none · ref 7

SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

cs.CV · 2026-05-21 · unverdicted · none · ref 84

Exploring Spatial Intelligence from a Generative Perspective

cs.CV · 2026-04-22 · unverdicted · none · ref 2

SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence

cs.CV · 2025-05-22 · conditional · none · ref 12

GeoWeaver: Grounding Visual Tokens with Geometric Evidence before Scene Reasoning

cs.CV · 2026-05-21 · unverdicted · none · ref 5

Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment

cs.CV · 2026-05-08 · unverdicted · none · ref 4

MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?

cs.LG · 2026-02-20 · conditional · none · ref 9 · 2 links

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

cs.CV · 2026-05-12 · unverdicted · none · ref 9

SpatialImaginer: Towards Adaptive Visual Imagination for Spatial Reasoning

cs.CV · 2026-04-19 · unverdicted · none · ref 6

MCERF: Advancing Multimodal LLM Evaluation of Engineering Documentation with Enhanced Retrieval

cs.IR · 2026-01-31 · unreviewed · ref 49
