SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning

Wufei Ma, Yu-Cheng Chou, Qihao Liu, Xingrui Wang, Celso de Melo, Jianwen Xie, Alan Yuille · 2025 · arXiv 2504.20024

6 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary: background (2)

citation-polarity summary: background (2)

representative citing papers

SPACENUM: Revisiting Spatial Numerical Understanding in VLMs

cs.AI · 2026-05-22 · unverdicted · novelty 6.0

VLMs fail to ground numerical values in spatial perception on new bidirectional tasks, relying on shallow cues instead of coordinate-aware representations.

SpatiO: Adaptive Test-Time Orchestration of Vision-Language Agents for Spatial Reasoning

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

SpatiO uses heterogeneous vision-language agents with test-time orchestration to dynamically weight their contributions for improved spatial reasoning on benchmarks like 3DSRBench and CV-Bench.

Universal Pose Pretraining for Generalizable Vision-Language-Action Policies

cs.CV · 2026-02-23 · unverdicted · novelty 6.0

Pose-VLA uses a decoupled two-stage pre-training with discrete pose tokens to extract universal 3D spatial priors from 3D datasets and robotic trajectories, achieving 79.5% success on RoboTwin 2.0 and 96.0% on LIBERO.

PASR: Pose-Aware 3D Shape Retrieval from Occluded Single Views

cs.CV · 2026-04-24 · unverdicted · novelty 5.0

PASR performs pose-aware analysis-by-synthesis by aligning 3D projections with DINOv3 patch features, outperforming prior methods on clean and occluded retrieval while also handling pose estimation and classification.

XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments

cs.CV · 2026-04-20 · unverdicted · novelty 4.0

XEmbodied is a foundation model that integrates 3D geometric and physical signals into VLMs using a 3D Adapter and Efficient Image-Embodied Adapter, plus progressive curriculum and RL post-training, to improve spatial reasoning and embodied performance on 18 benchmarks.

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

cs.CV · 2026-04-06

citing papers explorer


SPACENUM: Revisiting Spatial Numerical Understanding in VLMs cs.AI · 2026-05-22 · unverdicted · none · ref 20
SpatiO: Adaptive Test-Time Orchestration of Vision-Language Agents for Spatial Reasoning cs.CV · 2026-04-23 · unverdicted · none · ref 23
Universal Pose Pretraining for Generalizable Vision-Language-Action Policies cs.CV · 2026-02-23 · unverdicted · none · ref 28
PASR: Pose-Aware 3D Shape Retrieval from Occluded Single Views cs.CV · 2026-04-24 · unverdicted · none · ref 22
XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments cs.CV · 2026-04-20 · unverdicted · none · ref 65
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models cs.CV · 2026-04-06 · unreviewed · ref 94