Masked depth modeling for spatial perception
6 Pith papers cite this work.
Years: 2026 (6). Citation roles: dataset (1). Citation polarities: use dataset (1).
Citing papers:
- SpatialBench: Is Your Spatial Foundation Model an All-Round Player?
  SpatialBench evaluates 41 spatial foundation models across 6 paradigms and 5 task suites, finds that they are not all-round players, and introduces the DA-Next-5M dataset plus the DA-Next baseline model.
- EvoScene-VLA: Evolving Scene Beliefs Inside the Action Decoder for Chunked Robot Control
  EvoScene-VLA maintains an action-updated scene prior across control chunks in VLA policies. On RoboTwin tasks it raises success rates from 87.2% to 89.1% in the fixed setting and from 86.1% to 88.5% in the randomized setting, and it outperforms baselines on a real robot.
- WildDet3D: Scaling Promptable 3D Detection in the Wild
  WildDet3D is a promptable 3D detector paired with a new 1M-image dataset spanning 13.5K categories; it achieves state-of-the-art results on open-world and zero-shot 3D detection benchmarks.
- Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction
  Robo3R predicts accurate metric-scale 3D scene geometry from RGB images and robot states to improve robotic manipulation performance.
- Towards Consistent Video Geometry Estimation
  ViGeo is a feed-forward transformer for video geometry that introduces dynamic chunking attention and a completion-based data refinement framework, achieving state-of-the-art results on depth, surface normal, and point map estimation.
- Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction
  GARD performs diffusion-based multi-view restoration in the feature space of a feed-forward 3D reconstructor to recover scene geometry and RGB images under degraded conditions, and is shown to be effective on the DA3 benchmark.