Masked depth modeling for spatial perception

Bin Tan, Changjiang Sun, Xiage Qin, Hanat Adai, Zelin Fu, Tianxiang Zhou, Han Zhang, Yinghao Xu, Xing Zhu, Yujun Shen, Nan Xue · 2026 · arXiv 2601.17895

6 Pith papers cite this work.

Citation roles: dataset (1).

Citation polarity: use dataset (1).

representative citing papers

SpatialBench: Is Your Spatial Foundation Model an All-Round Player?

cs.CV · 2026-05-26 · unverdicted · novelty 8.0

SpatialBench evaluates 41 spatial foundation models across 6 paradigms and 5 task suites, finds they are not all-round players, and introduces the DA-Next-5M dataset plus DA-Next baseline model.

EvoScene-VLA: Evolving Scene Beliefs Inside the Action Decoder for Chunked Robot Control

cs.RO · 2026-05-21 · conditional · novelty 7.0

EvoScene-VLA maintains an action-updated scene prior across control chunks in VLA policies. It raises success rates on RoboTwin tasks from 87.2% to 89.1% in the fixed setting and from 86.1% to 88.5% in the randomized setting, and it outperforms baselines on a real robot.

WildDet3D: Scaling Promptable 3D Detection in the Wild

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that sets SOTA on open-world and zero-shot 3D detection benchmarks.

Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction

cs.RO · 2026-02-10 · unverdicted · novelty 6.0

Robo3R predicts accurate metric-scale 3D scene geometry from RGB images and robot states for improved robotic manipulation performance.

Towards Consistent Video Geometry Estimation

cs.CV · 2026-05-28 · unverdicted · novelty 5.0

ViGeo is a feed-forward transformer for video geometry that introduces dynamic chunking attention and a completion-based data refinement framework to achieve SOTA on depth, normals, and point map estimation.

Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction

cs.CV · 2026-05-25 · unverdicted · novelty 5.0

GARD performs diffusion-based multi-view restoration in the feature space of a feed-forward 3D reconstructor to recover scene geometry and RGB images under degraded conditions, shown effective on the DA3 benchmark.

