Masked depth modeling for spatial perception. arXiv preprint arXiv:2601.17895, 2026.
3 Pith papers cite this work. Polarity classification is still indexing.
Citation summary
Years: 2026 (3)
Roles: dataset (1)
Polarities: use dataset (1)
citing papers explorer
- EvoScene-VLA: Evolving Scene Beliefs Inside the Action Decoder for Chunked Robot Control
EvoScene-VLA maintains an action-updated scene prior across control chunks in VLA policies, raising success rates on RoboTwin tasks from 87.2% to 89.1% in the fixed setting and from 86.1% to 88.5% in the randomized setting, while outperforming baselines on a real robot.
- WildDet3D: Scaling Promptable 3D Detection in the Wild
WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that sets SOTA on open-world and zero-shot 3D detection benchmarks.
- Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction
Robo3R predicts accurate metric-scale 3D scene geometry from RGB images and robot states for improved robotic manipulation performance.