arXiv preprint arXiv:2511.20648 (2025) 26 D

Yunze Man, Shihao Wang, Guowen Zhang, Johan Bjorck, Zhiqi Li, Liang-Yan Gui, Jim Fan, Jan Kautz, Yu- Xiong Wang, Zhiding Yu · 2025 · arXiv 2511.20648

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

WildDet3D: Scaling Promptable 3D Detection in the Wild

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that sets SOTA on open-world and zero-shot 3D detection benchmarks.

Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

BoxerNet lifts 2D bounding boxes to metric 3D boxes via transformer regression with aleatoric uncertainty and median depth encoding, then fuses multi-view results to outperform CuTR by large margins on open-world benchmarks.

Universal Pose Pretraining for Generalizable Vision-Language-Action Policies

cs.CV · 2026-02-23 · unverdicted · novelty 6.0

Pose-VLA uses a decoupled two-stage pre-training with discrete pose tokens to extract universal 3D spatial priors from 3D datasets and robotic trajectories, achieving 79.5% success on RoboTwin 2.0 and 96.0% on LIBERO.

citing papers explorer

Showing 3 of 3 citing papers.

WildDet3D: Scaling Promptable 3D Detection in the Wild cs.CV · 2026-04-09 · unverdicted · none · ref 35
WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that sets SOTA on open-world and zero-shot 3D detection benchmarks.
Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D cs.CV · 2026-04-06 · unverdicted · none · ref 29
BoxerNet lifts 2D bounding boxes to metric 3D boxes via transformer regression with aleatoric uncertainty and median depth encoding, then fuses multi-view results to outperform CuTR by large margins on open-world benchmarks.
Universal Pose Pretraining for Generalizable Vision-Language-Action Policies cs.CV · 2026-02-23 · unverdicted · none · ref 29
Pose-VLA uses a decoupled two-stage pre-training with discrete pose tokens to extract universal 3D spatial priors from 3D datasets and robotic trajectories, achieving 79.5% success on RoboTwin 2.0 and 96.0% on LIBERO.

arXiv preprint arXiv:2511.20648 (2025) 26 D

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer