arXiv preprint arXiv:2411.16833 (2024)

Jin Yao, Hao Gu, Xuweiyi Chen, Jiayun Wang, Zezhou Cheng · 2024 · arXiv 2411.16833

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

WildDet3D: Scaling Promptable 3D Detection in the Wild

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that sets SOTA on open-world and zero-shot 3D detection benchmarks.

MoCA3D: Monocular 3D Bounding Box Prediction in the Image Plane

cs.CV · 2026-03-20 · unverdicted · novelty 7.0

MoCA3D formulates monocular 3D box prediction as dense pixel-space tasks using corner heatmaps and depth maps, with a new PAG metric for image-plane evaluation.

Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

BoxerNet lifts 2D bounding boxes to metric 3D boxes via transformer regression with aleatoric uncertainty and median depth encoding, then fuses multi-view results to outperform CuTR by large margins on open-world benchmarks.

Grounded 3D-Aware Spatial Vision-Language Modeling

cs.CV · 2026-05-28 · unverdicted · novelty 5.0

GR3D is a VLM that combines explicit 2D, implicit 2D, and monocular 3D grounding mechanisms to improve performance on spatial understanding benchmarks.

citing papers explorer

Showing 4 of 4 citing papers after filters.

WildDet3D: Scaling Promptable 3D Detection in the Wild · cs.CV · 2026-04-09 · unverdicted · none · ref 62
MoCA3D: Monocular 3D Bounding Box Prediction in the Image Plane · cs.CV · 2026-03-20 · unverdicted · none · ref 53
Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D · cs.CV · 2026-04-06 · unverdicted · none · ref 51
Grounded 3D-Aware Spatial Vision-Language Modeling · cs.CV · 2026-05-28 · unverdicted · none · ref 43
