arXiv preprint arXiv:2411.16833 (2024)

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary

fields: cs.CV 4

years: 2026 4

verdicts: unverdicted 4

roles: background 1

polarities: background 1
representative citing papers

WildDet3D: Scaling Promptable 3D Detection in the Wild

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that achieves state-of-the-art results on open-world and zero-shot 3D detection benchmarks.

Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D

cs.CV · 2026-04-06 · unverdicted · novelty 6.0

BoxerNet lifts 2D bounding boxes to metric 3D boxes via transformer regression with aleatoric uncertainty and median depth encoding, then fuses multi-view results to outperform CuTR by large margins on open-world benchmarks.

Grounded 3D-Aware Spatial Vision-Language Modeling

cs.CV · 2026-05-28 · unverdicted · novelty 5.0

GR3D is a VLM that combines explicit 2D, implicit 2D, and monocular 3D grounding mechanisms to improve performance on spatial understanding benchmarks.

citing papers explorer

Showing 4 of 4 citing papers after filters.

  • WildDet3D: Scaling Promptable 3D Detection in the Wild · cs.CV · 2026-04-09 · unverdicted · none · ref 62

    WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that achieves state-of-the-art results on open-world and zero-shot 3D detection benchmarks.

  • MoCA3D: Monocular 3D Bounding Box Prediction in the Image Plane · cs.CV · 2026-03-20 · unverdicted · none · ref 53

    MoCA3D formulates monocular 3D box prediction as dense pixel-space tasks using corner heatmaps and depth maps, with a new PAG metric for image-plane evaluation.

  • Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D · cs.CV · 2026-04-06 · unverdicted · none · ref 51

    BoxerNet lifts 2D bounding boxes to metric 3D boxes via transformer regression with aleatoric uncertainty and median depth encoding, then fuses multi-view results to outperform CuTR by large margins on open-world benchmarks.

  • Grounded 3D-Aware Spatial Vision-Language Modeling · cs.CV · 2026-05-28 · unverdicted · none · ref 43

    GR3D is a VLM that combines explicit 2D, implicit 2D, and monocular 3D grounding mechanisms to improve performance on spatial understanding benchmarks.