
arxiv: 2507.02393 · v2 · pith:MSWDFBFTnew · submitted 2025-07-03 · 💻 cs.CV · cs.GR

PLOT: Pseudo-Labeling via Object Tracking for Monocular 3D Object Detection

classification 💻 cs.CV cs.GR
keywords: object, plot, across, monocular, annotations, detection, perception, pseudo-labeling

Monocular 3D object detection is crucial for scalable perception across fields like autonomous driving, robotics, and surveillance. However, progress is hindered by limited 3D annotations and the inherent ambiguity of single-image geometry. Existing methods often rely on strong geometric assumptions or carefully curated datasets, which limit their applicability to real-world scenarios. In this paper, we present PLOT (Pseudo-Labeling via Object Tracking), a framework that generates 3D annotations from monocular videos without auxiliary sensors or model retraining. PLOT tracks object and background trajectories to estimate camera motion and perform object association in pose-unknown settings. These trajectories provide point correspondences that align frame-wise pseudo-LiDARs, which are then fused via simple optimization into a unified object shape robust to occlusion and viewpoint shifts. Recognizing temporal coherence as a fundamental requirement for reliable shape fusion and video perception, we design a global object memory that preserves consistent object identities across frames. PLOT achieves robust annotation quality and strong generalization on both M3OD video benchmarks and in-the-wild videos, proving its effectiveness across diverse and unconstrained domains. Project page: https://plot-eccv.github.io.
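The abstract refers to frame-wise pseudo-LiDARs. A pseudo-LiDAR is commonly understood as a point cloud obtained by back-projecting a per-pixel depth estimate into 3D. The following is a minimal sketch of that one step only, not PLOT's implementation: it assumes a pinhole camera with known intrinsics and a depth map given as nested lists, and the function name `backproject_depth` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project a depth map into camera-frame 3D points.

    Assumes a pinhole model: pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):      # v indexes image rows
        for u, z in enumerate(row):      # u indexes image columns
            if z <= 0.0:
                continue                 # no valid depth at this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Toy 2x2 depth map with one invalid pixel (assumed values).
    toy_depth = [[2.0, 0.0], [4.0, 2.0]]
    cloud = backproject_depth(toy_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
    for point in cloud:
        print(point)
```

Per-frame point clouds produced this way live in each frame's own camera coordinates, which is why the correspondences mentioned in the abstract are needed before the clouds can be fused.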



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Depth2Pose: A Pose-Based Benchmark for Monocular Depth Estimation without Ground-Truth Depth

    cs.CV 2026-05 unverdicted novelty 7.0

    Depth2Pose is a new evaluation framework for monocular depth estimators that uses relative camera pose accuracy as a task-driven proxy and introduces the D2P dataset of challenging out-of-distribution scenes.