arxiv: 2012.09988 · v1 · pith:B6KETGAI · submitted 2020-12-18 · 💻 cs.CV

Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations

classification 💻 cs.CV
keywords: dataset, object, detection, objectron, videos, annotated, annotations, applications
Abstract

3D object detection has recently become popular due to many applications in robotics, augmented reality, autonomy, and image retrieval. We introduce the Objectron dataset to advance the state of the art in 3D object detection and foster new research and applications, such as 3D object tracking, view synthesis, and improved 3D shape representation. The dataset contains object-centric short videos with pose annotations for nine categories and includes 4 million annotated images in 14,819 annotated videos. We also propose a new evaluation metric, 3D Intersection over Union, for 3D object detection. We demonstrate the usefulness of our dataset in 3D object detection tasks by providing baseline models trained on this dataset. Our dataset and evaluation source code are available online at http://www.objectron.dev
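
The abstract proposes 3D Intersection over Union as an evaluation metric: the volume shared by a predicted box and a ground-truth box, divided by the volume of their union. The sketch below shows only that ratio and rests on a simplifying assumption. It handles axis-aligned boxes, whereas Objectron annotates oriented boxes, and intersecting rotated boxes takes more geometry than this code covers. The names `AABox3D` and `iou_3d_axis_aligned` are hypothetical and are not part of the Objectron evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AABox3D:
    """Hypothetical axis-aligned 3D box given by its min and max corners (x, y, z).

    Simplifying assumption: no rotation, unlike Objectron's oriented boxes.
    """
    min_corner: tuple[float, float, float]
    max_corner: tuple[float, float, float]

    def volume(self) -> float:
        # Product of side lengths; degenerate (inverted) sides count as zero.
        vol = 1.0
        for lo, hi in zip(self.min_corner, self.max_corner):
            vol *= max(hi - lo, 0.0)
        return vol


def iou_3d_axis_aligned(a: AABox3D, b: AABox3D) -> float:
    """IoU = vol(A ∩ B) / vol(A ∪ B) for two axis-aligned boxes."""
    intersection = 1.0
    for a_lo, a_hi, b_lo, b_hi in zip(a.min_corner, a.max_corner,
                                      b.min_corner, b.max_corner):
        # Overlap of the two intervals on this axis; none means IoU is 0.
        extent = min(a_hi, b_hi) - max(a_lo, b_lo)
        if extent <= 0.0:
            return 0.0
        intersection *= extent
    # Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|.
    union = a.volume() + b.volume() - intersection
    return intersection / union if union > 0.0 else 0.0
```

For example, a unit cube and a copy shifted by 0.5 along x share a volume of 0.5 out of a union of 1.5, so `iou_3d_axis_aligned` returns approximately 1/3. Identical boxes score 1.0 and disjoint boxes score 0.0.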

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data

    cs.CV 2021-11 accept novelty 8.0

    ARKitScenes is the largest real-world indoor RGB-D dataset captured with mobile LiDAR, including high-resolution depth maps and 3D furniture bounding box annotations for advancing object detection and depth upsampling.

  2. PhyEdit: Towards Real-World Object Manipulation via Physically-Grounded Image Editing

    cs.CV 2026-04 unverdicted novelty 6.0

    PhyEdit improves physical accuracy in image object manipulation by using explicit geometric simulation as 3D-aware guidance combined with joint 2D-3D supervision.