{"paper":{"title":"ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data","license":"http://creativecommons.org/licenses/by/4.0/","headline":"ARKitScenes is the largest indoor RGB-D dataset captured with widely available mobile LiDAR sensors and includes laser-scanned depth plus manual 3D bounding box labels.","cross_cats":["cs.AI"],"primary_cat":"cs.CV","authors_text":"Afshin Dehghan, Arik Schwartz, Brandon Joffe, Daniel Kurz, Elad Shulman, Gilad Baruch, Peter Fu, Tal Dimry, Thomas Gebauer, Yuri Feigin, Zhuoyuan Chen","submitted_at":"2021-11-17T04:27:01Z","abstract_excerpt":"Scene understanding is an active research area. Commercial depth sensors, such as Kinect, have enabled the release of several RGB-D datasets over the past few years which spawned novel methods in 3D scene understanding. More recently with the launch of the LiDAR sensor in Apple's iPads and iPhones, high quality RGB-D data is accessible to millions of people on a device they commonly use. This opens a whole new era in scene understanding for the Computer Vision community as well as app developers. The fundamental research in scene understanding together with the advances in machine learning can"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"ARKitScenes is not only the first RGB-D dataset captured with a now widely available depth sensor, but to our best knowledge, it also is the largest indoor scene understanding data released.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the mobile RGB-D captures, laser-scanned depth maps, and manual 3D bounding box labels are sufficiently accurate and representative of real-world indoor scenes to push state-of-the-art methods on the two downstream tasks.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"ARKitScenes is the largest real-world indoor RGB-D dataset captured with mobile LiDAR, including high-resolution depth maps and 3D furniture bounding box annotations for advancing object detection and depth upsampling.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"ARKitScenes is the largest indoor RGB-D dataset captured with widely available mobile LiDAR sensors and includes laser-scanned depth plus manual 3D bounding box labels.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"3c387f667a4e6222b478ef02458f17c5b950c9d51c3ae7e0ff30d381ea12702a"},"source":{"id":"2111.08897","kind":"arxiv","version":3},"verdict":{"id":"e7b663bd-2e04-4d98-a9db-f5bda035c67a","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T10:41:47.251267Z","strongest_claim":"ARKitScenes is not only the first RGB-D dataset captured with a now widely available depth sensor, but to our best knowledge, it also is the largest indoor scene understanding data released.","one_line_summary":"ARKitScenes is the largest real-world indoor RGB-D dataset captured with mobile LiDAR, including high-resolution depth maps and 3D furniture bounding box annotations for advancing object detection and depth upsampling.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the mobile RGB-D captures, laser-scanned depth maps, and manual 3D bounding box labels are sufficiently accurate and representative of real-world indoor scenes to push state-of-the-art methods on the two downstream tasks.","pith_extraction_headline":"ARKitScenes is the largest indoor RGB-D dataset captured with widely available mobile LiDAR sensors and includes laser-scanned depth plus manual 3D bounding box labels."},"references":{"count":46,"sample":[{"doi":"","year":2019,"title":"3d-sis: 3d semantic instance segmentation of rgb-d scans","work_id":"539e33f7-2816-4473-843a-b15021c485ff","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2019,"title":"Gspn: Generative shape proposal network for 3d instance segmentation in point cloud","work_id":"07e7b214-eaae-4bd1-b860-e357578e92e9","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"Sgpn: Similarity group proposal network for 3d point cloud instance segmentation","work_id":"57c8f838-5ec0-4798-b9a0-f7aa85147d09","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2019,"title":"Deep hough voting for 3d object detection in point clouds","work_id":"c0b97321-80f0-4ac6-8485-2c4888fa836e","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Qi, Xinlei Chen, and Leonidas J","work_id":"d4ca096c-778f-41fa-896e-6997ba6bd3f6","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":46,"snapshot_sha256":"60741609255dbc446c82bc40a32e889b0254d68727f86a88d05548cbd9c0368d","internal_anchors":2},"formal_canon":{"evidence_count":2,"snapshot_sha256":"21450e658b5260556d840098a83bc3ce8df5ac576c70db6ba09db9bfcf857134"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}