ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data
Pith reviewed 2026-05-15 10:41 UTC · model grok-4.3
The pith
ARKitScenes is the first indoor RGB-D dataset captured with a widely available mobile LiDAR sensor and, to the authors' knowledge, the largest indoor scene understanding dataset released; it includes laser-scanned depth plus manual 3D bounding box labels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ARKitScenes is the first RGB-D dataset captured with the widely available depth sensor on iPads and iPhones and, to the authors' knowledge, the largest indoor scene understanding dataset released. It supplies raw and processed mobile device data, high-resolution depth maps from a stationary laser scanner, and manually labeled 3D oriented bounding boxes for furniture. Evaluation on two downstream tasks, 3D object detection and color-guided depth upsampling, shows the dataset can push the boundaries of state-of-the-art methods while introducing challenges that better represent real-world scenarios.
What carries the argument
The ARKitScenes dataset itself, which pairs mobile RGB-D captures with laser-scanner depth maps and manual 3D oriented bounding box annotations for indoor furniture.
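To make the annotation format concrete, the sketch below converts an oriented 3D box given as a center, size, and yaw angle into its eight corner points. The z-up, yaw-only parameterization and the argument names are illustrative assumptions, not the exact ARKitScenes annotation schema.

```python
import numpy as np

def oriented_box_corners(center, size, yaw):
    """Return the 8 corners of a 3D box rotated by `yaw` about the up (z) axis.

    center: (3,) box centroid in metres
    size:   (3,) full extents (dx, dy, dz) in metres
    yaw:    rotation about z in radians

    NOTE: the z-up, yaw-only convention and these field names are assumptions
    for illustration; the released annotation format may differ.
    """
    dx, dy, dz = np.asarray(size, dtype=float) / 2.0
    # Corners of an axis-aligned box centred at the origin.
    local = np.array([[ dx,  dy,  dz], [ dx,  dy, -dz],
                      [ dx, -dy,  dz], [ dx, -dy, -dz],
                      [-dx,  dy,  dz], [-dx,  dy, -dz],
                      [-dx, -dy,  dz], [-dx, -dy, -dz]])
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return local @ rot.T + np.asarray(center, dtype=float)

# Example: a hypothetical 0.8 m x 1.6 m x 0.7 m box at (1, 2, 0.5), yawed 0.3 rad.
corners = oriented_box_corners(center=[1.0, 2.0, 0.5], size=[0.8, 1.6, 0.7], yaw=0.3)
print(corners.shape)  # (8, 3)
```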
If this is right
- 3D object detection models achieve higher accuracy on large furniture taxonomies when trained with the labeled mobile data.
- Color-guided depth upsampling produces higher-resolution outputs by using the laser scans as precise ground truth (a minimal upsampling sketch follows this list).
- The dataset scale supports training larger machine-learning models for indoor scene understanding.
- Methods developed on the data must handle noise and viewpoint variation typical of handheld mobile captures.
- The combination of mobile and laser data creates a bridge between consumer hardware and high-precision references.
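As a concrete illustration of the upsampling task referenced above, here is a minimal sketch of classical joint bilateral upsampling, in which a low-resolution depth map is lifted to the resolution of an RGB guide image. This is a slow reference implementation of the classical baseline idea (Kopf et al., 2007), not the learned methods evaluated in the paper; the array shapes and parameter values are assumptions.

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, rgb_hr, sigma_s=2.0, sigma_r=0.1, radius=3):
    """Upsample a low-res depth map to the resolution of an RGB guide image.

    depth_lr: (h, w) low-resolution depth in metres
    rgb_hr:   (H, W, 3) high-resolution guide image with values in [0, 1]

    Each high-res pixel averages nearby low-res depth samples, weighted by
    spatial distance (in low-res coordinates) and by colour similarity in the
    high-res guide. Written for clarity, not speed.
    """
    H, W, _ = rgb_hr.shape
    h, w = depth_lr.shape
    sy, sx = h / H, w / W                      # high-res -> low-res scale
    out = np.zeros((H, W), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            cy, cx = y * sy, x * sx            # position in the low-res grid
            y0, x0 = int(round(cy)), int(round(cx))
            num, den = 0.0, 0.0
            for j in range(max(0, y0 - radius), min(h, y0 + radius + 1)):
                for i in range(max(0, x0 - radius), min(w, x0 + radius + 1)):
                    # Spatial weight in low-res coordinates.
                    ws = np.exp(-((j - cy) ** 2 + (i - cx) ** 2) / (2 * sigma_s ** 2))
                    # Range weight: colour similarity between the target pixel
                    # and the guide colour at the low-res sample's location.
                    gy = min(H - 1, int(j / sy))
                    gx = min(W - 1, int(i / sx))
                    wr = np.exp(-np.sum((rgb_hr[y, x] - rgb_hr[gy, gx]) ** 2)
                                / (2 * sigma_r ** 2))
                    num += ws * wr * depth_lr[j, i]
                    den += ws * wr
            out[y, x] = num / max(den, 1e-12)
    return out
```

The paper's depth-upsampling baselines are learned models; this classical version only illustrates what "color-guided" means, namely using RGB edges to sharpen a low-resolution depth map.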
Where Pith is reading between the lines
- App developers could fine-tune models on this data to add room-layout awareness to consumer AR experiences without extra hardware.
- The dataset could be used to study how well algorithms generalize from mobile captures to other depth sensors.
- Future releases might add semantic segmentation labels or dynamic object tracks to extend the current static bounding-box focus.
- Cross-validation across different device models within the captures could reveal hardware-specific biases in depth sensing.
Load-bearing premise
The mobile RGB-D captures, laser-scanned depth maps, and manual 3D bounding box labels are sufficiently accurate and representative of real-world indoor scenes to advance state-of-the-art methods.
What would settle it
A controlled test in which models trained on ARKitScenes show no improvement over models trained on prior datasets when evaluated on independent mobile RGB-D captures from varied indoor rooms would falsify the usefulness claim.
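One hedged way such a controlled test could be scored, assuming per-scene average-precision values are available for a model trained on ARKitScenes and a model trained on a prior dataset, both evaluated on the same independent mobile captures, is a paired bootstrap on the per-scene gap:

```python
import numpy as np

def paired_bootstrap_gap(ap_arkit, ap_prior, n_boot=10_000, seed=0):
    """Bootstrap the mean per-scene AP gap between two training sources.

    ap_arkit, ap_prior: arrays of per-scene AP on the *same* held-out mobile
    captures, one entry per scene (assumed available; illustrative only).
    Returns the mean gap and a 95% confidence interval; an interval that
    includes zero would be consistent with "no improvement".
    """
    rng = np.random.default_rng(seed)
    gaps = np.asarray(ap_arkit, dtype=float) - np.asarray(ap_prior, dtype=float)
    boots = np.array([rng.choice(gaps, size=gaps.size, replace=True).mean()
                      for _ in range(n_boot)])
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return gaps.mean(), (lo, hi)
```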
Original abstract
Scene understanding is an active research area. Commercial depth sensors, such as Kinect, have enabled the release of several RGB-D datasets over the past few years which spawned novel methods in 3D scene understanding. More recently with the launch of the LiDAR sensor in Apple's iPads and iPhones, high quality RGB-D data is accessible to millions of people on a device they commonly use. This opens a whole new era in scene understanding for the Computer Vision community as well as app developers. The fundamental research in scene understanding together with the advances in machine learning can now impact people's everyday experiences. However, transforming these scene understanding methods to real-world experiences requires additional innovation and development. In this paper we introduce ARKitScenes. It is not only the first RGB-D dataset that is captured with a now widely available depth sensor, but to our best knowledge, it also is the largest indoor scene understanding data released. In addition to the raw and processed data from the mobile device, ARKitScenes includes high resolution depth maps captured using a stationary laser scanner, as well as manually labeled 3D oriented bounding boxes for a large taxonomy of furniture. We further analyze the usefulness of the data for two downstream tasks: 3D object detection and color-guided depth upsampling. We demonstrate that our dataset can help push the boundaries of existing state-of-the-art methods and it introduces new challenges that better represent real-world scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ARKitScenes as the first RGB-D dataset captured with Apple's widely available LiDAR sensor on mobile iPads/iPhones and, to the authors' knowledge, the largest indoor scene understanding dataset released. It supplies raw and processed mobile RGB-D captures, registered high-resolution depth maps from a stationary laser scanner, and manually annotated 3D oriented bounding boxes over a furniture taxonomy. The authors compare scale and characteristics to prior datasets (ScanNet, Matterport3D) and demonstrate utility on two downstream tasks: 3D object detection and color-guided depth upsampling, claiming the data pushes SOTA boundaries while introducing real-world challenges.
Significance. If the scale, registration quality, and annotation accuracy hold, the release supplies a high-value resource whose mobile capture characteristics better match everyday consumer hardware than prior lab-style datasets. This can accelerate development of robust 3D scene understanding methods for mobile applications, with the laser-scanned depths and 3D boxes providing strong supervision signals for detection and upsampling benchmarks.
major comments (2)
- [§4] Dataset Statistics: the central claim that ARKitScenes is the largest indoor dataset requires an explicit side-by-side table (number of scenes, frames, annotated objects, capture conditions) against ScanNet and Matterport3D; without these numbers the size/diversity assertion is unsupported.
- [§6] Downstream Tasks: the demonstrations for 3D object detection and depth upsampling must report concrete metrics (mAP, RMSE, etc.) and baselines; the abstract states only that the data 'pushes boundaries' without evidence, which is load-bearing for the utility claim.
minor comments (2)
- Figure captions should explicitly state what each panel shows (RGB, mobile depth, laser depth, projected boxes) and include scale bars or units.
- [§3] The taxonomy of furniture classes and the exact annotation protocol (number of annotators, quality control) should be listed in a dedicated subsection or table.
Simulated Author's Rebuttal
We thank the referee for the positive recommendation of minor revision and the constructive comments. We address each point below.
Point-by-point responses
- Referee: [§4] Dataset Statistics: the central claim that ARKitScenes is the largest indoor dataset requires an explicit side-by-side table (number of scenes, frames, annotated objects, capture conditions) against ScanNet and Matterport3D; without these numbers the size/diversity assertion is unsupported.
  Authors: We agree that an explicit comparison table will strengthen the claim. In the revised manuscript we will insert a side-by-side table in §4 that reports the number of scenes, frames, annotated objects, and capture conditions for ARKitScenes, ScanNet, and Matterport3D. Revision: yes.
- Referee: [§6] Downstream Tasks: the demonstrations for 3D object detection and depth upsampling must report concrete metrics (mAP, RMSE, etc.) and baselines; the abstract states only that the data 'pushes boundaries' without evidence, which is load-bearing for the utility claim.
  Authors: We will revise the abstract to include the key quantitative results (mAP for detection and RMSE for upsampling) and will ensure §6 explicitly lists all metrics together with the baselines used. This provides the concrete evidence requested. Revision: yes.
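As an illustration of the kind of metric the authors commit to reporting, a minimal masked RMSE for depth upsampling against the laser-scanned ground truth might look like the sketch below. The convention that zero depth marks missing ground truth is an assumption, not something stated in the paper.

```python
import numpy as np

def depth_rmse(pred, gt, valid=None):
    """RMSE between predicted and ground-truth depth over valid pixels.

    pred, gt: (H, W) depth maps in metres.
    valid:    optional boolean mask; if omitted, pixels where gt > 0 are used
              (assuming zero marks missing laser ground truth; the actual
              invalid-depth convention may differ).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if valid is None:
        valid = gt > 0
    diff = pred[valid] - gt[valid]
    return float(np.sqrt(np.mean(diff ** 2)))
```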
Circularity Check
No significant circularity detected
Full rationale
The paper is a dataset release paper whose central claims concern the scale, sensor type, and annotation quality of ARKitScenes itself. No mathematical derivations, fitted parameters, or predictions appear in the manuscript. Claims of being the first LiDAR-based RGB-D dataset and the largest indoor scene-understanding release are supported by explicit size statistics and direct comparisons to ScanNet, Matterport3D, and similar prior releases, none of which reduce to self-citation chains or self-definitional loops. The two downstream-task demonstrations (3D object detection and depth upsampling) are empirical evaluations on the released data rather than derivations that collapse to their own inputs. The work is therefore self-contained against external benchmarks with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Mobile RGB-D sensors such as Apple's LiDAR produce depth data of sufficient quality for indoor scene understanding tasks.
Forward citations
Cited by 21 Pith papers
- ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models
  ViSRA boosts MLLM 3D spatial reasoning performance by up to 28.9% on unseen tasks via a plug-and-play video-based agent that extracts explicit spatial cues from expert models without any post-training.
- SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis
  SplatWeaver dynamically allocates Gaussian primitives via cardinality experts and pixel-level routing guided by high-frequency cues for improved generalizable novel view synthesis.
- DENALI: A Dataset Enabling Non-Line-of-Sight Spatial Reasoning with Low-Cost LiDARs
  DENALI is the first large-scale real-world dataset of space-time histograms from low-cost LiDARs for training models to perceive hidden objects via multi-bounce light cues.
- Any 3D Scene is Worth 1K Tokens: 3D-Grounded Representation for Scene Generation at Scale
  A 3D-grounded autoencoder and diffusion transformer allow direct generation of 3D scenes in an implicit latent space using a fixed 1K-token representation for arbitrary views and resolutions.
- WildDet3D: Scaling Promptable 3D Detection in the Wild
  WildDet3D is a promptable 3D detector paired with a new 1M-image dataset across 13.5K categories that sets SOTA on open-world and zero-shot 3D detection benchmarks.
- Reasoning over Video: Evaluating How MLLMs Extract, Integrate, and Reconstruct Spatiotemporal Evidence
  VAEX-BENCH shows state-of-the-art MLLMs perform substantially worse on abstractive spatiotemporal reasoning tasks than on matched extractive tasks in video understanding.
- ZipMap: Linear-Time Stateful 3D Reconstruction via Test-Time Training
  ZipMap achieves linear-time bidirectional 3D reconstruction by zipping image collections into a compact stateful representation via test-time training layers.
- $\pi^3$: Permutation-Equivariant Visual Geometry Learning
  π³ is a feed-forward network with full permutation equivariance that outputs affine-invariant poses and scale-invariant local point maps without reference frames, reaching state-of-the-art on camera pose, depth, and d...
- Hyperbolic Distillation: Geometry-Guided Cross-Modal Transfer for Robust 3D Object Detection
  HGC-Det applies hyperbolic geometry to constrain cross-modal distillation between images and point clouds, with added semantic-guided voxel optimization and feature aggregation, yielding improved accuracy-efficiency t...
- HSG: Hyperbolic Scene Graph
  Hyperbolic Scene Graph (HSG) learns embeddings in hyperbolic space for better hierarchical structure in scene graphs, achieving graph IoU of 33.51 versus 25.37 for the best Euclidean baseline.
- Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
  The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temp...
- ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment
  ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.
- Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D
  BoxerNet lifts 2D bounding boxes to metric 3D boxes via transformer regression with aleatoric uncertainty and median depth encoding, then fuses multi-view results to outperform CuTR by large margins on open-world benchmarks.
- Contrastive Language-Colored Pointmap Pretraining for Unified 3D Scene Understanding
  UniScene3D learns unified 3D scene representations from colored pointmaps using contrastive CLIP pretraining plus cross-view geometric and grounded view alignments, achieving state-of-the-art results on viewpoint grou...
- SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning
  SpatialStack improves 3D spatial reasoning in vision-language models by stacking and synchronizing multi-level geometric features with the language backbone.
- Feeling the Space: Egomotion-Aware Video Representation for Efficient and Accurate 3D Scene Understanding
  Motion-MLLM integrates IMU egomotion data into MLLMs using cascaded filtering and asymmetric fusion to ground visual content in physical trajectories for scale-aware 3D understanding, achieving competitive accuracy at...
- ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation
  ST-Gen4D uses a world model that fuses global appearance and local dynamic graphs into a 4D cognition representation to guide consistent 4D Gaussian generation.
- R3D: Revisiting 3D Policy Learning
  A transformer 3D encoder plus diffusion decoder architecture, with 3D-specific augmentations, outperforms prior 3D policy methods on manipulation benchmarks by improving training stability.
- OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence
  OpenSpatial supplies a principled open-source data engine and 3-million-sample dataset that raises spatial-reasoning model performance by an average of 19 percent on benchmarks.
- Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation
  JoyAI-Image unifies visual understanding, generation, and editing in one model and claims stronger spatial intelligence through bidirectional perception-generation loops.
- Seed1.5-VL Technical Report
  Seed1.5-VL is a compact multimodal model that sets new records on dozens of vision-language benchmarks and outperforms prior systems on agent-style tasks.
Reference graph
Works this paper leans on
- [1] Ji Hou, Angela Dai, and Matthias Nießner. 3d-sis: 3d semantic instance segmentation of rgb-d scans. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), pages 4421–4430, 2019.
- [2] Li Yi, Wang Zhao, He Wang, Minhyuk Sung, and Leonidas J Guibas. Gspn: Generative shape proposal network for 3d instance segmentation in point cloud. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), pages 3947–3956, 2019.
- [3] Weiyue Wang, Ronald Yu, Qiangui Huang, and Ulrich Neumann. Sgpn: Similarity group proposal network for 3d point cloud instance segmentation. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), pages 2569–2578, 2018.
- [4] Charles R Qi, Or Litany, Kaiming He, and Leonidas J Guibas. Deep hough voting for 3d object detection in point clouds. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
- [5] Charles R. Qi, Xinlei Chen, Or Litany, and Leonidas J. Guibas. Imvotenet: Boosting 3d object detection in point clouds with image votes. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- [6] Qingdong He, Zhengning Wang, Hao Zeng, Yi Zeng, Shuaicheng Liu, and Bing Zeng. Svga-net: Sparse voxel-graph attention network for 3d object detection from point clouds. arXiv preprint arXiv:2006.04043, 2020.
- [7] Ze Liu, Zheng Zhang, Yue Cao, Han Hu, and Xin Tong. Group-free 3d object detection via transformers. arXiv preprint arXiv:2104.00678, 2021.
- [8] Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012, 2015.
- [9] Jianxiong Xiao, Andrew Owens, and Antonio Torralba. Sun3d: A database of big spaces reconstructed using sfm and object labels. In Proc. International Conference on Computer Vision (ICCV), pages 1625–1632, 2013.
- [10] Allison Janoch, Sergey Karayev, Yangqing Jia, Jonathan T Barron, Mario Fritz, Kate Saenko, and Trevor Darrell. A category-level 3d object dataset: Putting the kinect to work. In Consumer Depth Cameras for Computer Vision, pages 141–165. Springer, 2013.
- [11] Iro Armeni, Ozan Sener, Amir R Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese. 3d semantic parsing of large-scale indoor spaces. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), pages 1534–1543, 2016.
- [12] Kaichun Mo, Shilin Zhu, Angel X Chang, Li Yi, Subarna Tripathi, Leonidas J Guibas, and Hao Su. Partnet: A large-scale benchmark for fine-grained and hierarchical part-level 3d object understanding. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), pages 909–918, 2019.
- [13] Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, et al. Scalability in perception for autonomous driving: Waymo open dataset. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), pages 2446–2454, 2020.
- [14] Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- [15] S Song, S Lichtenberg, and J Xiao. Sun rgb-d: A rgb-d scene understanding benchmark suite. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- [16] Nathan Silberman and Rob Fergus. Indoor scene segmentation using a structured light sensor. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pages 601–608. IEEE, 2011.
- [17] https://www.apple.com/newsroom/2020/03/apple-unveils-new-ipad-pro-with-lidar-scanner-and-trackpad-support-in-ipados/, 2020.
- [18] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255. IEEE, 2009.
- [19] Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets robotics: The kitti dataset. 2013.
- [20]
- [21] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A multimodal dataset for autonomous driving. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- [22] Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Niessner, Manolis Savva, Shuran Song, Andy Zeng, and Yinda Zhang. Matterport3d: Learning from rgb-d data in indoor environments. arXiv preprint arXiv:1709.06158, 2017.
- [23] Binh-Son Hua, Quang-Hieu Pham, Duc Thanh Nguyen, Minh-Khoi Tran, Lap-Fai Yu, and Sai-Kit Yeung. Scenenn: A scene meshes dataset with annotations. In 2016 Fourth International Conference on 3D Vision (3DV), pages 92–101. IEEE, 2016.
- [24] Manolis Savva, Angel X Chang, Pat Hanrahan, Matthew Fisher, and Matthias Nießner. Pigraphs: Learning interaction snapshots from observations. ACM Transactions on Graphics (TOG), 35(4):1–12, 2016.
- [25] Daniel J Butler, Jonas Wulff, Garrett B Stanley, and Michael J Black. A naturalistic open source movie for optical flow evaluation. In Proc. European Conference on Computer Vision (ECCV), pages 611–625. Springer, 2012.
- [26] Daniel Scharstein, Heiko Hirschmüller, York Kitajima, Greg Krathwohl, Nera Nešić, Xi Wang, and Porter Westling. High-resolution stereo datasets with subpixel-accurate ground truth. In German Conference on Pattern Recognition, pages 31–42. Springer, 2014.
- [27] Chenhang He, Hui Zeng, Jianqiang Huang, Xian-Sheng Hua, and Lei Zhang. Structure aware single-stage 3d object detection from point cloud. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- [28] Maosheng Ye, Shuangjie Xu, and Tongyi Cao. Hvnet: Hybrid voxel network for lidar based 3d object detection. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- [29] Weijing Shi and Raj Rajkumar. Point-gnn: Graph neural network for 3d object detection in a point cloud. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- [30] Qian Xie, Yu-Kun Lai, Jing Wu, Zhoutao Wang, Yiming Zhang, Kai Xu, and Jun Wang. Mlcvnet: Multi-level context votenet for 3d object detection. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- [31] Jintai Chen, Biwen Lei, Qingyu Song, Haochao Ying, Danny Z. Chen, and Jian Wu. A hierarchical graph network for 3d object detection on point clouds. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- [32] Martin Runz, Kejie Li, Meng Tang, Lingni Ma, Chen Kong, Tanner Schmidt, Ian Reid, Lourdes Agapito, Julian Straub, Steven Lovegrove, and Richard Newcombe. Frodo: From detections to 3d objects. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- [33] JunYoung Gwak, Christopher Choy, and Silvio Savarese. Generative sparse detection networks for 3d single-shot object detection. arXiv preprint arXiv:2006.12356, 2020.
- [34] Charles R Qi, Wei Liu, Chenxia Wu, Hao Su, and Leonidas J Guibas. Frustum pointnets for 3d object detection from rgb-d data. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), pages 918–927, 2018.
- [35] Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang, and Hongsheng Li. Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), pages 10529–10538, 2020.
- [36] Adel Ahmadyan, Liangkai Zhang, Jianing Wei, Artsiom Ablavatski, and Matthias Grundmann. Objectron: A large scale dataset of object-centric videos in the wild with pose annotations. arXiv preprint arXiv:2012.09988, 2020.
- [37] Tak-Wai Hui, Chen Change Loy, and Xiaoou Tang. Depth map super-resolution by deep multi-scale guidance. In Proc. European Conference on Computer Vision (ECCV), pages 353–369, 2016.
- [38] Johannes Kopf, Michael F. Cohen, Dani Lischinski, and Matt Uyttendaele. Joint bilateral upsampling. ACM Transactions on Graphics (Proceedings of SIGGRAPH 2007), 26(3), 2007.
- [39] David Ferstl, Christian Reinbacher, Rene Ranftl, Matthias Rüther, and Horst Bischof. Image guided depth upsampling using anisotropic total generalized variation. In Proc. International Conference on Computer Vision (ICCV), pages 993–1000, 2013.
- [40] Daniel Scharstein and Richard Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision (IJCV), 47(1):7–42, 2002.
- [41] Daniel Scharstein and Richard Szeliski. High-accuracy stereo depth maps using structured light. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages I–I. IEEE, 2003.
- [42] Martin A Fischler and Robert C Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
- [43] Zaiwei Zhang, Bo Sun, Haitao Yang, and Qixing Huang. H3dnet: 3d object detection using hybrid geometric primitives. In Proc. European Conference on Computer Vision (ECCV), 2020.
- [44] Charles R. Qi, Li Yi, Hao Su, and Leonidas J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Proc. Advances in Neural Information Processing Systems (NeurIPS), 2017.
- [45] Chuhua Xian, Kun Qian, Zitian Zhang, and Charlie CL Wang. Multi-scale progressive fusion learning for depth map super-resolution. arXiv preprint arXiv:2011.11865, 2020.
- [46] Zhengqi Li and Noah Snavely. Megadepth: Learning single-view depth prediction from internet photos. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR), 2018.