A review on deep learning techniques for 3D sensed data classification

David Griffiths; Jan Boehm

arxiv: 1907.04444 · v1 · pith:2NYD6V66new · submitted 2019-07-09 · 💻 cs.CV

Pith reviewed 2026-05-25 00:42 UTC · model grok-4.3

classification: cs.CV

keywords: deep learning, 3D data classification, point clouds, RGB-D, multi-view, volumetric, end-to-end architectures, unstructured data

The pith

Deep learning methods for 3D sensed data classification fall into four main architecture categories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews how deep learning has advanced 2D image understanding but remains less developed for 3D sensed data such as point clouds. It covers background concepts and traditional methods before organizing current work into RGB-D, multi-view, volumetric, and fully end-to-end designs, while documenting datasets for each. The review closes by using existing literature to identify where future research would yield the highest value for applications like robotics navigation and remote sensing.

Core claim

The authors establish that the state-of-the-art deep learning architectures for unstructured Euclidean 3D data can be grouped into RGB-D based methods, multi-view methods, volumetric methods, and fully end-to-end architecture designs, each supported by specific datasets, and that mapping these categories clarifies the path toward more capable classification systems.

What carries the argument

The four-category taxonomy of RGB-D, multi-view, volumetric, and end-to-end architecture designs for processing 3D sensed data.
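
To make one of these categories concrete, the following is a minimal sketch of the input representation that volumetric methods rely on: binning an unordered point cloud into a regular voxel grid. It is an illustration only, not code from the paper. The function name `voxelize`, the `voxel_size` parameter, and the sample coordinates are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], voxel_size: float) -> Dict[Voxel, int]:
    """Count how many points fall into each cubic voxel of side voxel_size.

    Hypothetical helper: a sparse occupancy grid keyed by integer voxel index.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    counts: Counter = Counter()
    for x, y, z in points:
        # Flooring the scaled coordinate gives the index of the enclosing cell.
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        counts[key] += 1
    return dict(counts)


# Made-up sample cloud: two points share a cell, two fall in separate cells.
cloud = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.2, 0.1, 0.0), (-0.1, 0.0, 0.0)]
grid = voxelize(cloud, voxel_size=0.5)
print(grid)
```

A regular grid of this kind can be passed to 3D convolutions, at the cost of discarding detail finer than `voxel_size`. Storing only occupied cells, as the dictionary does here, is one way to keep memory use proportional to the number of occupied voxels rather than to the full grid volume.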

If this is right

Indoor robotics navigation systems can adopt more reliable 3D classification once the reviewed methods mature.
National-scale remote sensing applications gain automated understanding of sensed data.
Researchers gain documented datasets for benchmarking new classification models.
Future work concentrates on the research areas the discussion identifies as most valuable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The taxonomy supplies a baseline that later reviews can use to measure how the field has progressed.
Hybrid methods that combine elements from more than one category may emerge as a natural next step.
Periodic updates to the dataset list would keep the overview useful as new collections appear.

Load-bearing premise

That the four categories and the listed datasets together give a representative picture of the field without major omissions at the time of writing.

What would settle it

A widely used deep learning method for 3D data classification that cannot be placed in any of the four architecture categories.

Figures

Figures reproduced from arXiv: 1907.04444 by David Griffiths, Jan Boehm.

**Figure 1.** Network architecture to process RGB and depth images separately to learn low level features. Multiple RNNs …

**Figure 2.** Pipeline for object detection and instance segmentation of RGB-D images. First a random forest classifier is …

**Figure 3.** a) 3D amodal region proposal network used for the deep sliding network architecture. Two receptive fields …

**Figure 4.** CNN with anisotropic probing kernels. Elongated 3D convolutions are used to first extract features from …

**Figure 5.** Multi-view CNN. Rendered 2D images are acquired with virtual cameras and fed into independent CNNs …

**Figure 6.** PointNet classification and segmentation network architectures. The network consumes raw …

**Figure 7.** PointNet++ architecture. Hierarchical feature learning is introduced to learn features at various scales.

**Figure 8.** Frustum point net architecture. 2D CNN object detection is used to determine objects in RGB-D depth maps.

**Figure 9.** DGCNN network architecture for unsupervised context prediction for point clouds. The intermediate latent …

Original abstract

Over the past decade deep learning has driven progress in 2D image understanding. Despite these advancements, techniques for automatic 3D sensed data understanding, such as point clouds, is comparatively immature. However, with a range of important applications from indoor robotics navigation to national scale remote sensing there is a high demand for algorithms that can learn to automatically understand and classify 3D sensed data. In this paper we review the current state-of-the-art deep learning architectures for processing unstructured Euclidean data. We begin by addressing the background concepts and traditional methodologies. We review the current main approaches including; RGB-D, multi-view, volumetric and fully end-to-end architecture designs. Datasets for each category are documented and explained. Finally, we give a detailed discussion about the future of deep learning for 3D sensed data, using literature to justify the areas where future research would be most valuable.
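
The phrase "unstructured" in the abstract means, among other things, that the order in which points are stored carries no meaning. As a hedged illustration (not code from the paper), the sketch below shows a symmetric aggregation, a channel-wise maximum over per-point features, which returns the same result for any ordering of the points. PointNet is known to use max pooling in this role. The feature values and the name `global_max_feature` are hypothetical, and no learned network is involved.

```python
import random
from typing import List, Sequence


def global_max_feature(per_point_features: Sequence[Sequence[float]]) -> List[float]:
    """Channel-wise maximum over all points: an order-independent summary."""
    if not per_point_features:
        raise ValueError("need at least one point")
    # zip(*rows) groups the values of each feature channel across points.
    return [max(channel) for channel in zip(*per_point_features)]


# Made-up per-point features: 3 points, 3 channels each.
features = [[0.1, 2.0, -1.0], [0.7, 0.5, 3.0], [0.3, 1.5, 0.0]]

# Reorder the points with a fixed seed; the pooled vector should not change.
shuffled = list(features)
random.Random(0).shuffle(shuffled)

pooled = global_max_feature(features)
pooled_shuffled = global_max_feature(shuffled)
print(pooled, pooled == pooled_shuffled)
```

Because the maximum ignores input order, a classifier built on the pooled vector gives the same answer regardless of how the sensor or file format happened to list the points.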

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This 2019 review organizes deep learning methods for 3D sensed data into four categories and lists datasets, but adds no new technical results.

This paper is a literature review from 2019 that groups deep learning methods for classifying 3D sensed data into RGB-D, multi-view, volumetric, and end-to-end approaches, while listing relevant datasets and touching on future directions. It does a solid job of providing background on traditional methods and then breaking down the main architectures in each category. Documenting the datasets alongside the methods gives readers a practical starting point for exploring the area. The final section uses existing literature to flag areas for future work, which keeps the discussion grounded.

The soft spots are typical for a survey of this type. Its usefulness depends on how thorough and accurate the coverage actually is in the full text, and the abstract gives no sign of deep critical analysis or gap identification beyond a standard discussion. Being from mid-2019, it naturally misses later work on transformers, graph networks, and large-scale 3D models, but that is expected rather than a flaw. There are no machine-checked proofs or shipped code, which is normal for a review but means the claims rest on the authors' selection of papers.

This kind of paper is aimed at newcomers to 3D computer vision or researchers needing a structured overview of the state around 2019. A reader interested in the evolution of the field or building a reference list might get value from it. It deserves a serious referee because the topic remains relevant for robotics and remote sensing, and the four-category structure is a reasonable way to organize the material. I would recommend sending it to peer review rather than desk rejecting it, with reviewers likely focusing on completeness and whether the dataset lists are still representative.

Referee Report

0 major / 3 minor

Summary. The paper is a survey reviewing deep learning techniques for 3D sensed data classification. It covers background concepts and traditional methodologies, then examines four main approach categories (RGB-D, multi-view, volumetric, and fully end-to-end architectures), documents associated datasets, and concludes with a literature-based discussion of future research directions in the field.

Significance. If the review accurately and representatively synthesizes the literature, it would offer a useful consolidation of the state of 3D deep learning as of 2019, particularly for applications in robotics and remote sensing where 3D data processing lags behind 2D. The explicit documentation of datasets and forward-looking discussion add practical value for researchers entering the area.

minor comments (3)

[Abstract] Grammatical error in 'such as point clouds, is comparatively immature' (subject-verb agreement); should be 'are comparatively immature'.
[Abstract] Unnecessary semicolon in 'including; RGB-D'; rephrase to 'including RGB-D, multi-view, volumetric and fully end-to-end architecture designs' for clarity.
[Abstract] The manuscript should ensure consistent terminology between the title ('3D sensed data classification') and abstract ('processing unstructured Euclidean data') to avoid reader confusion.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their supportive summary, recognition of the paper's potential value for researchers in robotics and remote sensing, and recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity: pure literature review with no derivations or predictions

This is a survey paper that summarizes background concepts, existing architectures (RGB-D, multi-view, volumetric, end-to-end), datasets, and future directions drawn from external literature. No original equations, fitted parameters, predictions, or derivation chains are present. All claims are descriptive citations of prior work; the representativeness of coverage is an external validity issue, not a circularity issue. No steps reduce to self-definition, fitted inputs, or self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a literature review and introduces no new mathematical derivations, empirical claims, or modeling choices that would require free parameters, axioms, or invented entities.

Reference graph

Works this paper leans on

117 extracted references · 117 canonical work pages · 24 internal anchors

[1]

3D free-form object recognition in range images using local surface patches,

H. Chen and B. Bhanu, “3D free-form object recognition in range images using local surface patches,” Pattern Recognition Letters, vol. 28, no. 10, pp. 1252–1262, 2007

work page 2007
[2]

Using Spin Images for Efﬁcient Object Recognition in Cluttered 3D Scenes,

A. E. Johnson and M. Hebert, “Using Spin Images for Efﬁcient Object Recognition in Cluttered 3D Scenes,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 21, no. 5, pp. 433–449, 1999

work page 1999
[3]

Intrinsic shape signatures: A shape descriptor for 3D object recognition,

Y . Zhong, “Intrinsic shape signatures: A shape descriptor for 3D object recognition,” in 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, 2009, pp. 689–696

work page 2009
[4]

A Concise and Provably Informative Multi-Scale Signature Based on Heat Diffusion,

J. Sun, M. Ovsjanikov, and L. Guibas, “A Concise and Provably Informative Multi-Scale Signature Based on Heat Diffusion,” Computer Graphics Forum, vol. 28, no. 5, pp. 1383–1392, 2009

work page 2009
[5]

Rapid object indexing using locality sensitive hashing and joint 3D-signature space estimation,

B. Matei, Y . Shan, H. S. Sawhney, Y . Tan, R. Kumar, D. Huber, and M. Hebert, “Rapid object indexing using locality sensitive hashing and joint 3D-signature space estimation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 7, pp. 1111–1126, 2006

work page 2006
[6]

Real-time Object Recognition in Sparse Range Images Using Error Surface Embedding,

L. Shang and M. Greenspan, “Real-time Object Recognition in Sparse Range Images Using Error Surface Embedding,” International Journal of Computer Vision, vol. 89, no. 2, pp. 211–228, 2010

work page 2010
[7]

Rotational Projection Statistics for 3D Local Surface Description and Object Recognition,

Y . Guo, F. Sohel, M. Bennamoun, M. Lu, and J. Wan, “Rotational Projection Statistics for 3D Local Surface Description and Object Recognition,” International Journal of Computer Vision, vol. 105, no. 1, pp. 63–86, 2013

work page 2013
[8]

Deep learning,

Y . LeCun, Y . Bengio, and G. Hinton, “Deep learning,”Nature, vol. 521, no. 7553, pp. 436–444, 2015

work page 2015
[9]

Multidimensional binary search trees used for associative searching,

J. L. Bentley, “Multidimensional binary search trees used for associative searching,” Communications of the ACM, vol. 18, no. 9, pp. 509–517, 1975

work page 1975
[10]

Fast Approximate Nearest Neighbors with Automatic Algorithm Conﬁguration,

M. Muja and D. Lowe, “Fast Approximate Nearest Neighbors with Automatic Algorithm Conﬁguration,” in Proceedings of the Fourth International Conference on Computer Vision Theory and Applications. Lisboa, Portugal: SciTePress - Science and and Technology Publications, 2009, pp. 331–340

work page 2009
[11]

Semantic point cloud interpretation based on optimal neighbor- hoods, relevant features and efﬁcient classiﬁers,

M. Weinmann, B. Jutzi, S. Hinz, and C. Mallet, “Semantic point cloud interpretation based on optimal neighbor- hoods, relevant features and efﬁcient classiﬁers,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 105, pp. 286–304, 2015

work page 2015
[12]

Contextual classiﬁcation of lidar data and building object detection in urban areas,

J. Niemeyer, F. Rottensteiner, and U. Soergel, “Contextual classiﬁcation of lidar data and building object detection in urban areas,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 87, pp. 152–165, 2014

work page 2014
[13]

Multi-scale Feature Extraction on Point-Sampled Surfaces,

M. Pauly, R. Keiser, and M. Gross, “Multi-scale Feature Extraction on Point-Sampled Surfaces,” Computer Graphics Forum, vol. 22, no. 3, pp. 281–289, 2003

work page 2003
[14]

3D terrestrial lidar data classiﬁcation of complex natural scenes using a multi-scale dimensionality criterion: Applications in geomorphology,

N. Brodu and D. Lague, “3D terrestrial lidar data classiﬁcation of complex natural scenes using a multi-scale dimensionality criterion: Applications in geomorphology,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 68, pp. 121–134, 2012

work page 2012
[15]

Dimensionality Based Scale Selection in 3D LiDAR Point Clouds,

J. Demantké, C. Mallet, N. David, and B. Vallet, “Dimensionality Based Scale Selection in 3D LiDAR Point Clouds,” ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XXXVIII-5/W12, pp. 97–102, 2012

work page 2012
[16]

Classiﬁcation of Aerial Photogrammetric 3D Point Clouds,

C. Becker, N. Häni, E. Rosinskaya, E. d’Angelo, and C. Strecha, “Classiﬁcation of Aerial Photogrammetric 3D Point Clouds,” ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. IV-1/W1, pp. 3–10, 2017

work page 2017
[17]

3D Urban GIS From Laser Altimeter And 2D Map Data,

N. Haala, C. Brenner, and K.-h. Anders, “3D Urban GIS From Laser Altimeter And 2D Map Data,” in Interna- tional Archives of Photogrammetry and Remote Sensing, 1998, pp. 339–346

work page 1998
[18]

Extraction of buildings and trees in urban environments,

N. Haala and C. Brenner, “Extraction of buildings and trees in urban environments,” ISPRS Journal of Pho- togrammetry and Remote Sensing, vol. 54, no. 2, pp. 130–137, 1999

work page 1999
[19]

Slope Based Filtering of Laser Altimetry Data,

G. V osselman, “Slope Based Filtering of Laser Altimetry Data,”International Archives of Photogrammetry and Remote Sensing, vol. 33(Part 3B), pp. 935–942, 2000

work page 2000
[20]

Digital terrain models from airborne laser scanner data – a grid based approach,

R. Wack and A. Wimmer, “Digital terrain models from airborne laser scanner data – a grid based approach,” International Archives of Photogrammetry and Remote Sensing, vol. 34 (Part 3B), pp. 293–296, 2002

work page 2002
[21]

Enhanced Computer Vision With Microsoft Kinect Sensor: A Review,

J. Han, L. Shao, D. Xu, and J. Shotton, “Enhanced Computer Vision With Microsoft Kinect Sensor: A Review,” IEEE Transactions on Cybernetics, vol. 43, no. 5, pp. 1318–1334, 2013. 20 A PREPRINT - JULY 11, 2019

work page 2013
[22]

Human detection using depth information by Kinect,

L. Xia, C. Chen, and J. K. Aggarwal, “Human detection using depth information by Kinect,” in CVPR 2011 WORKSHOPS, 2011, pp. 15–22

work page 2011
[23]

Hierarchical image segmentation algorithm in depth image processing,

J. Yin and S. Kong, “Hierarchical image segmentation algorithm in depth image processing,” Journal of Multimedia, vol. 8, no. 5, pp. 512–518, 2013

work page 2013
[24]

Segmentation based classiﬁcation of 3D urban point clouds: A super-voxel based approach with evaluation,

A. K. Aijazi, P. Checchin, and L. Trassoudaine, “Segmentation based classiﬁcation of 3D urban point clouds: A super-voxel based approach with evaluation,” Remote Sensing, vol. 5, no. 4, pp. 1624–1650, 2013

work page 2013
[25]

Imagenet classiﬁcation with deep convolutional neural networks,

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classiﬁcation with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105

work page 2012
[26]

Object Recognition with Gradient-Based Learning,

Y . LeCun, P. Haffner, L. Bottou, and Y . Bengio, “Object Recognition with Gradient-Based Learning,” inShape, Contour and Grouping in Computer Vision, ser. Lecture Notes in Computer Science, D. A. Forsyth, J. L. Mundy, V . di Gesú, and R. Cipolla, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 1999, pp. 319–345

work page 1999
[27]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016

work page 2016
[28]

Rich feature hierarchies for accurate object detection and semantic segmentation,

R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014

work page 2014
[29]

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y . LeCun, “Overfeat: Integrated recognition, localization and detection using convolutional networks,” arXiv preprint arXiv:1312.6229, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[30]

Ssd: Single shot multibox detector,

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y . Fu, and A. C. Berg, “Ssd: Single shot multibox detector,” in European Conference on Computer Vision. Springer, 2016, pp. 21–37

work page 2016
[31]

You only look once: Uniﬁed, real-time object detection,

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Uniﬁed, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788

work page 2016
[32]

Fully convolutional networks for semantic segmentation,

J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440

work page 2015
[33]

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation,

V . Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, 2017

work page 2017
[34]

Simultaneous Detection and Segmentation,

B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, “Simultaneous Detection and Segmentation,” in Computer Vision – ECCV 2014, ser. Lecture Notes in Computer Science, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds. Springer International Publishing, 2014, pp. 297–312

work page 2014
[35]

Multiscale Combinatorial Grouping,

P. Arbelaez, J. Pont-Tuset, J. T. Barron, F. Marques, and J. Malik, “Multiscale Combinatorial Grouping,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 328–335

work page 2014
[36]

Learning to Segment Object Candidates,

P. O. Pinheiro, R. Collobert, and P. Dollar, “Learning to Segment Object Candidates,” in Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, Eds. Curran Associates, Inc., 2015, pp. 1990–1998

work page 2015
[37]

Learning to Refine Object Segments

P. O. Pinheiro, T.-Y . Lin, R. Collobert, and P. Dollàr, “Learning to Reﬁne Object Segments,”arXiv:1603.08695 [cs], 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[38]

Mask R-CNN

K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” arXiv:1703.06870 [cs], 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[39]

A large-scale hierarchical multi-view RGB-D object dataset,

K. Lai, L. Bo, X. Ren, and D. Fox, “A large-scale hierarchical multi-view RGB-D object dataset,” in 2011 IEEE International Conference on Robotics and Automation, 2011, pp. 1817–1824

work page 2011
[40]

Indoor Segmentation and Support Inference from RGBD Images,

N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor Segmentation and Support Inference from RGBD Images,” in Computer Vision – ECCV 2012, ser. Lecture Notes in Computer Science, A. Fitzgibbon, S. Lazebnik, P. Perona, Y . Sato, and C. Schmid, Eds. Springer Berlin Heidelberg, 2012, pp. 746–760

work page 2012
[41]

SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels,

J. Xiao, A. Owens, and A. Torralba, “SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1625–1632

work page 2013
[42]

ViDRILO: The Visual and Depth Robot Indoor Localization with Objects information dataset,

J. Martínez-Gómez, I. García-Varea, M. Cazorla, and V . Morell, “ViDRILO: The Visual and Depth Robot Indoor Localization with Objects information dataset,” The International Journal of Robotics Research, vol. 34, no. 14, pp. 1681–1687, 2015

work page 2015
[43]

SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite,

S. Song, S. P. Lichtenberg, and J. Xiao, “SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 567–576

work page 2015
[44]

A Benchmark for 3D Mesh Segmentation,

X. Chen, A. Golovinskiy, and T. Funkhouser, “A Benchmark for 3D Mesh Segmentation,” in ACM SIGGRAPH 2009 Papers, ser. SIGGRAPH ’09. New York, NY , USA: ACM, 2009, pp. 73:1–73:12. 21 A PREPRINT - JULY 11, 2019

work page 2009
[45]

ShapeNet: An Information-Rich 3D Model Repository

A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu, “ShapeNet: An Information-Rich 3D Model Repository,” arXiv:1512.03012 [cs], 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[46]

A Scalable Active Framework for Region Annotation in 3D Shape Collections,

L. Yi, V . G. Kim, D. Ceylan, I.-C. Shen, M. Yan, H. Su, C. Lu, Q. Huang, A. Sheffer, and L. Guibas, “A Scalable Active Framework for Region Annotation in 3D Shape Collections,” ACM Trans. Graph., vol. 35, no. 6, pp. 210:1–210:12, 2016

work page 2016
[47]

Joint 2D-3D-Semantic Data for Indoor Scene Understanding

I. Armeni, S. Sax, A. R. Zamir, and S. Savarese, “Joint 2D-3D-Semantic Data for Indoor Scene Understanding,” arXiv:1702.01105 [cs], 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[48]

ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes

A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner, “ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes,” arXiv:1702.04405 [cs], 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[49]

Bundlefusion: Real-time globally consistent 3d reconstruction using on-the-ﬂy surface reintegration,

A. Dai, M. Nießner, M. Zollhöfer, S. Izadi, and C. Theobalt, “Bundlefusion: Real-time globally consistent 3d reconstruction using on-the-ﬂy surface reintegration,” ACM Transactions on Graphics (ToG), vol. 36, no. 4, p. 76a, 2017

work page 2017
[50]

Contextual classiﬁcation with functional Max-Margin Markov Networks,

D. Munoz, J. A. Bagnell, N. Vandapel, and M. Hebert, “Contextual classiﬁcation with functional Max-Margin Markov Networks,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 975–982

work page 2009
[51]

An occlusion-aware feature for range images,

A. Quadros, J. P. Underwood, and B. Douillard, “An occlusion-aware feature for range images,” in 2012 IEEE International Conference on Robotics and Automation, 2012, pp. 4428–4435

work page 2012
[52]

Paris-rue-Madame database: A 3D mobile laser scanner dataset for benchmarking urban detection, segmentation and classiﬁcation methods,

A. Serna, B. Marcotegui, F. Goulette, and J.-E. Deschaud, “Paris-rue-Madame database: A 3D mobile laser scanner dataset for benchmarking urban detection, segmentation and classiﬁcation methods,” in4th International Conference on Pattern Recognition, Applications and Methods ICPRAM 2014, Angers, France, 2014

work page 2014
[53]

TerraMobilita/iQmulus urban point cloud analysis benchmark,

B. Vallet, M. Brédif, A. Serna, B. Marcotegui, and N. Paparoditis, “TerraMobilita/iQmulus urban point cloud analysis benchmark,” Computers & Graphics, vol. 49, pp. 126–133, 2015

work page 2015
[54]

An Approach To Extract Moving Object From MLS Data Using A V olumetric Background Representation,

J. Gehrung, M. Hebel, M. Arens, and U. Stilla, “An Approach To Extract Moving Object From MLS Data Using A V olumetric Background Representation,” ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. IV-1/W1, pp. 107–114, 2017

work page 2017
[55]

Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark

T. Hackel, N. Savinov, L. Ladicky, J. D. Wegner, K. Schindler, and M. Pollefeys, “Semantic3D.net: A new Large-scale Point Cloud Classiﬁcation Benchmark,”arXiv:1704.03847 [cs], 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[56]

Paris-lille-3d: A large and high-quality ground-truth urban point cloud dataset for automatic segmentation and classiﬁcation,

X. Roynard, J.-E. Deschaud, and F. Goulette, “Paris-lille-3d: A large and high-quality ground-truth urban point cloud dataset for automatic segmentation and classiﬁcation,”The International Journal of Robotics Research, vol. 37, no. 6, pp. 545–557, 2018

work page 2018
[57]

Convolutional-recursive deep learning for 3d object classiﬁcation,

R. Socher, B. Huval, B. Bath, C. D. Manning, and A. Y . Ng, “Convolutional-recursive deep learning for 3d object classiﬁcation,” inAdvances in Neural Information Processing Systems, 2012, pp. 656–664

work page 2012
[58]

Multimodal deep learning for robust RGB-D object recognition,

A. Eitel, J. T. Springenberg, L. Spinello, M. Riedmiller, and W. Burgard, “Multimodal deep learning for robust RGB-D object recognition,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015, pp. 681–687

work page 2015
[59]

Indoor Semantic Segmentation using depth information

C. Couprie, C. Farabet, L. Najman, and Y . LeCun, “Indoor Semantic Segmentation using depth information,” arXiv:1301.3572 [cs], 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[60]

Learning Hierarchical Features for Scene Labeling,

C. Farabet, C. Couprie, L. Najman, and Y . LeCun, “Learning Hierarchical Features for Scene Labeling,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1915–1929, 2013

work page 1915
[61]

Learning rich features from RGB-D images for object detection and segmentation,

S. Gupta, R. Girshick, P. Arbeláez, and J. Malik, “Learning rich features from RGB-D images for object detection and segmentation,” in European Conference on Computer Vision. Springer, 2014, pp. 345–360

work page 2014
[62]

Structured Forests for Fast Edge Detection,

P. Dollar and C. L. Zitnick, “Structured Forests for Fast Edge Detection,” inProceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1841–1848

work page 2013
[63]

Perceptual Organization and Recognition of Indoor Scenes from RGB- D Images,

S. Gupta, P. Arbelaez, and J. Malik, “Perceptual Organization and Recognition of Indoor Scenes from RGB- D Images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2013, pp. 564–571

work page 2013
[64]

Automatic corine land cover classiﬁcation from airborne lidar data,

J. Balado, P. Arias, L. Díaz-Vilariño, and L. M. González-deSantos, “Automatic corine land cover classiﬁcation from airborne lidar data,” Procedia Computer Science, vol. 126, pp. 186–194, 2018

work page 2018
[65]

LSTM-CF: Unifying Context Modeling and Fusion with LSTMs for RGB-D Scene Labeling,

Z. Li, Y . Gan, X. Liang, Y . Yu, H. Cheng, and L. Lin, “LSTM-CF: Unifying Context Modeling and Fusion with LSTMs for RGB-D Scene Labeling,” /paper/LSTM-CF%3A-Unifying-Context-Modeling-and-Fusion-with-Li- Gan/df4b5974b22e7c46611daf1926c4d2a7400145ad, 2016

work page 2016
[66]

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs,” arXiv:1412.7062 [cs], 2014. 22 A PREPRINT - JULY 11, 2019

work page internal anchor Pith review Pith/arXiv arXiv 2014
[67]

FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture,

C. Hazirbas, L. Ma, C. Domokos, and D. Cremers, “FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture,” in Computer Vision – ACCV 2016, ser. Lecture Notes in Computer Science, S.-H. Lai, V . Lepetit, K. Nishino, and Y . Sato, Eds. Springer International Publishing, 2017, pp. 213–228

work page 2016
[68]

Multi-view Self-supervised Deep Learning for 6D Pose Estimation in the Amazon Picking Challenge

A. Zeng, K.-T. Yu, S. Song, D. Suo, E. Walker Jr., A. Rodriguez, and J. Xiao, “Multi-view Self-supervised Deep Learning for 6D Pose Estimation in the Amazon Picking Challenge,” arXiv:1609.09475 [cs], 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[69]

Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras

L. Ma, J. Stückler, C. Kerl, and D. Cremers, “Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras,” arXiv:1703.08866 [cs], 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[70]

V oxNet: A 3D Convolutional Neural Network for real-time object recognition,

D. Maturana and S. Scherer, “V oxNet: A 3D Convolutional Neural Network for real-time object recognition,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015, pp. 922–928

[71]

3D ShapeNets: A Deep Representation for Volumetric Shapes

Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3D ShapeNets: A Deep Representation for Volumetric Shapes,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1912–1920

[72]

Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images

S. Song and J. Xiao, “Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 808–816

[73]

Sliding Shapes for 3D Object Detection in Depth Images

——, “Sliding Shapes for 3D Object Detection in Depth Images,” in Computer Vision – ECCV 2014, ser. Lecture Notes in Computer Science, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds. Springer International Publishing, 2014, pp. 634–651

[74]

Volumetric and Multi-View CNNs for Object Classification on 3D Data

C. R. Qi, H. Su, M. Niessner, A. Dai, M. Yan, and L. J. Guibas, “Volumetric and Multi-View CNNs for Object Classification on 3D Data,” arXiv:1604.03265 [cs], 2016

[75]

Network In Network

M. Lin, Q. Chen, and S. Yan, “Network In Network,” arXiv:1312.4400 [cs], 2013

[76]

Point cloud labeling using 3D Convolutional Neural Network

J. Huang and S. You, “Point cloud labeling using 3D Convolutional Neural Network,” in 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 2670–2675

[77]

SEGCloud: Semantic Segmentation of 3D Point Clouds

L. Tchapmi, C. Choy, I. Armeni, J. Gwak, and S. Savarese, “SEGCloud: Semantic Segmentation of 3D Point Clouds,” in 2017 International Conference on 3D Vision (3DV), 2017, pp. 537–547

[78]

Multi-view convolutional neural networks for 3d shape recognition

H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller, “Multi-view convolutional neural networks for 3d shape recognition,” in The IEEE International Conference on Computer Vision (ICCV), December 2015

[79]

Learning methods for generic object recognition with invariance to pose and lighting

Y. LeCun, F. J. Huang, and L. Bottou, “Learning methods for generic object recognition with invariance to pose and lighting,” in Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., vol. 2, 2004, pp. II–104 Vol.2

[80]

3D Shape Segmentation With Projective Convolutional Networks

E. Kalogerakis, M. Averkiou, S. Maji, and S. Chaudhuri, “3D Shape Segmentation With Projective Convolutional Networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3779–3788


Showing first 80 references.
