
arxiv: 1907.04444 · v1 · pith:2NYD6V66new · submitted 2019-07-09 · 💻 cs.CV

A review on deep learning techniques for 3D sensed data classification

Pith reviewed 2026-05-25 00:42 UTC · model grok-4.3

classification 💻 cs.CV
keywords deep learning · 3D data classification · point clouds · RGB-D · multi-view · volumetric · end-to-end architectures · unstructured data

The pith

Deep learning methods for 3D sensed data classification fall into four main architecture categories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews how deep learning has advanced 2D image understanding but remains less developed for 3D sensed data such as point clouds. It covers background concepts and traditional methods before organizing current work into RGB-D, multi-view, volumetric, and fully end-to-end designs, while documenting datasets for each. The review closes by using existing literature to identify where future research would yield the highest value for applications like robotics navigation and remote sensing.

Core claim

The authors establish that the state-of-the-art deep learning architectures for unstructured Euclidean 3D data can be grouped into RGB-D based methods, multi-view methods, volumetric methods, and fully end-to-end architecture designs, each supported by specific datasets, and that mapping these categories clarifies the path toward more capable classification systems.

What carries the argument

The four-category taxonomy of RGB-D, multi-view, volumetric, and end-to-end architecture designs for processing 3D sensed data.
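
To make the taxonomy concrete, the sketch below shows how one toy point cloud could be presented to each family of methods. It is an illustration under stated assumptions, not code from the paper: the function names (`voxelize`, `depth_view`, `global_max_feature`), the orthographic camera, and the use of raw coordinates in place of learned per-point features are all hypothetical choices made for brevity.

```python
# Illustrative only: toy input representations, not code from the reviewed paper.
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float, float]


def voxelize(points: List[Point], voxel_size: float) -> Set[Tuple[int, int, int]]:
    """Volumetric view: the set of occupied cells in a regular 3D grid."""
    return {
        (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        for x, y, z in points
    }


def depth_view(points: List[Point], pixel_size: float) -> Dict[Tuple[int, int], float]:
    """Depth-image view (as used by RGB-D and multi-view pipelines).

    Assumes an orthographic camera looking along +z, so each (x, y) pixel
    keeps the smallest z value that falls into it.
    """
    depth: Dict[Tuple[int, int], float] = {}
    for x, y, z in points:
        pixel = (int(x // pixel_size), int(y // pixel_size))
        depth[pixel] = min(z, depth.get(pixel, float("inf")))
    return depth


def global_max_feature(points: List[Point]) -> Point:
    """Point-based view: a symmetric max over all points.

    Raw coordinates stand in for learned per-point features; the point is
    only that the result does not depend on the order of the input.
    """
    return (
        max(p[0] for p in points),
        max(p[1] for p in points),
        max(p[2] for p in points),
    )


# A four-point toy cloud (made-up values).
points: List[Point] = [
    (0.1, 0.2, 0.9),
    (0.15, 0.25, 0.4),
    (1.2, 0.1, 0.3),
    (1.7, 1.6, 0.2),
]

occupied = voxelize(points, voxel_size=1.0)
depth = depth_view(points, pixel_size=1.0)
feature = global_max_feature(points)
shuffled_feature = global_max_feature(list(reversed(points)))
```

In this toy setting the first two functions discretise the cloud onto a grid before any learning takes place, while the third operates on the unordered points directly. That is roughly the distinction the taxonomy draws between volumetric or image-based designs and fully end-to-end ones.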

If this is right

  • Indoor robotics navigation systems can adopt more reliable 3D classification once the reviewed methods mature.
  • National-scale remote sensing applications gain automated understanding of sensed data.
  • Researchers gain documented datasets for benchmarking new classification models.
  • Future work concentrates on the research areas the discussion identifies as most valuable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The taxonomy supplies a baseline that later reviews can use to measure how the field has progressed.
  • Hybrid methods that combine elements from more than one category may emerge as a natural next step.
  • Periodic updates to the dataset list would keep the overview useful as new collections appear.

Load-bearing premise

That the four categories and the listed datasets together give a representative picture of the field without major omissions at the time of writing.

What would settle it

A widely used deep learning method for 3D data classification that cannot be placed in any of the four architecture categories.

Figures

Figures reproduced from arXiv: 1907.04444 by David Griffiths, Jan Boehm.

Figure 1: Network architecture to process RGB and depth images separately to learn low level features. Multiple RNNs …
Figure 2: Pipeline for object detection and instance segmentation of RGB-D images. First a random forest classifier is …
Figure 3: a) 3D amodal region proposal network used for the deep sliding network architecture. Two receptive fields …
Figure 4: CNN with anisotropic probing kernels. Elongated 3D convolutions are used to first extract features from …
Figure 5: Multi-view CNN. Rendered 2D images are acquired with virtual cameras and fed into independent CNNs …
Figure 6: PointNet classification and segmentation network architectures. The network consumes raw …
Figure 7: PointNet++ architecture. Hierarchical feature learning is introduced to learn features at various scales.
Figure 8: Frustum point net architecture. 2D CNN object detection is used to determine objects in RGB-D depth maps.
Figure 9: DGCNN network architecture for unsupervised context prediction for point clouds. The intermediate latent …
Original abstract

Over the past decade deep learning has driven progress in 2D image understanding. Despite these advancements, techniques for automatic 3D sensed data understanding, such as point clouds, is comparatively immature. However, with a range of important applications from indoor robotics navigation to national scale remote sensing there is a high demand for algorithms that can learn to automatically understand and classify 3D sensed data. In this paper we review the current state-of-the-art deep learning architectures for processing unstructured Euclidean data. We begin by addressing the background concepts and traditional methodologies. We review the current main approaches including; RGB-D, multi-view, volumetric and fully end-to-end architecture designs. Datasets for each category are documented and explained. Finally, we give a detailed discussion about the future of deep learning for 3D sensed data, using literature to justify the areas where future research would be most valuable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper is a survey reviewing deep learning techniques for 3D sensed data classification. It covers background concepts and traditional methodologies, then examines four main approach categories (RGB-D, multi-view, volumetric, and fully end-to-end architectures), documents associated datasets, and concludes with a literature-based discussion of future research directions in the field.

Significance. If the review accurately and representatively synthesizes the literature, it would offer a useful consolidation of the state of 3D deep learning as of 2019, particularly for applications in robotics and remote sensing where 3D data processing lags behind 2D. The explicit documentation of datasets and forward-looking discussion add practical value for researchers entering the area.

minor comments (3)
  1. [Abstract] Grammatical error in 'such as point clouds, is comparatively immature' (subject-verb agreement); should be 'are comparatively immature'.
  2. [Abstract] Unnecessary semicolon in 'including; RGB-D'; rephrase to 'including RGB-D, multi-view, volumetric and fully end-to-end architecture designs' for clarity.
  3. [Abstract] The manuscript should ensure consistent terminology between the title ('3D sensed data classification') and abstract ('processing unstructured Euclidean data') to avoid reader confusion.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their supportive summary, recognition of the paper's potential value for researchers in robotics and remote sensing, and recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity: pure literature review with no derivations or predictions

full rationale

This is a survey paper that summarizes background concepts, existing architectures (RGB-D, multi-view, volumetric, end-to-end), datasets, and future directions drawn from external literature. No original equations, fitted parameters, predictions, or derivation chains are present. All claims are descriptive citations of prior work; the representativeness of coverage is an external validity issue, not a circularity issue. No steps reduce to self-definition, fitted inputs, or self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a literature review and introduces no new mathematical derivations, empirical claims, or modeling choices that would require free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5673 in / 956 out tokens · 17174 ms · 2026-05-25T00:42:21.675476+00:00 · methodology

