
arxiv: 1907.09236 · v1 · pith:AMQ4NANFnew · submitted 2019-07-22 · 💻 cs.CV · cs.LG

RGB-D image-based Object Detection: from Traditional Methods to Deep Learning Techniques

Pith reviewed 2026-05-24 18:20 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords RGB-D object detection · deep learning · hand-crafted features · computer vision · survey · machine learning · 3D scanners · performance evaluation

The pith

Deep learning techniques have revolutionized RGB-D object detection by achieving unprecedented performance levels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey reviews the progression of RGB-D object detection from traditional methods that combine hand-crafted features with machine learning algorithms to more recent deep learning approaches. It structures the review into two main parts, one for each category, summarizes the most common pipelines, and discusses their benefits and limitations. A reader would care because the work traces how large training datasets have enabled major performance gains in a field with applications in robotics, surveillance, human-computer interaction, and medical diagnosis.

Core claim

The paper surveys key contributions in RGB-D object detection and establishes that deep learning techniques, coupled with the availability of large training datasets, have revolutionized the field and achieved an unprecedented level of performance compared to earlier hand-crafted feature methods.

What carries the argument

The two-part structure that separates hand-crafted feature methods combined with machine learning from deep learning methods, used to compare pipelines, benefits, and limitations.

If this is right

  • Traditional methods rely on hand-crafted features paired with machine learning algorithms.
  • Deep learning approaches deliver higher performance when large datasets are available.
  • Common pipelines for each category are summarized for direct reference.
  • Benefits, limitations, and future research directions are identified for each type of method.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The survey suggests that dataset size is a primary driver separating the performance of the two categories.
  • Applications such as medical diagnosis stand to gain from the same shift to deep learning seen in robotics.
  • Hybrid systems that combine hand-crafted features with deep networks could address remaining limitations in data-scarce settings.

Load-bearing premise

The selected papers form a representative and unbiased sample of the key contributions in both traditional and deep learning categories.

What would settle it

Discovery of a major RGB-D object detection paper or benchmark result that is omitted from the survey or directly contradicts the claimed performance gains from deep learning would challenge the review.

Figures

Figures reproduced from arXiv: 1907.09236 by Hamid Laga, Isaac Ronald Ward, Mohammed Bennamoun.

Figure 3.1: Illustration of some extrinsic challenges in object detection. (a) Objects of …
Figure 3.2: Illustration of some intrinsic challenges in object detection. (a-c) Intra …
Figure 3.3: Taxonomy of the state-of-the-art traditional and deep learning methods.
Figure 3.4: (a) Illustration of the Intersection over Union (IoU) metric in 2D. (b) Illus …
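
Figure 3.4 refers to the Intersection over Union (IoU) metric in 2D. As a minimal sketch, not taken from the paper, IoU for two axis-aligned boxes might be computed as below. The `(x_min, y_min, x_max, y_max)` box layout and the function name `iou_2d` are assumptions made for illustration only.

```python
from typing import Tuple

# Hypothetical box layout (an assumption): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou_2d(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned 2D boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs are given an IoU of 0 by convention here.
    return inter / union if union > 0 else 0.0
```

For example, two 2×2 boxes offset by one unit in each direction overlap in a 1×1 square, so `iou_2d((0, 0, 2, 2), (1, 1, 3, 3))` evaluates to 1/7.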
Original abstract

Object detection from RGB images is a long-standing problem in image processing and computer vision. It has applications in various domains including robotics, surveillance, human-computer interaction, and medical diagnosis. With the availability of low cost 3D scanners, a large number of RGB-D object detection approaches have been proposed in the past years. This chapter provides a comprehensive survey of the recent developments in this field. We structure the chapter into two parts; the focus of the first part is on techniques that are based on hand-crafted features combined with machine learning algorithms. The focus of the second part is on the more recent work, which is based on deep learning. Deep learning techniques, coupled with the availability of large training datasets, have now revolutionized the field of computer vision, including RGB-D object detection, achieving an unprecedented level of performance. We survey the key contributions, summarize the most commonly used pipelines, discuss their benefits and limitations, and highlight some important directions for future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript is a survey chapter on RGB-D object detection. It divides the literature into two parts: (1) hand-crafted features combined with classical machine-learning algorithms and (2) deep-learning pipelines. The authors claim to survey key contributions, summarize commonly used pipelines, discuss benefits and limitations, and identify future directions, asserting that deep learning plus large datasets has revolutionized performance in the field.

Significance. If the coverage is representative, the survey would supply a structured entry point for researchers entering RGB-D detection, clarifying the transition from feature-engineering to end-to-end learning and highlighting open problems. The two-part organization and explicit discussion of limitations are useful organizational choices.

major comments (1)
  1. [Abstract] The claim that the chapter provides a 'comprehensive survey' of 'key contributions' and 'most commonly used pipelines' is not supported by any description of the literature-search protocol, databases queried, date range, or inclusion/exclusion rules. Without these elements, the representativeness of the cited works cannot be verified, which undermines the central assertion of scope.
minor comments (1)
  1. [Abstract] The abstract states that the chapter is structured into two parts, but the manuscript should explicitly label the corresponding sections (e.g., §2 and §3) so readers can locate the hand-crafted versus deep-learning material without ambiguity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the scope and transparency of our survey. We address the major comment below.

  1. Referee: [Abstract] The claim that the chapter provides a 'comprehensive survey' of 'key contributions' and 'most commonly used pipelines' is not supported by any description of the literature-search protocol, databases queried, date range, or inclusion/exclusion rules. Without these elements, the representativeness of the cited works cannot be verified, which undermines the central assertion of scope.

    Authors: We agree that including an explicit description of the literature search protocol would improve transparency and allow readers to evaluate the survey's representativeness. In the revised manuscript we will insert a short subsection (or paragraph) in the introduction that specifies the databases queried (IEEE Xplore, ACM Digital Library, Google Scholar, arXiv), the primary search keywords and Boolean strings, the publication date range covered, and the inclusion/exclusion criteria used to identify key contributions and representative pipelines. This addition will directly support the claims made in the abstract.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: survey contains no derivations, predictions, or self-referential claims


This is a literature survey with no equations, fitted parameters, predictions, or first-principles derivations. The central claim that deep learning has revolutionized RGB-D detection is presented as a summary of external work rather than a result derived from the survey's own selection or citations. No self-citation chains, ansatzes, or uniqueness theorems are invoked to support any load-bearing step. The paper is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a literature survey containing no new mathematical models, fitted parameters, axioms, or postulated entities.

pith-pipeline@v0.9.0 · 5698 in / 944 out tokens · 16652 ms · 2026-05-24T18:20:31.715936+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

    Relation between the paper passage and the cited Recognition theorem.

    This chapter provides a comprehensive survey of the recent developments in this field. We structure the chapter into two parts; the focus of the first part is on techniques that are based on hand-crafted features combined with machine learning algorithms. The focus of the second part is on the more recent work, which is based on deep learning.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
