RGB-D image-based Object Detection: from Traditional Methods to Deep Learning Techniques

Hamid Laga; Isaac Ronald Ward; Mohammed Bennamoun

arxiv: 1907.09236 · v1 · pith:AMQ4NANFnew · submitted 2019-07-22 · 💻 cs.CV · cs.LG

Pith reviewed 2026-05-24 18:20 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords RGB-D object detection · deep learning · hand-crafted features · computer vision · survey · machine learning · 3D scanners · performance evaluation

The pith

Deep learning techniques have revolutionized RGB-D object detection by achieving unprecedented performance levels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey reviews the progression of RGB-D object detection from traditional methods that combine hand-crafted features with machine learning algorithms to more recent deep learning approaches. It structures the review into two main parts, one for each category, summarizes the most common pipelines, and discusses benefits along with limitations. A reader would care because the work traces how large training datasets have enabled major gains in accuracy for applications like robotics, surveillance, and medical imaging.

Core claim

The paper surveys key contributions in RGB-D object detection and establishes that deep learning techniques, coupled with the availability of large training datasets, have revolutionized the field and achieved an unprecedented level of performance compared to earlier hand-crafted feature methods.

What carries the argument

A two-part structure that separates hand-crafted feature methods combined with machine learning from deep learning methods. The survey uses this split to compare the pipelines, benefits, and limitations of each category.

If this is right

Traditional methods rely on hand-crafted features paired with machine learning algorithms.
Deep learning approaches deliver higher performance when large datasets are available.
Common pipelines for each category are summarized for direct reference.
Benefits, limitations, and future research directions are identified for each type of method.
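To make the "hand-crafted features paired with machine learning" pattern concrete, here is a minimal sketch, assuming gradient-orientation histograms as the feature and a nearest-centroid rule as the classifier. Both are stand-ins chosen for brevity, not methods attributed to the survey, and the names `orientation_histogram`, `rgbd_descriptor`, and `NearestCentroidClassifier` are hypothetical.

```python
import numpy as np


def orientation_histogram(channel, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    gy, gx = np.gradient(channel.astype(float))
    magnitude = np.hypot(gx, gy)
    # Fold signed angles into [0, pi) so opposite directions share a bin.
    angle = np.mod(np.arctan2(gy, gx), np.pi)
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, np.pi), weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


def rgbd_descriptor(rgb, depth, bins=9):
    """Concatenate one histogram from the colour image and one from the depth map."""
    gray = rgb.astype(float).mean(axis=2)
    return np.concatenate(
        [orientation_histogram(gray, bins), orientation_histogram(depth, bins)]
    )


class NearestCentroidClassifier:
    """Assumed stand-in for the 'machine learning algorithm' stage."""

    def fit(self, features, labels):
        self.classes_ = np.unique(labels)
        self.centroids_ = np.stack(
            [features[labels == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, features):
        # Distance from every sample to every class centroid.
        diffs = features[:, None, :] - self.centroids_[None, :, :]
        distances = np.linalg.norm(diffs, axis=2)
        return self.classes_[np.argmin(distances, axis=1)]
```

The feature stage is fixed by design and only the classifier is fitted to data. A deep learning pipeline would instead learn both stages jointly from training examples.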

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The survey suggests that dataset size is a primary driver separating the performance of the two categories.
Applications such as medical diagnosis stand to gain from the same shift to deep learning seen in robotics.
Hybrid systems that combine hand-crafted features with deep networks could address remaining limitations in data-scarce settings.

Load-bearing premise

The selected papers form a representative and unbiased sample of the key contributions in both traditional and deep learning categories.

What would settle it

The review would be challenged by a major RGB-D object detection paper or benchmark result that the survey omits, or by one that directly contradicts the claimed performance gains from deep learning.

Figures

Figures reproduced from arXiv: 1907.09236 by Hamid Laga, Isaac Ronald Ward, Mohammed Bennamoun.

**Figure 3.1.** Illustration of some extrinsic challenges in object detection. (a) Objects of …

**Figure 3.2.** Illustration of some intrinsic challenges in object detection. (a-c) Intra …

**Figure 3.3.** Taxonomy of the state-of-the-art traditional and deep learning methods.

**Figure 3.4.** (a) Illustration of the Intersection over Union (IoU) metric in 2D. (b) Illus …
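The caption of Figure 3.4 refers to the Intersection over Union (IoU) metric in 2D. As a minimal sketch, assuming axis-aligned boxes given as `(x_min, y_min, x_max, y_max)` tuples (a representation chosen here for illustration, with the hypothetical function name `iou_2d`), the metric can be computed as the overlap area divided by the area of the union:

```python
def iou_2d(box_a, box_b):
    """Intersection over Union of two axis-aligned 2D boxes.

    Each box is (x_min, y_min, x_max, y_max); this layout is an assumption.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Degenerate (zero-area) inputs have no meaningful overlap ratio.
    return intersection / union if union > 0 else 0.0
```

For example, two 2×2 boxes offset by one unit in each direction overlap in a 1×1 square, so their IoU is 1 / (4 + 4 − 1) = 1/7.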

Abstract

Object detection from RGB images is a long-standing problem in image processing and computer vision. It has applications in various domains including robotics, surveillance, human-computer interaction, and medical diagnosis. With the availability of low cost 3D scanners, a large number of RGB-D object detection approaches have been proposed in the past years. This chapter provides a comprehensive survey of the recent developments in this field. We structure the chapter into two parts; the focus of the first part is on techniques that are based on hand-crafted features combined with machine learning algorithms. The focus of the second part is on the more recent work, which is based on deep learning. Deep learning techniques, coupled with the availability of large training datasets, have now revolutionized the field of computer vision, including RGB-D object detection, achieving an unprecedented level of performance. We survey the key contributions, summarize the most commonly used pipelines, discuss their benefits and limitations, and highlight some important directions for future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

A basic survey that splits RGB-D detection into hand-crafted and deep learning sections but provides no search protocol or new analysis.

This is a literature survey on RGB-D object detection that divides the field into older hand-crafted feature methods and more recent deep learning approaches. It adds no new method, experiment, or resolution of an open problem. The abstract lays out a two-part structure and notes that deep learning plus large datasets has driven performance gains, which aligns with the broader trend in computer vision at the time of writing in 2019. The paper summarizes common pipelines, lists benefits and limitations of each category, and flags some future research directions. That organization can help someone map the subfield quickly.

The main limitation is the absence of any stated literature search protocol, databases used, date range, or inclusion criteria. Without those details the claim of covering key contributions and most common pipelines cannot be checked for completeness or selection bias. That concern about representativeness holds on the basis of the abstract alone; the full text would need to supply the missing protocol for the survey to be taken as representative. The work contains no equations, derivations, or fitted claims, so there is no circularity issue. It is aimed at readers who want an entry point into RGB-D detection rather than specialists seeking fresh results. A serious editor could send it for peer review because a well-executed survey still serves a navigation role, provided the authors add the search methodology and update coverage where gaps appear. I would not bring it to a reading group or cite it in my own work.

Referee Report

1 major / 1 minor

Summary. The manuscript is a survey chapter on RGB-D object detection. It divides the literature into two parts: (1) hand-crafted features combined with classical machine-learning algorithms and (2) deep-learning pipelines. The authors claim to survey key contributions, summarize commonly used pipelines, discuss benefits and limitations, and identify future directions, asserting that deep learning plus large datasets has revolutionized performance in the field.

Significance. If the coverage is representative, the survey would supply a structured entry point for researchers entering RGB-D detection, clarifying the transition from feature-engineering to end-to-end learning and highlighting open problems. The two-part organization and explicit discussion of limitations are useful organizational choices.

major comments (1)

[Abstract] The claim that the chapter provides a 'comprehensive survey' of 'key contributions' and 'most commonly used pipelines' is not supported by any description of the literature-search protocol, databases queried, date range, or inclusion/exclusion rules. Without these elements the representativeness of the cited works cannot be verified, undermining the central assertion of scope.

minor comments (1)

[Abstract] The abstract states that the chapter is structured into two parts, but the manuscript should explicitly label the corresponding sections (e.g., §2 and §3) so readers can locate the hand-crafted versus deep-learning material without ambiguity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the scope and transparency of our survey. We address the major comment below.

Referee: [Abstract] The claim that the chapter provides a 'comprehensive survey' of 'key contributions' and 'most commonly used pipelines' is not supported by any description of the literature-search protocol, databases queried, date range, or inclusion/exclusion rules. Without these elements the representativeness of the cited works cannot be verified, undermining the central assertion of scope.

Authors: We agree that including an explicit description of the literature search protocol would improve transparency and allow readers to evaluate the survey's representativeness. In the revised manuscript we will insert a short subsection (or paragraph) in the introduction that specifies the databases queried (IEEE Xplore, ACM Digital Library, Google Scholar, arXiv), the primary search keywords and Boolean strings, the publication date range covered, and the inclusion/exclusion criteria used to identify key contributions and representative pipelines. This addition will directly support the claims made in the abstract.

Revision: yes
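The protocol elements listed in the rebuttal (databases, Boolean keyword strings, date range, inclusion criteria) can be stated as an executable filter, which is one way to make a survey's selection reproducible. The following is a minimal sketch only. The `Record` fields, the helper names, and every keyword and year used in the usage comment are hypothetical, since the rebuttal names the databases but gives no search strings or dates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One bibliographic hit; the field set is an assumption."""
    title: str
    year: int
    source: str


# The four databases named in the simulated rebuttal.
DATABASES = {"IEEE Xplore", "ACM Digital Library", "Google Scholar", "arXiv"}


def matches_query(title, all_of, any_of):
    """Boolean string of the form (a AND b ...) AND (x OR y ...), on the title."""
    text = title.lower()
    return all(term in text for term in all_of) and any(term in text for term in any_of)


def include(record, year_range, all_of, any_of):
    """Apply database, date-range, and keyword criteria to one record."""
    first_year, last_year = year_range
    return (
        record.source in DATABASES
        and first_year <= record.year <= last_year
        and matches_query(record.title, all_of, any_of)
    )


# Usage with hypothetical values:
# include(record, (2010, 2019), ["rgb-d"], ["detection", "recognition"])
```

Publishing such a filter alongside the raw search results would let a reader re-run the selection and check for omissions.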

Circularity Check

0 steps flagged

No circularity: survey contains no derivations, predictions, or self-referential claims

This is a literature survey with no equations, fitted parameters, predictions, or first-principles derivations. The central claim that deep learning has revolutionized RGB-D detection is presented as a summary of external work rather than a result derived from the survey's own selection or citations. No self-citation chains, ansatzes, or uniqueness theorems are invoked to support any load-bearing step. The paper is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a literature survey containing no new mathematical models, fitted parameters, axioms, or postulated entities.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation to the paper passage: unclear

This chapter provides a comprehensive survey of the recent developments in this field. We structure the chapter into two parts; the focus of the first part is on techniques that are based on hand-crafted features combined with machine learning algorithms. The focus of the second part is on the more recent work, which is based on deep learning.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

91 extracted references · 91 canonical work pages · 16 internal anchors

[1]

URL http://pr.cs.cornell.edu/grasping/rect_ data/data.php

Cornell grasping dataset. URL http://pr.cs.cornell.edu/grasping/rect_ data/data.php. Accessed: 2018-12-13

work page 2018
[2]

IEEE Transactions on Pattern Analysis and Machine Intelligence 34(11), 2274–2282 (2012)

Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S.: Slic superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(11), 2274–2282 (2012). DOI 10.1109/TPAMI.2012.120

work page doi:10.1109/tpami.2012.120 2012
[3]

In: IAS (2014)

Alexandre, L.A.: 3d object recognition using convolutional neural networks with transfer learning between input channels. In: IAS (2014)

work page 2014
[4]

73–80 (2010)

Alexe, B., Deselaers, T., Ferrari, V .: What is an object? In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 73–80 (2010). DOI 10.1109/ CVPR.2010.5540226

work page arXiv 2010
[5]

In: Computer Vision and Pattern Recognition (2014)

Arbeláez, P., Pont-Tuset, J., Barron, J., Marques, F., Malik, J.: Multiscale combinatorial group- ing. In: Computer Vision and Pattern Recognition (2014)

work page 2014
[6]

IEEE Transactions on Robotics 33(3), 547–564 (2017)

Asif, U., Bennamoun, M., Sohel, F.A.: Rgb-d object recognition and grasp detection using hierarchical cascaded forests. IEEE Transactions on Robotics 33(3), 547–564 (2017). DOI 10.1109/TRO.2016.2638453

work page doi:10.1109/tro.2016.2638453 2017
[7]

In: Proceedings of the 5th International Joint Conference on Artiﬁcial Intelligence - V olume 2, IJCAI’77, pp

Barrow, H.G., Tenenbaum, J.M., Bolles, R.C., Wolf, H.C.: Parametric correspondence and chamfer matching: Two new techniques for image matching. In: Proceedings of the 5th International Joint Conference on Artiﬁcial Intelligence - V olume 2, IJCAI’77, pp. 659–

work page
[8]

URL http: //dl.acm.org/citation.cfm?id=1622943.1622971

Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1977). URL http: //dl.acm.org/citation.cfm?id=1622943.1622971

work page arXiv 1977
[9]

In: European Conference on Computer Vision, pp

Bleyer, M., Rhemann, C., Rother, C.: Extracting 3d scene-consistent object proposals and depth from stereo images. In: European Conference on Computer Vision, pp. 467–481. Springer (2012)

work page 2012
[10]

The International Journal of Robotics Research 33(4), 581–599 (2014)

Bo, L., Ren, X., Fox, D.: Learning hierarchical sparse features for rgb-(d) object recognition. The International Journal of Robotics Research 33(4), 581–599 (2014)

work page 2014
[11]

In: BMVC (2009)

Buch, N.E., Orwell, J., Velastin, S.A.: 3d extended histogram of oriented gradients (3dhog) for classiﬁcation of road users in urban scenes. In: BMVC (2009)

work page 2009
[12]

In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018) 3 A Survey on RGB-D image-based Object Detection 27

Chen, H., Li, Y .: Progressively complementarity-aware fusion network for rgb-d salient object detection. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018) 3 A Survey on RGB-D image-based Object Detection 27

work page 2018
[13]

In: IEEE CVPR, vol

Chen, X., Ma, H., Wan, J., Li, B., Xia, T.: Multi-view 3D object detection network for au- tonomous driving. In: IEEE CVPR, vol. 1, p. 3 (2017)

work page 2017
[14]

In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol

Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 1, pp. 886–893 vol. 1 (2005). DOI 10.1109/CVPR.2005.177

work page doi:10.1109/cvpr.2005.177 2005
[15]

Lowe: Distinctive image features from scale-invariant keypoints

David G. Lowe: Distinctive image features from scale-invariant keypoints. International Jour- nal of Computer Vision (IJCV) (2004)

work page 2004
[16]

In: Conference on Computer Vision and Pattern Recognition (CVPR), vol

Deng, Z., Latecki, L.J.: Amodal detection of 3D objects: Inferring 3D bounding boxes from 2D ones in RGB-depth images. In: Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, p. 2 (2017)

work page 2017
[17]

Schapire, R.: Explaining AdaBoost, pp

E. Schapire, R.: Explaining AdaBoost, pp. 37–52 (2013). DOI 10.1007/978-3-642-41136-6-5

work page doi:10.1007/978-3-642-41136-6-5 2013
[18]

Multimodal Deep Learning for Robust RGB-D Object Recognition

Eitel, A., Springenberg, J.T., Spinello, L., Riedmiller, M.A., Burgard, W.: Multimodal deep learning for robust RGB-D object recognition. CoRR abs/1507.06821 (2015). URL http: //arxiv.org/abs/1507.06821

work page internal anchor Pith review Pith/arXiv arXiv 2015
[19]

In: Robotics and Automation (ICRA), 2017 IEEE International Conference on, pp

Engelcke, M., Rao, D., Wang, D.Z., Tong, C.H., Posner, I.: V ote3deep: Fast object detection in 3D point clouds using efﬁcient convolutional neural networks. In: Robotics and Automation (ICRA), 2017 IEEE International Conference on, pp. 1355–1361. IEEE (2017)

work page 2017
[20]

In: 2010 IEEE Conference on Computer Vision and Pattern Recogni- tion (CVPR) (2010)

Everingham, M., Van Gool, L., Williams, C., Winn, J., Zisserman, A.: The pascal visual object classes (voc) challenge. In: 2010 IEEE Conference on Computer Vision and Pattern Recogni- tion (CVPR) (2010)

work page 2010
[21]

2004 Conference on Computer Vision and Pattern Recognition Workshop pp

Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training exam- ples: An incremental bayesian approach tested on 101 object categories. 2004 Conference on Computer Vision and Pattern Recognition Workshop pp. 178–178 (2004)

work page 2004
[22]

IEEE Transactions on Pattern Analysis and Machine Intelligence 32(9), 1627–1645 (2010)

Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with dis- criminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(9), 1627–1645 (2010)

work page 2010
[23]

In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp

Feng, D., Barnes, N., You, S., McCarthy, C.: Local background enclosure for rgb-d salient object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2343–2350 (2016). DOI 10.1109/CVPR.2016.257

work page doi:10.1109/cvpr.2016.257 2016
[24]

In: Conference on Computer Vision and Pattern Recognition (CVPR) (2012)

Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? the kitti vision bench- mark suite. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2012)

work page 2012
[25]

In: Proceedings of the 2015 Eurographics Workshop on 3D Object Retrieval, 3DOR ’15, pp

Getto, R., Fellner, D.W.: 3d object retrieval with parametric templates. In: Proceedings of the 2015 Eurographics Workshop on 3D Object Retrieval, 3DOR ’15, pp. 47–54. Eurographics Association, Goslar Germany, Germany (2015). DOI 10.2312/3dor.20151054. URLhttps: //doi.org/10.2312/3dor.20151054

work page doi:10.2312/3dor.20151054 2015
[26]

Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization

Gidaris, S., Komodakis, N.: Attend reﬁne repeat: Active box proposal generation via in-out localization. CoRR abs/1606.04446 (2016). URL http://arxiv.org/abs/1606. 04446

work page internal anchor Pith review Pith/arXiv arXiv 2016
[27]

Fast R-CNN

Girshick, R.B.: Fast R-CNN. CoRR abs/1504.08083 (2015). URL http://arxiv.org/ abs/1504.08083

work page internal anchor Pith review Pith/arXiv arXiv 2015
[28]

Grifﬁn, G., Holub, A., Perona, P.: Caltech-256 object category dataset. Tech. Rep. 7694, Cal- ifornia Institute of Technology (2007). URL http://authors.library.caltech. edu/7694

work page 2007
[29]

In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recog- nition, pp

Gupta, S., Arbeláez, P., Girshick, R., Malik, J.: Aligning 3d models to rgb-d images of clut- tered scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recog- nition, pp. 4731–4740 (2015)

work page 2015
[30]

Learning Rich Features from RGB-D Images for Object Detection and Segmentation

Gupta, S., Girshick, R.B., Arbelaez, P., Malik, J.: Learning rich features from RGB-D images for object detection and segmentation. CoRRabs/1407.5736 (2014). URL http://arxiv. org/abs/1407.5736

work page internal anchor Pith review Pith/arXiv arXiv 2014
[31]

IEEE Signal Processing Magazine 35(1), 84–100 (2018)

Han, J., Zhang, D., Cheng, G., Liu, N., Xu, D.: Advanced deep-learning techniques for salient and category-speciﬁc object detection: A survey. IEEE Signal Processing Magazine 35(1), 84–100 (2018). DOI 10.1109/MSP.2017.2749125

work page doi:10.1109/msp.2017.2749125 2018
[32]

IEEE Transactions on Geoscience and Remote Sensing 28(4), 509–512 (1990)

He, D., Wang, L.: Texture unit, texture spectrum, and texture analysis. IEEE Transactions on Geoscience and Remote Sensing 28(4), 509–512 (1990). DOI 10.1109/TGRS.1990.572934 28 Isaac Ronald Ward, Hamid Laga, and Mohammed Bennamoun

work page doi:10.1109/tgrs.1990.572934 1990
[33]

Deeply supervised salient object detection with short connections

Hou, Q., Cheng, M., Hu, X., Borji, A., Tu, Z., Torr, P.H.S.: Deeply supervised salient object detection with short connections. CoRR abs/1611.04849 (2016). URL http://arxiv. org/abs/1611.04849

work page internal anchor Pith review Pith/arXiv arXiv 2016
[34]

Synthesis Lectures on Computer Vision 12(1), 1–185 (2017)

Jermyn, I.H., Kurtek, S., Laga, H., Srivastava, A.: Elastic shape analysis of three-dimensional objects. Synthesis Lectures on Computer Vision 12(1), 1–185 (2017)

work page 2017
[35]

In: European Conference on Computer Vision, pp

Jiang, H.: Finding approximate convex shapes in rgbd images. In: European Conference on Computer Vision, pp. 582–596. Springer (2014)

work page 2014
[36]

In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp

Jiang, H., Xiao, J.: A linear approach to matching cuboids in rgbd images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2171–2178 (2013)

work page 2013
[37]

In: 2014 IEEE International Conference on Image Processing (ICIP), pp

Ju, R., Ge, L., Geng, W., Ren, T., Wu, G.: Depth saliency based on anisotropic center-surround difference. In: 2014 IEEE International Conference on Image Processing (ICIP), pp. 1115– 1119 (2014). DOI 10.1109/ICIP.2014.7025222

work page doi:10.1109/icip.2014.7025222 2014
[38]

Geometric Loss Functions for Camera Pose Regression with Deep Learning

Kendall, A., Cipolla, R.: Geometric loss functions for camera pose regression with deep learn- ing. CoRR abs/1704.00390 (2017). URL http://arxiv.org/abs/1704.00390

work page internal anchor Pith review Pith/arXiv arXiv 2017
[39]

Morgan and Claypool Publishers (2018)

Khan, S., Rahmani, H., Shah, S.A.A., Bennamoun, M.: A Guide to Convolutional Neural Networks for Computer Vision. Morgan and Claypool Publishers (2018)

work page 2018
[40]

In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp

Khan, S.H., He, X., Bennamoun, M., Sohel, F., Togneri, R.: Separating objects and clutter in indoor scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4603–4611 (2015)

work page 2015
[41]

In: Proceedings of the 25th International Conference on Neural Informa- tion Processing Systems - V olume 1, NIPS’12, pp

Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classiﬁcation with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Informa- tion Processing Systems - V olume 1, NIPS’12, pp. 1097–1105. Curran Associates Inc., USA (2012). URL http://dl.acm.org/citation.cfm?id=2999134.2999257

work page arXiv 2012
[42]

John Wiley & Sons (2018)

Laga, H., Guo, Y ., Tabia, H., Fisher, R.B., Bennamoun, M.: 3D Shape Analysis: Fundamentals, Theory, and Applications. John Wiley & Sons (2018)

work page 2018
[43]

Wiley (2019)

Laga, H., Guo, Y ., Tabia, H., Fisher, R.B., Bennamoun, M.: 3D Shape Analysis: Fundamentals, Theory, and Applications. Wiley (2019)

work page 2019
[44]

ACM Transactions on Graphics (TOG) 32(5), 150 (2013)

Laga, H., Mortara, M., Spagnuolo, M.: Geometry and context for semantic correspondences and functionality recognition in man-made 3d shapes. ACM Transactions on Graphics (TOG) 32(5), 150 (2013)

work page 2013
[45]

IEEE transactions on pattern analysis and machine intelligence 39(12), 2451–2464 (2017)

Laga, H., Xie, Q., Jermyn, I.H., Srivastava, A.: Numerical inversion of srnf maps for elastic shape analysis of genus-zero surfaces. IEEE transactions on pattern analysis and machine intelligence 39(12), 2451–2464 (2017)

work page 2017
[46]

In: The IEEE International Conference on Computer Vision (ICCV) (2017)

Lahoud, J., Ghanem, B.: 2D-Driven 3D Object Detection in RGB-D Images. In: The IEEE International Conference on Computer Vision (ICCV) (2017)

work page 2017
[47]

In: Consumer Depth Cameras for Computer Vision, pp

Lai, K., Bo, L., Ren, X., Fox, D.: Rgb-d object recognition: Features, algorithms, and a large scale benchmark. In: Consumer Depth Cameras for Computer Vision, pp. 167–192. Springer (2013)

work page 2013
[48]

In: 2017 12th International Conference on Computer Science and Education (ICCSE), pp

Lei, Z., Chai, W., Zhao, S., Song, H., Li, F.: Saliency detection for rgb-d images using op- timization. In: 2017 12th International Conference on Computer Science and Education (ICCSE), pp. 440–443 (2017). DOI 10.1109/ICCSE.2017.8085532

work page doi:10.1109/iccse.2017.8085532 2017
[49]

3D Fully Convolutional Network for Vehicle Detection in Point Cloud

Li, B.: 3d fully convolutional network for vehicle detection in point cloud. CoRR abs/1611.08069 (2016). URL http://arxiv.org/abs/1611.08069

work page internal anchor Pith review Pith/arXiv arXiv 2016
[50]

Vehicle Detection from 3D Lidar Using Fully Convolutional Network

Li, B., Zhang, T., Xia, T.: Vehicle detection from 3D lidar using fully convolutional network. arXiv preprint arXiv:1608.07916 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016
[51]

IEEE Transactions on Pattern Analysis and Machine Intelligence 39(8), 1605–1616 (2017)

Li, N., Ye, J., Ji, Y ., Ling, H., Yu, J.: Saliency detection on light ﬁeld. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(8), 1605–1616 (2017). DOI 10.1109/TPAMI. 2016.2610425

work page doi:10.1109/tpami 2017
[52]

In: Proceedings of the IEEE International Conference on Computer Vision, pp

Lin, D., Fidler, S., Urtasun, R.: Holistic scene understanding for 3D object detection with RGBD cameras. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1417–1424 (2013)

work page 2013
[53]

In: CVPR, vol

Lin, T.Y ., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: CVPR, vol. 1, p. 4 (2017)

work page 2017
[54]

Fully Convolutional Networks for Semantic Segmentation

Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. CoRR abs/1411.4038 (2014). URL http://arxiv.org/abs/1411.4038 3 A Survey on RGB-D image-based Object Detection 29

work page internal anchor Pith review Pith/arXiv arXiv 2014
[55]

In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp

Maturana, D., Scherer, S.: 3d convolutional neural networks for landing zone detection from lidar. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 3471– 3478 (2015). DOI 10.1109/ICRA.2015.7139679

work page doi:10.1109/icra.2015.7139679 2015
[56]

In: Ieee/rsj International Conference on Intelligent Robots and Systems, pp

Maturana, D., Scherer, S.: V oxNet: A 3D Convolutional Neural Network for real-time object recognition. In: Ieee/rsj International Conference on Intelligent Robots and Systems, pp. 922– 928 (2015)

work page 2015
[57]

In: Conference on Com- puter Vision and Pattern Recognition (CVPR) (2015)

Menze, M., Geiger, A.: Object scene ﬂow for autonomous vehicles. In: Conference on Com- puter Vision and Pattern Recognition (CVPR) (2015)

work page 2015
[58]

In: 2017 International Conference on Field Programmable Technology (ICFPT), pp

Nakahara, H., Yonekawa, H., Sato, S.: An object detector based on multiscale sliding window search using a fully pipelined binarized cnn on an fpga. In: 2017 International Conference on Field Programmable Technology (ICFPT), pp. 168–175 (2017). DOI 10.1109/FPT.2017. 8280135

work page doi:10.1109/fpt.2017 2017
[59]

In: ECCV (2012)

Nathan Silberman Derek Hoiem, P.K., Fergus, R.: Indoor segmentation and support inference from rgb-d images. In: ECCV (2012)

work page 2012
[60]

In: 2011 10th IEEE International Symposium on Mixed and Augmented Reality, pp

Newcombe, R.A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A.J., Kohi, P., Shotton, J., Hodges, S., Fitzgibbon, A.: Kinectfusion: Real-time dense surface mapping and tracking. In: 2011 10th IEEE International Symposium on Mixed and Augmented Reality, pp. 127–136 (2011). DOI 10.1109/ISMAR.2011.6092378

work page doi:10.1109/ismar.2011.6092378 2011
[61]

In: ECCV (2014)

Peng, H., Li, B., Xiong, W., Hu, W., Ji, R.: Rgb-d salient object detection: A benchmark and algorithms. In: ECCV (2014)

work page 2014
[62]

Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation

Pont-Tuset, J., Arbeláez, P., Barron, J., Marques, F., Malik, J.: Multiscale combinatorial group- ing for image segmentation and object proposal generation. In: arXiv:1503.00848 (2015)

work page internal anchor Pith review Pith/arXiv arXiv 2015
[63]

In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)

Qi, C.R., Liu, W., Wu, C., Su, H., Guibas, L.J.: Frustum pointnets for 3d object detection from rgb-d data. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)

work page 2018
[64]

Qi, C.R., Su, H., Mo, K., Guibas, L.J.: Pointnet: Deep learning on point sets for 3d classiﬁca- tion and segmentation. Proc. Computer Vision and Pattern Recognition (CVPR), IEEE 1(2), 4 (2017)

work page 2017
[65]

Volumetric and Multi-View CNNs for Object Classification on 3D Data

Qi, C.R., Su, H., Nießner, M., Dai, A., Yan, M., Guibas, L.J.: V olumetric and multi-view cnns for object classiﬁcation on 3d data. CoRR abs/1604.03265 (2016). URL http://arxiv. org/abs/1604.03265

work page internal anchor Pith review Pith/arXiv arXiv 2016
[66]

In: Advances in Neural Information Processing Systems, pp

Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, pp. 5099–5108 (2017)

work page 2017
[67]

IEEE Transactions on Image Processing 26(5), 2274–2285 (2017)

Qu, L., He, S., Zhang, J., Tian, J., Tang, Y ., Yang, Q.: Rgb-d salient object detection via deep fusion. IEEE Transactions on Image Processing 26(5), 2274–2285 (2017). DOI 10.1109/TIP. 2017.2682981

work page doi:10.1109/tip 2017
[68]

In: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp

Ren, J., Gong, X., Yu, L., Zhou, W., Yang, M.Y .: Exploiting global priors for rgb-d saliency detection. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 25–32 (2015). DOI 10.1109/CVPRW.2015.7301391

work page doi:10.1109/cvprw.2015.7301391 2015
[69]

Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp. 91–99 (2015)

[70]

Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. CoRR abs/1506.01497 (2015). URL http://arxiv.org/abs/1506.01497

[71]

Ren, Z., Sudderth, E.B.: Three-dimensional object detection and layout prediction using clouds of oriented gradients. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1525–1533 (2016). DOI 10.1109/CVPR.2016.169

[72]

Rusinkiewicz, S., Levoy, M.: Efficient variants of the icp algorithm. In: Proceedings Third International Conference on 3-D Digital Imaging and Modeling, pp. 145–152 (2001). DOI 10.1109/IM.2001.924423

[73]

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115(3), 211–252 (2015). DOI 10.1007/s11263-015-0816-y

[74]

Sahin, C., Kouskouridas, R., Kim, T.: Iterative hough forest with histogram of control points for 6 dof object registration from depth images. CoRR abs/1603.02617 (2016). URL http://arxiv.org/abs/1603.02617

[75]

Sahin, C., Kouskouridas, R., Kim, T.: A learning-based variable size part extraction architecture for 6d object pose recovery in depth. CoRR abs/1701.02166 (2017). URL http://arxiv.org/abs/1701.02166

[76]

Schwarz, M., Schulz, H., Behnke, S.: Rgb-d object recognition and pose estimation based on pre-trained convolutional neural network features. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 1329–1335 (2015). DOI 10.1109/ICRA.2015.7139363

[77]

Shotton, J., Girshick, R., Fitzgibbon, A., Sharp, T., Cook, M., Finocchio, M., Moore, R., Kohli, P., Criminisi, A., Kipman, A., et al.: Efficient human pose estimation from single depth images. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(12), 2821–2840 (2013)

[78]

Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

[79]

Song, H., Liu, Z., Xie, Y., Wu, L., Huang, M.: RGBD co-saliency detection via bagging-based clustering. IEEE Signal Processing Letters 23(12), 1722–1726 (2016)

[80]

Song, S., Lichtenberg, S.P., Xiao, J.: Sun rgb-d: A rgb-d scene understanding benchmark suite. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 567–

Showing first 80 references.
