RGB-D image-based Object Detection: from Traditional Methods to Deep Learning Techniques
Pith reviewed 2026-05-24 18:20 UTC · model grok-4.3
The pith
Deep learning techniques have revolutionized RGB-D object detection by achieving unprecedented performance levels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper surveys key contributions in RGB-D object detection and establishes that deep learning techniques, coupled with the availability of large training datasets, have revolutionized the field and achieved an unprecedented level of performance compared to earlier hand-crafted feature methods.
What carries the argument
The two-part structure that separates hand-crafted feature methods combined with machine learning from deep learning methods, used to compare pipelines, benefits, and limitations.
If this is right
- Traditional methods rely on hand-crafted features paired with machine learning algorithms.
- Deep learning approaches deliver higher performance when large datasets are available.
- Common pipelines for each category are summarized for direct reference.
- Benefits, limitations, and future research directions are identified for each type of method.
Where Pith is reading between the lines
- The survey suggests that dataset size is a primary driver separating the performance of the two categories.
- Applications such as medical diagnosis stand to gain from the same shift to deep learning seen in robotics.
- Hybrid systems that combine hand-crafted features with deep networks could address remaining limitations in data-scarce settings.
Load-bearing premise
The selected papers form a representative and unbiased sample of the key contributions in both traditional and deep learning categories.
What would settle it
Discovery of a major RGB-D object detection paper or benchmark result that is omitted from the survey or directly contradicts the claimed performance gains from deep learning would challenge the review.
Original abstract
Object detection from RGB images is a long-standing problem in image processing and computer vision. It has applications in various domains including robotics, surveillance, human-computer interaction, and medical diagnosis. With the availability of low cost 3D scanners, a large number of RGB-D object detection approaches have been proposed in the past years. This chapter provides a comprehensive survey of the recent developments in this field. We structure the chapter into two parts; the focus of the first part is on techniques that are based on hand-crafted features combined with machine learning algorithms. The focus of the second part is on the more recent work, which is based on deep learning. Deep learning techniques, coupled with the availability of large training datasets, have now revolutionized the field of computer vision, including RGB-D object detection, achieving an unprecedented level of performance. We survey the key contributions, summarize the most commonly used pipelines, discuss their benefits and limitations, and highlight some important directions for future research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a survey chapter on RGB-D object detection. It divides the literature into two parts: (1) hand-crafted features combined with classical machine-learning algorithms and (2) deep-learning pipelines. The authors claim to survey key contributions, summarize commonly used pipelines, discuss benefits and limitations, and identify future directions, asserting that deep learning plus large datasets has revolutionized performance in the field.
Significance. If the coverage is representative, the survey would supply a structured entry point for researchers entering RGB-D detection, clarifying the transition from feature-engineering to end-to-end learning and highlighting open problems. The two-part organization and explicit discussion of limitations are useful organizational choices.
major comments (1)
- [Abstract] The claim that the chapter provides a 'comprehensive survey' of 'key contributions' and 'most commonly used pipelines' is not supported by any description of the literature-search protocol, databases queried, date range, or inclusion/exclusion rules. Without these elements, the representativeness of the cited works cannot be verified, which undermines the central assertion of scope.
minor comments (1)
- [Abstract] The abstract states that the chapter is structured into two parts, but the manuscript should explicitly label the corresponding sections (e.g., §2 and §3) so readers can locate the hand-crafted versus deep-learning material without ambiguity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the scope and transparency of our survey. We address the major comment below.
- Referee: [Abstract] The claim that the chapter provides a 'comprehensive survey' of 'key contributions' and 'most commonly used pipelines' is not supported by any description of the literature-search protocol, databases queried, date range, or inclusion/exclusion rules. Without these elements, the representativeness of the cited works cannot be verified, which undermines the central assertion of scope.
Authors: We agree that including an explicit description of the literature search protocol would improve transparency and allow readers to evaluate the survey's representativeness. In the revised manuscript we will insert a short subsection (or paragraph) in the introduction that specifies the databases queried (IEEE Xplore, ACM Digital Library, Google Scholar, arXiv), the primary search keywords and Boolean strings, the publication date range covered, and the inclusion/exclusion criteria used to identify key contributions and representative pipelines. This addition will directly support the claims made in the abstract.
Revision: yes
Circularity Check
No circularity: survey contains no derivations, predictions, or self-referential claims
This is a literature survey with no equations, fitted parameters, predictions, or first-principles derivations. The central claim that deep learning has revolutionized RGB-D detection is presented as a summary of external work rather than a result derived from the survey's own selection or citations. No self-citation chains, ansatzes, or uniqueness theorems are invoked to support any load-bearing step. The paper is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
The relation between the paper passage and the cited Recognition theorem is unclear.
This chapter provides a comprehensive survey of the recent developments in this field. We structure the chapter into two parts; the focus of the first part is on techniques that are based on hand-crafted features combined with machine learning algorithms. The focus of the second part is on the more recent work, which is based on deep learning.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.