arXiv: 2605.17131 · v1 · pith:TUUTOMZTnew · submitted 2026-05-16 · 💻 cs.CV · cs.AI · cs.LG

A Systematic Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation

Pith reviewed 2026-05-20 15:09 UTC · model grok-4.3

classification: 💻 cs.CV · cs.AI · cs.LG
keywords: point cloud · deep learning · classification · segmentation · 3D vision · survey · backbone architectures · benchmarks

The pith

Survey groups point cloud deep learning models by backbone structure and benchmarks their performance on classification and segmentation tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper provides a systematic overview of deep learning methods for three core 3D vision tasks on point cloud data: classification, part segmentation, and semantic segmentation. It starts by defining point cloud properties and the difficulties created by their unordered, irregular structure and by sensor noise. The survey then organizes notable models according to backbone design and reports how they perform on common benchmarks. It also discusses what each architectural choice contributes and where it falls short, along with open problems and possible next steps. A reader would use this to see the main strategies for turning raw 3D scans into usable predictions.

Core claim

Notable works are grouped by backbone structure, their results are compared on standard benchmarks, and this yields direct observations about which design choices advance performance and which ones still face limits when processing unordered point clouds.

What carries the argument

Backbone structure categorization that sorts models by how they impose order, capture local geometry, enforce permutation invariance, or apply self-attention to point cloud inputs.
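
To make "enforce permutation invariance" concrete, here is a minimal sketch that is not taken from the paper. It assumes the PointNet-style recipe of applying the same function to every point and then pooling with a symmetric operation (here, a channel-wise max). The function `point_feature` is a hypothetical stand-in for a learned shared MLP.

```python
import random


def point_feature(point):
    """Toy per-point feature map; a hypothetical stand-in for a shared MLP."""
    x, y, z = point
    return (x + y + z, x * x + y * y + z * z, max(x, y, z))


def global_feature(points):
    """Max-pool each feature channel over all points (a symmetric function)."""
    features = [point_feature(p) for p in points]
    return tuple(max(channel) for channel in zip(*features))


# A tiny hypothetical point cloud and a shuffled copy of it.
cloud = [(0.0, 1.0, 2.0), (1.5, -0.5, 0.25), (-1.0, 0.5, 3.0), (2.0, 2.0, -2.0)]
shuffled = cloud[:]
random.Random(0).shuffle(shuffled)

# The pooled descriptor does not depend on the order of the points.
assert global_feature(cloud) == global_feature(shuffled)
```

Because the max is taken over the whole set, reordering the input cannot change the result. The trade-off is that a single global pooling step discards local neighborhood structure.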

If this is right

  • Designers can use the backbone groupings to pick or combine components that already show strong benchmark results for a given task.
  • Limitations noted for current architectures indicate concrete targets for reducing sensitivity to noise and missing points.
  • The listed open challenges supply a short list of problems that new methods should address to move the field forward.
  • Benchmark numbers supply reference points for measuring whether a proposed model improves on prior categories.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Later work could extend the same categorization to include more recent attention-heavy or graph-based variants and re-run the benchmark tables.
  • The survey's separation of backbone types could be tested by building a small hybrid model that mixes two categories and checking whether it exceeds the reported limits.
  • Insights on architectural trade-offs may transfer to downstream uses such as object detection in robotics, where point clouds arrive from moving sensors.

Load-bearing premise

The selected papers and benchmarks together give a fair, unbiased picture of the whole field.

What would settle it

A widely cited point cloud model omitted from the survey that achieves clearly better accuracy or uses an entirely new backbone approach not covered in the insights.

Figures

Figures reproduced from arXiv: 2605.17131 by Balakrishnan Prabhakaran, Hiranya Garbha Kumar, Minhas Kamal.

Figure 1: Common 3D scene representation modalities illustrated by a Klein bottle.
Figure 2: Classification of point cloud datasets from multiple perspectives.
Figure 3: Comparative illustration of discriminative and generative modeling paradigms. (a) Discriminative models learn the conditional …
Figure 4: Point clouds are unordered, irregular, random, and do not represent any surface. As a result, even the synthetically generated …
Figure 5: Taxonomy of commonly used deep learning architectures. Note that architectural categories of discriminative and generative …
Figure 6: Comparison among different methods for classification and segmentation tasks.
Figure 7: A simplified architecture of 3DShapeNets (2015).
Figure 8: A simplified architecture of MVCNN (2015).
Figure 9: A simplified architecture of PointConv (2019).
Figure 10: A simplified architecture of PointNet (2017).
Figure 11: A simplified architecture of PointNet++ (2017).
Figure 12: A simplified architecture of PointViewGCN (2021).
Figure 13: A simplified architecture of PointBERT (2022).
Figure 14: A simplified architecture of OmniVec (2024).
Figure 15: Sources for point cloud data acquisition.
Figure 16: Most widely used 3D benchmark datasets for classification and segmentation.

Original abstract

Point cloud stands as the most widely adopted format for representing 3D shapes and scenes due to its simplicity and geometric fidelity. However, its inherent unordered and irregular nature, exacerbated by sensor noise and occlusions, introduces unique challenges for machine learning based methodologies. To combat these issues, diverse strategies have been developed, including converting to a format that has orderliness, extracting local geometry, and permutation-invariant or self-attention-based processing. In this paper, our focus is directed towards deep learning models for three fundamental tasks in 3D vision: point cloud classification, part segmentation, and semantic segmentation. We begin by formally defining point cloud data, followed by an in-depth discussion on its structural characteristics. Then, we categorize notable works based on their backbone structure and evaluate their performance on popular benchmarks. Beyond empirical comparison, we offer insights into architectural innovations and limitations. We also outline open challenges and promising future directions for 3D point cloud understanding.
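
One of the strategies the abstract names, converting the cloud to a format that has orderliness, is commonly realized as voxelization. The sketch below is illustrative only and is not taken from the paper. The names `voxelize` and `voxel_size` are hypothetical, and it assumes a plain occupancy grid rather than any specific model's input format.

```python
import math


def voxelize(points, voxel_size):
    """Return the set of integer voxel indices occupied by at least one point."""
    occupied = set()
    for x, y, z in points:
        occupied.add((
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        ))
    return occupied


# Hypothetical three-point cloud; the first two points share one voxel.
cloud = [(0.1, 0.2, 0.3), (0.4, 0.4, 0.4), (1.2, 0.0, -0.1)]
grid = voxelize(cloud, voxel_size=0.5)
print(sorted(grid))
```

The occupied cells form a regular lattice that grid-based convolutions can consume, and the result does not depend on the order of the input points. The cost is that points falling into the same cell become indistinguishable, so finer geometry is lost as `voxel_size` grows.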

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript is a survey on deep learning for point cloud classification, part segmentation, and semantic segmentation. It formally defines point cloud data and its challenges (unordered, irregular, sensor noise), categorizes notable works by backbone structure, evaluates performance on popular benchmarks, offers insights into architectural innovations and limitations, and outlines open challenges and future directions.

Significance. If the selected works form a representative sample and benchmark comparisons are reliable, the survey could provide a useful structured overview of the field, helping researchers navigate architectural trends in 3D vision. The discussion of limitations and future directions adds value by identifying gaps, but this depends on the completeness and fairness of coverage.

major comments (2)
  1. [Introduction] Introduction and abstract: The central claim of a 'systematic survey' that categorizes 'notable works' and derives insights from benchmark evaluations assumes a representative sample, yet no literature search protocol, inclusion/exclusion criteria, time frame, or definition of 'notable' is documented. This directly weakens the reliability of the categorization, empirical comparisons, and architectural insights.
  2. [Benchmark Evaluation] Benchmark evaluation sections: Performance summaries on datasets such as ModelNet and ShapeNet do not specify whether numbers are taken verbatim from original papers (with potentially inconsistent protocols, preprocessing, or splits) or re-evaluated under controlled conditions. This affects the validity of cross-model comparisons and conclusions about architectural superiority.
minor comments (2)
  1. [Categorization] Ensure all backbone categories (e.g., point-based, graph-based, transformer-based) are explicitly defined with examples in the taxonomy section for reader clarity.
  2. [Overall] Add a summary table listing key papers, their backbones, and reported metrics to improve navigability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our survey manuscript. The comments highlight important aspects of transparency that we will address in the revision. We respond to each major comment below.

Point-by-point responses
  1. Referee: [Introduction] Introduction and abstract: The central claim of a 'systematic survey' that categorizes 'notable works' and derives insights from benchmark evaluations assumes a representative sample, yet no literature search protocol, inclusion/exclusion criteria, time frame, or definition of 'notable' is documented. This directly weakens the reliability of the categorization, empirical comparisons, and architectural insights.

    Authors: We agree that explicit documentation of the selection process strengthens a systematic survey. Our categorization of notable works was guided by a review of high-impact papers (prioritizing citation counts and influence on follow-up research) from major venues and arXiv, covering developments from approximately 2017 onward to capture the evolution of backbone architectures. To address the concern, we will add a new subsection titled 'Literature Selection and Scope' in the introduction. This will specify the search strategy (keywords and databases), time frame, inclusion criteria (e.g., focus on deep learning methods with reported benchmark results), and our definition of 'notable' (pioneering contributions or strong empirical performance within each architectural category).

    Revision: yes

  2. Referee: [Benchmark Evaluation] Benchmark evaluation sections: Performance summaries on datasets such as ModelNet and ShapeNet do not specify whether numbers are taken verbatim from original papers (with potentially inconsistent protocols, preprocessing, or splits) or re-evaluated under controlled conditions. This affects the validity of cross-model comparisons and conclusions about architectural superiority.

    Authors: The referee is correct that this detail was not stated. The tabulated results are compiled verbatim from the numbers reported in the original papers, without re-implementation or unified re-evaluation under controlled conditions. This is a common practice in surveys given the computational cost and implementation variations across models. In the revised manuscript, we will explicitly clarify this in the benchmark evaluation sections (e.g., at the start of Sections 4 and 5) and add a paragraph discussing the limitations of such comparisons, including potential differences in data splits, preprocessing, and training protocols. We will also note any publicly available code or standardized benchmarks that could support more controlled future comparisons.

    Revision: yes
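
As a small illustration of why verbatim numbers can be hard to compare, the sketch below scores the same predictions with two metrics that are both commonly reported for classification benchmarks: overall accuracy and mean per-class accuracy. It is not drawn from the paper. The labels are made up, and whether any given table entry uses one metric or the other is an assumption, not something this page states.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_class_accuracy(y_true, y_pred):
    """Average of the per-class accuracies, so every class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical imbalanced test set: 8 chairs and 2 lamps, one lamp misclassified.
y_true = ["chair"] * 8 + ["lamp"] * 2
y_pred = ["chair"] * 8 + ["chair", "lamp"]

oa = overall_accuracy(y_true, y_pred)
macc = mean_class_accuracy(y_true, y_pred)
print(f"overall accuracy: {oa:.2f}, mean class accuracy: {macc:.2f}")
```

On this toy data the two metrics differ by 15 points for identical predictions, which is why a comparison table is easier to trust when the metric and evaluation protocol are stated next to each number.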

Circularity Check

0 steps flagged

Survey of external models with no internal derivation or self-referential reduction

Full rationale

This is a literature survey that organizes previously published point-cloud architectures by backbone type and tabulates their reported benchmark numbers. No equations, fitted parameters, or new predictions are introduced whose values are forced by the paper's own definitions or inputs. All claims rest on external publications and standard datasets; the selection process, while undocumented in detail, does not create a closed loop in which a result is derived from itself. The paper therefore contains no circular steps of the enumerated kinds.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

As a survey, the paper rests on standard domain definitions of point cloud data rather than on new free parameters or invented entities.

axioms (1)
  • Domain assumption: Point cloud data is inherently unordered and irregular, a difficulty exacerbated by sensor noise and occlusions.
    Invoked in the abstract as the source of unique challenges for machine learning methods.
