A Systematic Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation

Balakrishnan Prabhakaran; Hiranya Garbha Kumar; Minhas Kamal

arXiv: 2605.17131 · v1 · pith:TUUTOMZTnew · submitted 2026-05-16 · cs.CV · cs.AI · cs.LG


Pith reviewed 2026-05-20 15:09 UTC · model grok-4.3

classification: cs.CV · cs.AI · cs.LG

keywords: point cloud · deep learning · classification · segmentation · 3D vision · survey · backbone architectures · benchmarks


The pith

The survey groups point cloud deep learning models by backbone structure and compares their reported performance on classification and segmentation benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper provides a systematic overview of deep learning methods for three core 3D vision tasks on point cloud data: classification, part segmentation, and semantic segmentation. It starts by defining point clouds and the difficulties created by their unordered, irregular structure, which sensor noise and occlusions make worse. The survey then organizes notable models by backbone design and reports how they perform on common benchmarks. It also discusses what specific architectural choices contribute and where they fall short, and it lists remaining open problems and possible next steps. A reader can use it to see the main strategies for turning raw 3D scans into usable predictions.

Core claim

Notable works are grouped by backbone structure, their results are compared on standard benchmarks, and this yields direct observations about which design choices advance performance and which ones still face limits when processing unordered point clouds.

What carries the argument

Backbone structure categorization that sorts models by how they impose order, capture local geometry, enforce permutation invariance, or apply self-attention to point cloud inputs.
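
One of the strategies named above, permutation invariance, can be illustrated with a minimal sketch. It assumes a PointNet-style design (a shared per-point map followed by channel-wise max pooling); the toy data, the weights, and the names `point_features` and `global_feature` are hypothetical and are not taken from the survey.

```python
import random

def point_features(point, weights):
    """Shared per-point map: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * c for w, c in zip(row, point))) for row in weights]

def global_feature(points, weights):
    """Max-pool each feature channel over all points (a symmetric function)."""
    per_point = [point_features(p, weights) for p in points]
    return [max(channel) for channel in zip(*per_point)]

rng = random.Random(0)
# Hypothetical toy data: 16 points in 3D and a 4x3 weight matrix.
cloud = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(16)]
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]

# Same set of points, different storage order.
shuffled = cloud[:]
rng.shuffle(shuffled)

pooled = global_feature(cloud, weights)
pooled_shuffled = global_feature(shuffled, weights)
```

Because the maximum over a set does not depend on the order of its elements, `pooled` and `pooled_shuffled` are identical, whereas a model that consumed the flattened coordinate list directly would see two different inputs.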

If this is right

Designers can use the backbone groupings to pick or combine components that already show strong benchmark results for a given task.
Limitations noted for current architectures indicate concrete targets for reducing sensitivity to noise and missing points.
The listed open challenges supply a short list of problems that new methods should address to move the field forward.
Benchmark numbers supply reference points for measuring whether a proposed model improves on prior categories.
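
The sensitivity to noise and missing points mentioned above could be probed with a small perturbation harness. This is a sketch under stated assumptions: Gaussian jitter stands in for sensor noise, random dropout stands in for occlusion, and the function names and parameter values are hypothetical rather than a protocol from the survey.

```python
import random

def jitter(points, sigma, rng):
    """Add zero-mean Gaussian noise to every coordinate (assumed noise model)."""
    return [[c + rng.gauss(0.0, sigma) for c in p] for p in points]

def drop_points(points, keep_ratio, rng):
    """Keep a random subset of points to mimic occlusion or sparse returns."""
    keep = max(1, round(len(points) * keep_ratio))
    return rng.sample(points, keep)

rng = random.Random(0)
# Hypothetical toy cloud: 16 points drawn uniformly from a cube.
cloud = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(16)]

noisy = jitter(cloud, sigma=0.02, rng=rng)
sparse = drop_points(cloud, keep_ratio=0.5, rng=rng)
# A model under test would be evaluated on cloud, noisy, and sparse,
# and the change in accuracy between them reported.
```

Sweeping `sigma` and `keep_ratio` over a range would give a simple robustness curve for any classifier or segmentation model that accepts a list of points.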

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Later work could extend the same categorization to include more recent attention-heavy or graph-based variants and re-run the benchmark tables.
The survey's separation of backbone types could be tested by building a small hybrid model that mixes two categories and checking whether it exceeds the reported limits.
Insights on architectural trade-offs may transfer to downstream uses such as object detection in robotics, where point clouds arrive from moving sensors.

Load-bearing premise

The selected papers and benchmarks together give a fair, unbiased picture of the whole field.

What would settle it

A widely cited point cloud model omitted from the survey that achieves clearly better accuracy or uses an entirely new backbone approach not covered in the insights.

Figures

Figures reproduced from arXiv: 2605.17131 by Balakrishnan Prabhakaran, Hiranya Garbha Kumar, Minhas Kamal.

**Figure 2.** Classification of point cloud datasets from multiple perspectives.

**Figure 3.** Comparative illustration of discriminative and generative modeling paradigms. (a) Discriminative models learn the conditional …

**Figure 4.** Point clouds are unordered, irregular, random, and do not represent any surface. As a result, even the synthetically generated …

**Figure 5.** Taxonomy of commonly used deep learning architectures. Note that architectural categories of discriminative and generative …

**Figure 6.** Comparison among different methods for classification and segmentation tasks.

**Figure 7.** A simplified architecture of 3DShapeNets (2015) […]

**Figure 8.** A simplified architecture of MVCNN (2015) […]

**Figure 9.** A simplified architecture of PointConv (2019) […]

**Figure 10.** A simplified architecture of PointNet (2017) […]

**Figure 11.** A simplified architecture of PointNet++ (2017) […]

**Figure 12.** A simplified architecture of PointViewGCN (2021) […]

**Figure 13.** A simplified architecture of PointBERT (2022) […]

**Figure 14.** A simplified architecture of OmniVec (2024) […]

**Figure 15.** Sources for point cloud data acquisition.

**Figure 16.** Most widely used 3D benchmark datasets for classification and segmentation.

Original abstract

Point cloud stands as the most widely adopted format for representing 3D shapes and scenes due to its simplicity and geometric fidelity. However, its inherent unordered and irregular nature, exacerbated by sensor noise and occlusions, introduces unique challenges for machine learning based methodologies. To combat these issues, diverse strategies have been developed, including converting to a format that has orderliness, extracting local geometry, and permutation-invariant or self-attention-based processing. In this paper, our focus is directed towards deep learning models for three fundamental tasks in 3D vision: point cloud classification, part segmentation, and semantic segmentation. We begin by formally defining point cloud data, followed by an in-depth discussion on its structural characteristics. Then, we categorize notable works based on their backbone structure and evaluate their performance on popular benchmarks. Beyond empirical comparison, we offer insights into architectural innovations and limitations. We also outline open challenges and promising future directions for 3D point cloud understanding.
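
The first strategy the abstract lists, converting the cloud to a format that has orderliness, is commonly realized by voxelization. The sketch below assumes a simple occupancy grid; `voxelize`, the sample points, and the voxel size are hypothetical choices made for illustration, not details from the paper.

```python
import math

def voxelize(points, voxel_size):
    """Return the sorted list of occupied voxel indices for a point cloud."""
    occupied = {
        tuple(math.floor(c / voxel_size) for c in p)
        for p in points
    }
    return sorted(occupied)

# Hypothetical toy cloud with four points.
cloud = [(0.1, 0.1, 0.1), (0.9, 0.1, 0.1), (0.2, 0.2, 0.2), (-0.1, 0.3, 0.4)]

grid = voxelize(cloud, voxel_size=0.5)
# Feeding the same points in a different order yields the same grid.
grid_reversed = voxelize(list(reversed(cloud)), voxel_size=0.5)
```

The grid is the same for any ordering of the input, at the cost of merging points that share a voxel: here two of the four points fall into voxel (0, 0, 0), so only three voxels are occupied.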

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript is a survey on deep learning for point cloud classification, part segmentation, and semantic segmentation. It formally defines point cloud data and its challenges (unordered, irregular, sensor noise), categorizes notable works by backbone structure, evaluates performance on popular benchmarks, offers insights into architectural innovations and limitations, and outlines open challenges and future directions.

Significance. If the selected works form a representative sample and benchmark comparisons are reliable, the survey could provide a useful structured overview of the field, helping researchers navigate architectural trends in 3D vision. The discussion of limitations and future directions adds value by identifying gaps, but this depends on the completeness and fairness of coverage.

major comments (2)

[Introduction] Introduction and abstract: The central claim of a 'systematic survey' that categorizes 'notable works' and derives insights from benchmark evaluations assumes a representative sample, yet no literature search protocol, inclusion/exclusion criteria, time frame, or definition of 'notable' is documented. This directly weakens the reliability of the categorization, empirical comparisons, and architectural insights.
[Benchmark Evaluation] Benchmark evaluation sections: Performance summaries on datasets such as ModelNet and ShapeNet do not specify whether numbers are taken verbatim from original papers (with potentially inconsistent protocols, preprocessing, or splits) or re-evaluated under controlled conditions. This affects the validity of cross-model comparisons and conclusions about architectural superiority.

minor comments (2)

[Categorization] Ensure all backbone categories (e.g., point-based, graph-based, transformer-based) are explicitly defined with examples in the taxonomy section for reader clarity.
[Overall] Add a summary table listing key papers, their backbones, and reported metrics to improve navigability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our survey manuscript. The comments highlight important aspects of transparency that we will address in the revision. We respond to each major comment below.

Point-by-point responses

Referee: [Introduction] Introduction and abstract: The central claim of a 'systematic survey' that categorizes 'notable works' and derives insights from benchmark evaluations assumes a representative sample, yet no literature search protocol, inclusion/exclusion criteria, time frame, or definition of 'notable' is documented. This directly weakens the reliability of the categorization, empirical comparisons, and architectural insights.

Authors: We agree that explicit documentation of the selection process strengthens a systematic survey. Our categorization of notable works was guided by a review of high-impact papers (prioritizing citation counts and influence on follow-up research) from major venues and arXiv, covering developments from approximately 2017 onward to capture the evolution of backbone architectures. To address the concern, we will add a new subsection titled 'Literature Selection and Scope' in the introduction. This will specify the search strategy (keywords and databases), time frame, inclusion criteria (e.g., focus on deep learning methods with reported benchmark results), and our definition of 'notable' (pioneering contributions or strong empirical performance within each architectural category).

Revision: yes

Referee: [Benchmark Evaluation] Benchmark evaluation sections: Performance summaries on datasets such as ModelNet and ShapeNet do not specify whether numbers are taken verbatim from original papers (with potentially inconsistent protocols, preprocessing, or splits) or re-evaluated under controlled conditions. This affects the validity of cross-model comparisons and conclusions about architectural superiority.

Authors: The referee is correct that this detail was not stated. The tabulated results are compiled verbatim from the numbers reported in the original papers, without re-implementation or unified re-evaluation under controlled conditions. This is a common practice in surveys given the computational cost and implementation variations across models. In the revised manuscript, we will explicitly clarify this in the benchmark evaluation sections (e.g., at the start of Sections 4 and 5) and add a paragraph discussing the limitations of such comparisons, including potential differences in data splits, preprocessing, and training protocols. We will also note any publicly available code or standardized benchmarks that could support more controlled future comparisons.

Revision: yes

Circularity Check

0 steps flagged

Survey of external models with no internal derivation or self-referential reduction

Full rationale

This is a literature survey that organizes previously published point-cloud architectures by backbone type and tabulates their reported benchmark numbers. No equations, fitted parameters, or new predictions are introduced whose values are forced by the paper's own definitions or inputs. All claims rest on external publications and standard datasets; the selection process, while undocumented in detail, does not create a closed loop in which a result is derived from itself. The paper therefore contains no circular steps of the enumerated kinds.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

As a survey, the paper rests on standard domain definitions of point cloud data rather than on new free parameters or invented entities.

axioms (1)

domain assumption: Point cloud data is inherently unordered and irregular, exacerbated by sensor noise and occlusions.
Invoked in the abstract as the source of unique challenges for machine learning methods.


Reference graph

Works this paper leans on

139 extracted references · 139 canonical work pages · 11 internal anchors

[1]

Karim Abou Zeid, Kadir Yilmaz, Daan de Geus, Alexander Hermans, David Adrian, Timm Linder, and Bastian Leibe. 2025. DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation.arXiv e-prints(2025), arXiv–2503

work page 2025
[2]

Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese

Iro Armeni, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese. 2016. 3D Semantic Parsing of Large-Scale Indoor Spaces. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

work page 2016
[3]

Matan Atzmon, Haggai Maron, and Yaron Lipman. 2018. Point convolutional neural networks by extension operators.ACM Transactions on Graphics (ToG)37, 4, Article 71 (July 2018), 12 pages. doi:10.1145/3197517.3201301

work page doi:10.1145/3197517.3201301 2018
[4]

Armen Avetisyan, Manuel Dahnert, Angela Dai, Manolis Savva, Angel X Chang, and Matthias Nießner. 2019. Scan2CAD: Learning CAD Model Alignment in RGB-D Scans. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2614–2623. doi:10.48550/arXiv.1811.11187

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1811.11187 2019
[5]

Gilad Baruch, Zhuoyuan Chen, Afshin Dehghan, Tal Dimry, Yuri Feigin, Peter Fu, Thomas Gebauer, Brandon Joffe, Daniel Kurz, Arik Schwartz, and Elad Shulman. 2021. ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data. InThirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Tr...

work page 2021
[6]

Behley, M

J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall. 2019. SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. InProceedings of the IEEE/CVF International Conf. on Computer Vision (ICCV)

work page 2019
[7]

Saifullahi Aminu Bello, Shangshu Yu, Cheng Wang, Jibril Muhmmad Adam, and Jonathan Li. 2020. Review: Deep Learning on 3D Point Clouds. Remote Sensing12, 11 (Jan. 2020), 1729. doi:10.3390/rs12111729

work page doi:10.3390/rs12111729 2020
[8]

Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. 2020. nuscenes: A multimodal dataset for autonomous driving. InCVPR. 11621–11631

work page 2020
[9]

Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Niessner, Manolis Savva, Shuran Song, Andy Zeng, and Yinda Zhang

work page
[10]

Matterport3D: Learning from RGB-D Data in Indoor Environments.International Conference on 3D Vision (3DV)(2017)

work page 2017
[11]

Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, and Fisher Yu

Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, and Fisher Yu. 2015.ShapeNet: An Information-Rich 3D Model Repository. Technical Report. Stanford University — Princeton University — Toyota Technological Institute at Chicago

work page 2015
[12]

Qi, Hao Su, Kaichun Mo, and Leonidas J

R. Qi Charles, Hao Su, Mo Kaichun, and Leonidas J. Guibas. 2017. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 77–85. doi:10.1109/CVPR.2017.16

work page doi:10.1109/cvpr.2017.16 2017
[13]

Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Return of The Devil in The Details: Delving Deep into Convolutional Nets.arXiv preprint arXiv:1405.3531(2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[14]

Guangyan Chen, Meiling Wang, Yi Yang, Kai Yu, Li Yuan, and Yufeng Yue. 2024. Pointgpt: Auto-regressively generative pre-training from point clouds.Advances in Neural Information Processing Systems (NeurIPS)36 (2024)

work page 2024
[15]

Xiaotong Chen, Huijie Zhang, Zeren Yu, Anthony Opipari, and Odest Chadwicke Jenkins. 2022. ClearPose: Large-scale Transparent Object Dataset and Benchmark. InProceedings of the European Conference on Computer Vision (ECCV). Vol. 13668. Springer Nature Switzerland, Cham, 381–396

work page 2022
[16]

Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner

Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. 2017. ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. InThe IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

work page 2017
[17]

Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. 2022. Objaverse: A Universe of Annotated 3D Objects.arXiv preprint arXiv:2212.08051(2022)

work page arXiv 2022
[18]

Alexandros Delitzas, Ayca Takmaz, Federico Tombari, Robert Sumner, Marc Pollefeys, and Francis Engelmann. 2024. SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

work page 2024
[19]

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. InConference on Computer Vision and Pattern Recognition (CVPR)

work page 2009
[20]

Mark De Deuge, Alastair Quadros, Calvin Hung, and Bertrand Douillard. 2013. Unsupervised Feature Learning for Classification of Outdoor 3D Scans. InAustralasian Conference on Robotics and Automation 2013 (ACRA 13). Sydney, Australia

work page 2013
[21]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding.arXiv preprint arXiv:1810.04805(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[22]

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. InInternational Conference on Learning Representations (ICLR). htt...

work page 2021
[23]

Eldar, M

Y. Eldar, M. Lindenbaum, M. Porat, and Y.Y. Zeevi. 1997. The farthest point strategy for progressive image sampling.IEEE Transactions on Image Processing6, 9 (1997), 1305–1315. doi:10.1109/83.623193

work page doi:10.1109/83.623193 1997
[24]

Jeffrey L Elman. 1990. Finding Structure in Time.Cognitive science14, 2 (1990), 179–211

work page 1990
[25]

Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles Qi, Yin Zhou, Zoey Yang, Aurelien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, and Dragomir Anguelov. 2021. Large Scale Interactive Motion Forecasting for Autonomous Driving : The Waymo Open Motion D...

work page doi:10.1109/iccv48922.2021.00957 2021
[26]

Fayjie, Mathijs Lens, and Patrick Vandewalle

Abdur R. Fayjie, Mathijs Lens, and Patrick Vandewalle. 2025. Few-Shot Segmentation of 3D Point Clouds Under Real-World Distributional Shifts in Railroad Infrastructure. 25, 4 (Feb 2025), 1072. doi:10.3390/s25041072

work page doi:10.3390/s25041072 2025
[27]

Yifan Feng, Zizhao Zhang, Xibin Zhao, Rongrong Ji, and Yue Gao. 2018. GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition. In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 264–272. doi:10.1109/CVPR.2018.00035

work page doi:10.1109/cvpr.2018.00035 2018
[28]

Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. 2013. Vision meets Robotics: The KITTI Dataset.International Journal of Robotics Research (IJRR)(2013)

work page 2013
[29]

Andreas Geiger, Philip Lenz, and Raquel Urtasun. 2012. Are we ready for autonomous driving? The KITTI vision benchmark suite. In2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 3354–3361. doi:10.1109/CVPR.2012.6248074

work page doi:10.1109/cvpr.2012.6248074 2012
[30]

Albert Gu and Tri Dao. 2023. Mamba: Linear-Time Sequence Modeling with Selective State Spaces.arXiv preprint arXiv:2312.00752(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[31]

Albert Gu and Tri Dao. 2024. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. InFirst Conference on Language Modeling

work page 2024
[32]

Guo, J.-X

Meng-Hao Guo, Jun-Xiong Cai, Zheng-Ning Liu, Tai-Jiang Mu, Ralph R. Martin, and Shi-Min Hu. 2021. PCT: Point cloud transformer.Computational Visual Media7, 2 (01 Jun 2021), 187–199. doi:10.1007/s41095-021-0229-5

work page doi:10.1007/s41095-021-0229-5 2021
[33]

Yulan Guo, Hanyun Wang, Qingyong Hu, Hao Liu, Li Liu, and Mohammed Bennamoun. 2021. Deep Learning for 3D Point Clouds: A Survey.IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)43, 12 (Dec. 2021), 4338–4364. doi:10.1109/TPAMI.2020.3005434

work page doi:10.1109/tpami.2020.3005434 2021
[34]

Savinov, L

Timo Hackel, N. Savinov, L. Ladicky, Jan D. Wegner, K. Schindler, and M. Pollefeys. 2017. SEMANTIC3D.NET: A new large-scale point cloud classification benchmark. InISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. IV-1-W1. 91–98

work page 2017
[35]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 770–778. doi:10.1109/CVPR.2016.90

work page doi:10.1109/cvpr.2016.90 2016
[36]

Geoffrey E. Hinton. 2002. Training Products of Experts by Minimizing Contrastive Divergence.Neural Computation14, 8 (08 2002), 1771–1800. doi:10.1162/089976602760128018

work page doi:10.1162/089976602760128018 2002
[37]

A fast learning algorithm for deep belief nets

Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. 2006. A Fast Learning Algorithm for Deep Belief Nets.Neural Computation18, 7 (07 2006), 1527–1554. arXiv:https://direct.mit.edu/neco/article-pdf/18/7/1527/816558/neco.2006.18.7.1527.pdf doi:10.1162/neco.2006.18.7.1527

work page doi:10.1162/neco.2006.18.7.1527 2006
[38]

Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. InAdvances in Neural Information Processing Systems (NeurIPS), H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 6840–6851

work page 2020
[39]

Sepp Hochreiter and Jurgen Schmidhuber. 1997. Long Short-Term Memory.Neural Computation9, 8 (1997), 1735–1780

work page 1997
[40]

Binh-Son Hua, Quang-Hieu Pham, Duc Thanh Nguyen, Minh-Khoi Tran, Lap-Fai Yu, and Sai-Kit Yeung. 2016. SceneNN: A Scene Meshes Dataset with Annotations. InInternational Conference on 3D Vision (3DV)

work page 2016
[41]

Allison Janoch, Sergey Karayev, Yangqing Jia, Jonathan T Barron, Mario Fritz, Kate Saenko, and Trevor Darrell. 2013. A Category-Level 3D Object Dataset: Putting The Kinect to Work.Consumer depth cameras for computer vision: research topics and applications(2013), 141–165

work page 2013
[42]

Bernhard Kerbl, Georgios Kopanas, Thomas Leimkuehler, and George Drettakis. 2023. 3D Gaussian Splatting for Real-Time Radiance Field Rendering.ACM Transactions on Graphics42, 4, Article 139 (2023), 14 pages. doi:10.1145/3592433

work page doi:10.1145/3592433 2023
[43]

Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes.arXiv preprint arXiv:1312.6114(2013)

work page internal anchor Pith review Pith/arXiv arXiv 2013
[44]

Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks.arXiv preprint arXiv:1609.02907(2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016
[45]

Roman Klokov and Victor Lempitsky. 2017. Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models. In2017 IEEE International Conference on Computer Vision (ICCV). 863–872. doi:10.1109/ICCV.2017.99

work page doi:10.1109/iccv.2017.99 2017
[46]

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. InAdvances in Neural Information Processing Systems (NeurIPS), F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger (Eds.), Vol. 25. Curran Associates, Inc

work page 2012
[47]

Truc Le and Ye Duan. 2018. PointGrid: A Deep Network for 3D Shape Understanding. In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 9204–9214. doi:10.1109/CVPR.2018.00959

work page doi:10.1109/cvpr.2018.00959 2018
[48]

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning.Nature521, 7553 (May 2015), 436–444. doi:10.1038/nature14539

work page doi:10.1038/nature14539 2015
[49]

Howard, Wayne Hubbard, and Lawrence Jackel

Yann LeCun, Bernhard Boser, John Denker, Donnie Henderson, R. Howard, Wayne Hubbard, and Lawrence Jackel. 1989. Handwritten Digit Recognition with a Back-Propagation Network. InAdvances in Neural Information Processing Systems (NeurIPS), D. Touretzky (Ed.), Vol. 2. Morgan-Kaufmann

work page 1989
[50]

Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng. 2011. Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks.Commun. ACM54 (10 2011), 95–103. doi:10.1145/2001269.2001295

work page doi:10.1145/2001269.2001295 2011
[51]

Chongshou Li, Yuheng Liu, Xinke Li, Yuning Zhang, Tianrui Li, and Junsong Yuan. 2025. Deep Hierarchical Learning for 3D Semantic Segmentation. International Journal of Computer Vision (IJCV)133, 7 (jul 2025), 4420–4441. doi:10.1007/s11263-025-02387-6

work page doi:10.1007/s11263-025-02387-6 2025
[52]

Xinke Li, Chongshou Li, Zekun Tong, Andrew Lim, Junsong Yuan, Yuwei Wu, Jing Tang, and Raymond Huang. 2020. Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene. InProceedings of the 28th ACM International Conference on Multimedia (New York, NY, USA, 2020-10-12)(MM ’20). Association for Computing Machinery, 238–...

work page doi:10.1145/3394171.3413661 2020
[53]

Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. 2018. PointCNN: convolution on X-transformed points. InProceedings of the 32nd International Conference on Neural Information Processing Systems(Montréal, Canada)(NIPS’18). Curran Associates Inc., 828–838

work page 2018
[54]

Zhuoyuan Li, Yubo Ai, Jiahao Lu, ChuXin Wang, Jiacheng Deng, Hanzhi Chang, Yanzhe Liang, Wenfei Yang, Shifeng Zhang, and Tianzhu Zhang

work page
[55]

InProceedings of the AAAI Conference on AI, Vol

Pamba: Enhancing Global Interaction in Point Clouds via State Space Model. InProceedings of the AAAI Conference on AI, Vol. 39. 5092–5100

work page
[56]

Yiyi Liao, Jun Xie, and Andreas Geiger. 2022. KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D.Pattern Analysis and Machine Intelligence (PAMI)(2022). Manuscript submitted to ACM 26 Minhas Kamal et al

work page 2022
[57]

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. InEuropean Conference on Computer Vision (ECCV). Springer, 740–755

work page 2014
[58]

Xinhai Liu, Zhizhong Han, Yu-Shen Liu, and Matthias Zwicker. 2019. Point2sequence: Learning the shape representation of 3d point clouds with an attention-based sequence to sequence network. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), Vol. 33. 8778–8785

work page 2019
[59]

Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully Convolutional Networks for Semantic Segmentation. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

work page 2015
[60]

Jonathon Luiten, Aljosa Osep, Patrick Dendorfer, Philip Torr, Andreas Geiger, Laura Leal-Taixe, and Bastian Leibe. 2020. HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking.International Journal of Computer Vision (IJCV)(2020)

work page 2020
[61]

Shitong Luo and Wei Hu. 2021. Diffusion Probabilistic Models for 3D Point Cloud Generation. In2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2836–2844. doi:10.1109/CVPR46437.2021.00286

work page doi:10.1109/cvpr46437.2021.00286 2021
[62]

Xu Ma, Can Qin, Haoxuan You, Haoxi Ran, and Yun Fu. 2022. Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework. InInternational Conference on Learning Representations (ICLR). https://openreview.net/forum?id=3Pbra-_u76D

work page 2022
[63]

Yongsen Mao, Yiming Zhang, Hanxiao Jiang, Angel X Chang, and Manolis Savva. 2022. MultiScan: Scalable RGBD scanning for 3D environments with articulated objects. InAdvances in Neural Information Processing Systems (NeurIPS)

work page 2022
[64]

Daniel Maturana and Sebastian A. Scherer. 2015. VoxNet: A 3D Convolutional Neural Network for real-time object recognition.2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)(2015), 922–928. https://api.semanticscholar.org/CorpusID:14620252

work page 2015
[65]

Nerf: Representing scenes as neural radiance fields for view synthesis,

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2021. NeRF: representing scenes as neural radiance fields for view synthesis.Commun. ACM65, 1 (Dec. 2021), 99–106. doi:10.1145/3503250

work page doi:10.1145/3503250 2021
[66]

Seyed Saber Mohammadi, Yiming Wang, and Alessio Del Bue. 2021. Pointview-GCN: 3D Shape Classification With Multi-View Point Clouds. In 2021 IEEE International Conference on Image Processing (ICIP). 3103–3107. doi:10.1109/ICIP42928.2021.9506426

work page doi:10.1109/icip42928.2021.9506426 2021
[67]

Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, and Mark Chen. 2022. Point-e: A system for generating 3d point clouds from complex prompts.arXiv preprint arXiv:2212.08751(2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022
[68]

Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning Convolutional Neural Networks for Graphs. InInternational Conference on Machine Learning (ICML). PMLR, 2014–2023

work page 2016
[69]

Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. 2017. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. InAdvances in Neural Information Processing Systems (NeurIPS), Vol. 30. Curran Associates, Inc

[70]

Zekun Qi, Runpei Dong, Guofan Fan, Zheng Ge, Xiangyu Zhang, Kaisheng Ma, and Li Yi. 2023. Contrast with reconstruct: contrastive 3D representation learning guided by generative pretraining. In Proceedings of the 40th International Conference on Machine Learning (ICML) (Honolulu, Hawaii, USA). JMLR.org, Article 1171, 21 pages.

[71]

Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, He Wang, Li Yi, and Kaisheng Ma. 2024. ShapeLLM: Universal 3D Object Understanding for Embodied Interaction. arXiv preprint arXiv:2402.17766 (2024).

[72]

Guocheng Qian, Yuchen Li, Houwen Peng, Jinjie Mai, Hasan Abed Al Kader Hammoud, Mohamed Elhoseiny, and Bernard Ghanem. 2022. PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies. In Advances in Neural Information Processing Systems (NeurIPS), Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho (Eds.). https://openrevi...

[73]

Bo Qiu, Yuzhou Zhou, Lei Dai, Bing Wang, Jianping Li, Zhen Dong, Chenglu Wen, Zhiliang Ma, and Bisheng Yang. 2024. WHU-Railway3D: A Diverse Dataset and Benchmark for Railway Point Cloud Semantic Segmentation. IEEE Transactions on Intelligent Transportation Systems 25, 12 (Dec 2024), 20900–20916. doi:10.1109/TITS.2024.3469546

[74]

Umamaheswaran Raman Kumar, Abdur Razzaq Fayjie, Jurgen Hannaert, and Patrick Vandewalle. 2025. BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation. In Computer Vision – ECCV 2024 Workshops (Cham, 2025), Alessio Del Bue, Cristian Canton, Jordi Pont-Tuset, and Tatiana Tommasi (Eds.). Springer Nature Switz...

work page doi:10.1007/978-3-031-91672-4_19 2025
[75]

Nikhila Ravi, Jeremy Reizenstein, David Novotny, Taylor Gordon, Wan-Yen Lo, Justin Johnson, and Georgia Gkioxari. 2020. Accelerating 3D Deep Learning with PyTorch3D. arXiv:2007.08501 (2020).

[76]

Gernot Riegler, Ali Osman Ulusoy, and Andreas Geiger. 2017. OctNet: Learning Deep 3D Representations at High Resolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[77]

Mike Roberts, Jason Ramapuram, Anurag Ranjan, Atulit Kumar, Miguel Angel Bautista, Nathan Paczan, Russ Webb, and Joshua M. Susskind. 2021. Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (Montreal, QC, Canada, 2021-10). IEEE, 10892–10902. doi:10.1109/IC...

work page doi:10.1109/iccv48922.2021.01073 2021
[78]

Jason Tyler Rolfe. 2016. Discrete variational autoencoders. arXiv preprint arXiv:1609.02200 (2016).

[79]

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi (Eds.). Springer International Publishing, Cham, 234–241.

[80]

David Rozenberszki, Or Litany, and Angela Dai. 2022. Language-Grounded Indoor 3D Semantic Segmentation in the Wild. In Proceedings of the European Conference on Computer Vision (ECCV).

