A Systematic Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation
Pith reviewed 2026-05-20 15:09 UTC · model grok-4.3
The pith
Survey groups point cloud deep learning models by backbone structure and benchmarks their performance on classification and segmentation tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Notable works are grouped by backbone structure, their results are compared on standard benchmarks, and this yields direct observations about which design choices advance performance and which ones still face limits when processing unordered point clouds.
What carries the argument
Backbone structure categorization that sorts models by how they impose order, capture local geometry, enforce permutation invariance, or apply self-attention to point cloud inputs.
If this is right
- Designers can use the backbone groupings to pick or combine components that already show strong benchmark results for a given task.
- Limitations noted for current architectures indicate concrete targets for reducing sensitivity to noise and missing points.
- The listed open challenges supply a short list of problems that new methods should address to move the field forward.
- Benchmark numbers supply reference points for measuring whether a proposed model improves on prior categories.
Where Pith is reading between the lines
- Later work could extend the same categorization to include more recent attention-heavy or graph-based variants and re-run the benchmark tables.
- The survey's separation of backbone types could be tested by building a small hybrid model that mixes two categories and checking whether it exceeds the reported limits.
- Insights on architectural trade-offs may transfer to downstream uses such as object detection in robotics, where point clouds arrive from moving sensors.
Load-bearing premise
The selected papers and benchmarks together give a fair, unbiased picture of the whole field.
What would settle it
A widely cited point cloud model omitted from the survey that achieves clearly better accuracy or uses an entirely new backbone approach not covered in the insights.
Figures
read the original abstract
Point cloud stands as the most widely adopted format for representing 3D shapes and scenes due to its simplicity and geometric fidelity. However, its inherent unordered and irregular nature, exacerbated by sensor noise and occlusions, introduces unique challenges for machine learning based methodologies. To combat these issues, diverse strategies have been developed, including converting to a format that has orderliness, extracting local geometry, and permutation-invariant or self-attention-based processing. In this paper, our focus is directed towards deep learning models for three fundamental tasks in 3D vision: point cloud classification, part segmentation, and semantic segmentation. We begin by formally defining point cloud data, followed by an in-depth discussion on its structural characteristics. Then, we categorize notable works based on their backbone structure and evaluate their performance on popular benchmarks. Beyond empirical comparison, we offer insights into architectural innovations and limitations. We also outline open challenges and promising future directions for 3D point cloud understanding.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a survey on deep learning for point cloud classification, part segmentation, and semantic segmentation. It formally defines point cloud data and its challenges (unordered, irregular, sensor noise), categorizes notable works by backbone structure, evaluates performance on popular benchmarks, offers insights into architectural innovations and limitations, and outlines open challenges and future directions.
Significance. If the selected works form a representative sample and benchmark comparisons are reliable, the survey could provide a useful structured overview of the field, helping researchers navigate architectural trends in 3D vision. The discussion of limitations and future directions adds value by identifying gaps, but this depends on the completeness and fairness of coverage.
major comments (2)
- [Introduction] Introduction and abstract: The central claim of a 'systematic survey' that categorizes 'notable works' and derives insights from benchmark evaluations assumes a representative sample, yet no literature search protocol, inclusion/exclusion criteria, time frame, or definition of 'notable' is documented. This directly weakens the reliability of the categorization, empirical comparisons, and architectural insights.
- [Benchmark Evaluation] Benchmark evaluation sections: Performance summaries on datasets such as ModelNet and ShapeNet do not specify whether numbers are taken verbatim from original papers (with potentially inconsistent protocols, preprocessing, or splits) or re-evaluated under controlled conditions. This affects the validity of cross-model comparisons and conclusions about architectural superiority.
minor comments (2)
- [Categorization] Ensure all backbone categories (e.g., point-based, graph-based, transformer-based) are explicitly defined with examples in the taxonomy section for reader clarity.
- [Overall] Add a summary table listing key papers, their backbones, and reported metrics to improve navigability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our survey manuscript. The comments highlight important aspects of transparency that we will address in the revision. We respond to each major comment below.
read point-by-point responses
-
Referee: [Introduction] Introduction and abstract: The central claim of a 'systematic survey' that categorizes 'notable works' and derives insights from benchmark evaluations assumes a representative sample, yet no literature search protocol, inclusion/exclusion criteria, time frame, or definition of 'notable' is documented. This directly weakens the reliability of the categorization, empirical comparisons, and architectural insights.
Authors: We agree that explicit documentation of the selection process strengthens a systematic survey. Our categorization of notable works was guided by a review of high-impact papers (prioritizing citation counts and influence on follow-up research) from major venues and arXiv, covering developments from approximately 2017 onward to capture the evolution of backbone architectures. To address the concern, we will add a new subsection titled 'Literature Selection and Scope' in the introduction. This will specify the search strategy (keywords and databases), time frame, inclusion criteria (e.g., focus on deep learning methods with reported benchmark results), and our definition of 'notable' (pioneering contributions or strong empirical performance within each architectural category). revision: yes
-
Referee: [Benchmark Evaluation] Benchmark evaluation sections: Performance summaries on datasets such as ModelNet and ShapeNet do not specify whether numbers are taken verbatim from original papers (with potentially inconsistent protocols, preprocessing, or splits) or re-evaluated under controlled conditions. This affects the validity of cross-model comparisons and conclusions about architectural superiority.
Authors: The referee is correct that this detail was not stated. The tabulated results are compiled verbatim from the numbers reported in the original papers, without re-implementation or unified re-evaluation under controlled conditions. This is a common practice in surveys given the computational cost and implementation variations across models. In the revised manuscript, we will explicitly clarify this in the benchmark evaluation sections (e.g., at the start of Sections 4 and 5) and add a paragraph discussing the limitations of such comparisons, including potential differences in data splits, preprocessing, and training protocols. We will also note any publicly available code or standardized benchmarks that could support more controlled future comparisons. revision: yes
Circularity Check
Survey of external models with no internal derivation or self-referential reduction
full rationale
This is a literature survey that organizes previously published point-cloud architectures by backbone type and tabulates their reported benchmark numbers. No equations, fitted parameters, or new predictions are introduced whose values are forced by the paper's own definitions or inputs. All claims rest on external publications and standard datasets; the selection process, while undocumented in detail, does not create a closed loop in which a result is derived from itself. The paper therefore contains no circular steps of the enumerated kinds.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Point cloud data is inherently unordered and irregular, exacerbated by sensor noise and occlusions.
Reference graph
Works this paper leans on
-
[1]
Karim Abou Zeid, Kadir Yilmaz, Daan de Geus, Alexander Hermans, David Adrian, Timm Linder, and Bastian Leibe. 2025. DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation.arXiv e-prints(2025), arXiv–2503
work page 2025
-
[2]
Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese
Iro Armeni, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese. 2016. 3D Semantic Parsing of Large-Scale Indoor Spaces. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
work page 2016
-
[3]
Matan Atzmon, Haggai Maron, and Yaron Lipman. 2018. Point convolutional neural networks by extension operators.ACM Transactions on Graphics (ToG)37, 4, Article 71 (July 2018), 12 pages. doi:10.1145/3197517.3201301
-
[4]
Armen Avetisyan, Manuel Dahnert, Angela Dai, Manolis Savva, Angel X Chang, and Matthias Nießner. 2019. Scan2CAD: Learning CAD Model Alignment in RGB-D Scans. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2614–2623. doi:10.48550/arXiv.1811.11187
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1811.11187 2019
-
[5]
Gilad Baruch, Zhuoyuan Chen, Afshin Dehghan, Tal Dimry, Yuri Feigin, Peter Fu, Thomas Gebauer, Brandon Joffe, Daniel Kurz, Arik Schwartz, and Elad Shulman. 2021. ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data. InThirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Tr...
work page 2021
- [6]
-
[7]
Saifullahi Aminu Bello, Shangshu Yu, Cheng Wang, Jibril Muhmmad Adam, and Jonathan Li. 2020. Review: Deep Learning on 3D Point Clouds. Remote Sensing12, 11 (Jan. 2020), 1729. doi:10.3390/rs12111729
-
[8]
Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. 2020. nuscenes: A multimodal dataset for autonomous driving. InCVPR. 11621–11631
work page 2020
-
[9]
Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Niessner, Manolis Savva, Shuran Song, Andy Zeng, and Yinda Zhang
-
[10]
Matterport3D: Learning from RGB-D Data in Indoor Environments.International Conference on 3D Vision (3DV)(2017)
work page 2017
-
[11]
Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, and Fisher Yu. 2015.ShapeNet: An Information-Rich 3D Model Repository. Technical Report. Stanford University — Princeton University — Toyota Technological Institute at Chicago
work page 2015
-
[12]
Qi, Hao Su, Kaichun Mo, and Leonidas J
R. Qi Charles, Hao Su, Mo Kaichun, and Leonidas J. Guibas. 2017. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 77–85. doi:10.1109/CVPR.2017.16
-
[13]
Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Return of The Devil in The Details: Delving Deep into Convolutional Nets.arXiv preprint arXiv:1405.3531(2014)
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[14]
Guangyan Chen, Meiling Wang, Yi Yang, Kai Yu, Li Yuan, and Yufeng Yue. 2024. Pointgpt: Auto-regressively generative pre-training from point clouds.Advances in Neural Information Processing Systems (NeurIPS)36 (2024)
work page 2024
-
[15]
Xiaotong Chen, Huijie Zhang, Zeren Yu, Anthony Opipari, and Odest Chadwicke Jenkins. 2022. ClearPose: Large-scale Transparent Object Dataset and Benchmark. InProceedings of the European Conference on Computer Vision (ECCV). Vol. 13668. Springer Nature Switzerland, Cham, 381–396
work page 2022
-
[16]
Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner
Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. 2017. ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. InThe IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
work page 2017
- [17]
-
[18]
Alexandros Delitzas, Ayca Takmaz, Federico Tombari, Robert Sumner, Marc Pollefeys, and Francis Engelmann. 2024. SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
work page 2024
-
[19]
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. InConference on Computer Vision and Pattern Recognition (CVPR)
work page 2009
-
[20]
Mark De Deuge, Alastair Quadros, Calvin Hung, and Bertrand Douillard. 2013. Unsupervised Feature Learning for Classification of Outdoor 3D Scans. InAustralasian Conference on Robotics and Automation 2013 (ACRA 13). Sydney, Australia
work page 2013
-
[21]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding.arXiv preprint arXiv:1810.04805(2018)
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[22]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. InInternational Conference on Learning Representations (ICLR). htt...
work page 2021
-
[23]
Y. Eldar, M. Lindenbaum, M. Porat, and Y.Y. Zeevi. 1997. The farthest point strategy for progressive image sampling.IEEE Transactions on Image Processing6, 9 (1997), 1305–1315. doi:10.1109/83.623193
-
[24]
Jeffrey L Elman. 1990. Finding Structure in Time.Cognitive science14, 2 (1990), 179–211
work page 1990
-
[25]
Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles Qi, Yin Zhou, Zoey Yang, Aurelien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, and Dragomir Anguelov. 2021. Large Scale Interactive Motion Forecasting for Autonomous Driving : The Waymo Open Motion D...
-
[26]
Fayjie, Mathijs Lens, and Patrick Vandewalle
Abdur R. Fayjie, Mathijs Lens, and Patrick Vandewalle. 2025. Few-Shot Segmentation of 3D Point Clouds Under Real-World Distributional Shifts in Railroad Infrastructure. 25, 4 (Feb 2025), 1072. doi:10.3390/s25041072
-
[27]
Yifan Feng, Zizhao Zhang, Xibin Zhao, Rongrong Ji, and Yue Gao. 2018. GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition. In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 264–272. doi:10.1109/CVPR.2018.00035
-
[28]
Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. 2013. Vision meets Robotics: The KITTI Dataset.International Journal of Robotics Research (IJRR)(2013)
work page 2013
-
[29]
Andreas Geiger, Philip Lenz, and Raquel Urtasun. 2012. Are we ready for autonomous driving? The KITTI vision benchmark suite. In2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 3354–3361. doi:10.1109/CVPR.2012.6248074
-
[30]
Albert Gu and Tri Dao. 2023. Mamba: Linear-Time Sequence Modeling with Selective State Spaces.arXiv preprint arXiv:2312.00752(2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[31]
Albert Gu and Tri Dao. 2024. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. InFirst Conference on Language Modeling
work page 2024
-
[32]
Meng-Hao Guo, Jun-Xiong Cai, Zheng-Ning Liu, Tai-Jiang Mu, Ralph R. Martin, and Shi-Min Hu. 2021. PCT: Point cloud transformer.Computational Visual Media7, 2 (01 Jun 2021), 187–199. doi:10.1007/s41095-021-0229-5
-
[33]
Yulan Guo, Hanyun Wang, Qingyong Hu, Hao Liu, Li Liu, and Mohammed Bennamoun. 2021. Deep Learning for 3D Point Clouds: A Survey.IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)43, 12 (Dec. 2021), 4338–4364. doi:10.1109/TPAMI.2020.3005434
-
[34]
Timo Hackel, N. Savinov, L. Ladicky, Jan D. Wegner, K. Schindler, and M. Pollefeys. 2017. SEMANTIC3D.NET: A new large-scale point cloud classification benchmark. InISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. IV-1-W1. 91–98
work page 2017
-
[35]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 770–778. doi:10.1109/CVPR.2016.90
-
[36]
Geoffrey E. Hinton. 2002. Training Products of Experts by Minimizing Contrastive Divergence.Neural Computation14, 8 (08 2002), 1771–1800. doi:10.1162/089976602760128018
-
[37]
A fast learning algorithm for deep belief nets
Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. 2006. A Fast Learning Algorithm for Deep Belief Nets.Neural Computation18, 7 (07 2006), 1527–1554. arXiv:https://direct.mit.edu/neco/article-pdf/18/7/1527/816558/neco.2006.18.7.1527.pdf doi:10.1162/neco.2006.18.7.1527
-
[38]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. InAdvances in Neural Information Processing Systems (NeurIPS), H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 6840–6851
work page 2020
-
[39]
Sepp Hochreiter and Jurgen Schmidhuber. 1997. Long Short-Term Memory.Neural Computation9, 8 (1997), 1735–1780
work page 1997
-
[40]
Binh-Son Hua, Quang-Hieu Pham, Duc Thanh Nguyen, Minh-Khoi Tran, Lap-Fai Yu, and Sai-Kit Yeung. 2016. SceneNN: A Scene Meshes Dataset with Annotations. InInternational Conference on 3D Vision (3DV)
work page 2016
-
[41]
Allison Janoch, Sergey Karayev, Yangqing Jia, Jonathan T Barron, Mario Fritz, Kate Saenko, and Trevor Darrell. 2013. A Category-Level 3D Object Dataset: Putting The Kinect to Work.Consumer depth cameras for computer vision: research topics and applications(2013), 141–165
work page 2013
-
[42]
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkuehler, and George Drettakis. 2023. 3D Gaussian Splatting for Real-Time Radiance Field Rendering.ACM Transactions on Graphics42, 4, Article 139 (2023), 14 pages. doi:10.1145/3592433
-
[43]
Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes.arXiv preprint arXiv:1312.6114(2013)
work page internal anchor Pith review Pith/arXiv arXiv 2013
-
[44]
Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks.arXiv preprint arXiv:1609.02907(2016)
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[45]
Roman Klokov and Victor Lempitsky. 2017. Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models. In2017 IEEE International Conference on Computer Vision (ICCV). 863–872. doi:10.1109/ICCV.2017.99
-
[46]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. InAdvances in Neural Information Processing Systems (NeurIPS), F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger (Eds.), Vol. 25. Curran Associates, Inc
work page 2012
-
[47]
Truc Le and Ye Duan. 2018. PointGrid: A Deep Network for 3D Shape Understanding. In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 9204–9214. doi:10.1109/CVPR.2018.00959
-
[48]
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning.Nature521, 7553 (May 2015), 436–444. doi:10.1038/nature14539
-
[49]
Howard, Wayne Hubbard, and Lawrence Jackel
Yann LeCun, Bernhard Boser, John Denker, Donnie Henderson, R. Howard, Wayne Hubbard, and Lawrence Jackel. 1989. Handwritten Digit Recognition with a Back-Propagation Network. InAdvances in Neural Information Processing Systems (NeurIPS), D. Touretzky (Ed.), Vol. 2. Morgan-Kaufmann
work page 1989
-
[50]
Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng. 2011. Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks.Commun. ACM54 (10 2011), 95–103. doi:10.1145/2001269.2001295
-
[51]
Chongshou Li, Yuheng Liu, Xinke Li, Yuning Zhang, Tianrui Li, and Junsong Yuan. 2025. Deep Hierarchical Learning for 3D Semantic Segmentation. International Journal of Computer Vision (IJCV)133, 7 (jul 2025), 4420–4441. doi:10.1007/s11263-025-02387-6
-
[52]
Xinke Li, Chongshou Li, Zekun Tong, Andrew Lim, Junsong Yuan, Yuwei Wu, Jing Tang, and Raymond Huang. 2020. Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene. InProceedings of the 28th ACM International Conference on Multimedia (New York, NY, USA, 2020-10-12)(MM ’20). Association for Computing Machinery, 238–...
-
[53]
Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. 2018. PointCNN: convolution on X-transformed points. InProceedings of the 32nd International Conference on Neural Information Processing Systems(Montréal, Canada)(NIPS’18). Curran Associates Inc., 828–838
work page 2018
-
[54]
Zhuoyuan Li, Yubo Ai, Jiahao Lu, ChuXin Wang, Jiacheng Deng, Hanzhi Chang, Yanzhe Liang, Wenfei Yang, Shifeng Zhang, and Tianzhu Zhang
-
[55]
InProceedings of the AAAI Conference on AI, Vol
Pamba: Enhancing Global Interaction in Point Clouds via State Space Model. InProceedings of the AAAI Conference on AI, Vol. 39. 5092–5100
-
[56]
Yiyi Liao, Jun Xie, and Andreas Geiger. 2022. KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D.Pattern Analysis and Machine Intelligence (PAMI)(2022). Manuscript submitted to ACM 26 Minhas Kamal et al
work page 2022
-
[57]
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. InEuropean Conference on Computer Vision (ECCV). Springer, 740–755
work page 2014
-
[58]
Xinhai Liu, Zhizhong Han, Yu-Shen Liu, and Matthias Zwicker. 2019. Point2sequence: Learning the shape representation of 3d point clouds with an attention-based sequence to sequence network. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI), Vol. 33. 8778–8785
work page 2019
-
[59]
Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully Convolutional Networks for Semantic Segmentation. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
work page 2015
-
[60]
Jonathon Luiten, Aljosa Osep, Patrick Dendorfer, Philip Torr, Andreas Geiger, Laura Leal-Taixe, and Bastian Leibe. 2020. HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking.International Journal of Computer Vision (IJCV)(2020)
work page 2020
-
[61]
Shitong Luo and Wei Hu. 2021. Diffusion Probabilistic Models for 3D Point Cloud Generation. In2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2836–2844. doi:10.1109/CVPR46437.2021.00286
-
[62]
Xu Ma, Can Qin, Haoxuan You, Haoxi Ran, and Yun Fu. 2022. Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework. InInternational Conference on Learning Representations (ICLR). https://openreview.net/forum?id=3Pbra-_u76D
work page 2022
-
[63]
Yongsen Mao, Yiming Zhang, Hanxiao Jiang, Angel X Chang, and Manolis Savva. 2022. MultiScan: Scalable RGBD scanning for 3D environments with articulated objects. InAdvances in Neural Information Processing Systems (NeurIPS)
work page 2022
-
[64]
Daniel Maturana and Sebastian A. Scherer. 2015. VoxNet: A 3D Convolutional Neural Network for real-time object recognition.2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)(2015), 922–928. https://api.semanticscholar.org/CorpusID:14620252
work page 2015
-
[65]
Nerf: Representing scenes as neural radiance fields for view synthesis,
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2021. NeRF: representing scenes as neural radiance fields for view synthesis.Commun. ACM65, 1 (Dec. 2021), 99–106. doi:10.1145/3503250
-
[66]
Seyed Saber Mohammadi, Yiming Wang, and Alessio Del Bue. 2021. Pointview-GCN: 3D Shape Classification With Multi-View Point Clouds. In 2021 IEEE International Conference on Image Processing (ICIP). 3103–3107. doi:10.1109/ICIP42928.2021.9506426
-
[67]
Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, and Mark Chen. 2022. Point-e: A system for generating 3d point clouds from complex prompts.arXiv preprint arXiv:2212.08751(2022)
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[68]
Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning Convolutional Neural Networks for Graphs. InInternational Conference on Machine Learning (ICML). PMLR, 2014–2023
work page 2016
-
[69]
Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. 2017. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. InAdvances in Neural Information Processing Systems (NeurIPS), Vol. 30. Curran Associates, Inc
work page 2017
-
[70]
Zekun Qi, Runpei Dong, Guofan Fan, Zheng Ge, Xiangyu Zhang, Kaisheng Ma, and Li Yi. 2023. Contrast with reconstruct: contrastive 3D representation learning guided by generative pretraining. InProceedings of the 40th International Conference on Machine Learning (ICML)(Honolulu, Hawaii, USA)(ICML). JMLR.org, Article 1171, 21 pages
work page 2023
- [71]
-
[72]
Guocheng Qian, Yuchen Li, Houwen Peng, Jinjie Mai, Hasan Abed Al Kader Hammoud, Mohamed Elhoseiny, and Bernard Ghanem. 2022. PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies. InAdvances in Neural Information Processing Systems (NeurIPS), Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho (Eds.). https://openrevi...
work page 2022
-
[73]
Bo Qiu, Yuzhou Zhou, Lei Dai, Bing Wang, Jianping Li, Zhen Dong, Chenglu Wen, Zhiliang Ma, and Bisheng Yang. 2024. WHU-Railway3D: A Diverse Dataset and Benchmark for Railway Point Cloud Semantic Segmentation. 25, 12 (Dec 2024), 20900–20916. doi:10.1109/TITS.2024.3469546
-
[74]
Umamaheswaran Raman Kumar, Abdur Razzaq Fayjie, Jurgen Hannaert, and Patrick Vandewalle. 2025. BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation. InComputer Vision – ECCV 2024 Workshops(Cham, 2025), Alessio Del Bue, Cristian Canton, Jordi Pont-Tuset, and Tatiana Tommasi (Eds.). Springer Nature Switz...
-
[75]
Nikhila Ravi, Jeremy Reizenstein, David Novotny, Taylor Gordon, Wan-Yen Lo, Justin Johnson, and Georgia Gkioxari. 2020. Accelerating 3D Deep Learning with PyTorch3D.arXiv:2007.08501(2020)
work page internal anchor Pith review Pith/arXiv arXiv 2020
-
[76]
Gernot Riegler, Ali Osman Ulusoy, and Andreas Geiger. 2017. OctNet: Learning Deep 3D Representations at High Resolutions. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
work page 2017
-
[77]
Plenoc- trees for real-time rendering of neural radiance fields,
Mike Roberts, Jason Ramapuram, Anurag Ranjan, Atulit Kumar, Miguel Angel Bautista, Nathan Paczan, Russ Webb, and Joshua M. Susskind. 2021. Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding. In2021 IEEE/CVF International Conference on Computer Vision (ICCV)(Montreal, QC, Canada, 2021-10). IEEE, 10892–10902. doi:10.1109/IC...
-
[78]
Jason Tyler Rolfe. 2016. Discrete variational autoencoders.arXiv preprint arXiv:1609.02200(2016)
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[79]
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. InMedical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi (Eds.). Springer International Publishing, Cham, 234–241
work page 2015
-
[80]
David Rozenberszki, Or Litany, and Angela Dai. 2022. Language-Grounded Indoor 3D Semantic Segmentation in the Wild. InProceedings of the European Conference on Computer Vision (ECCV)
work page 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.