pith. machine review for the scientific record.

arXiv:2605.02098 · v1 · submitted 2026-05-03 · 💻 cs.CV

Recognition: 2 Lean theorem links

From Spherical to Gaussian: A Comparative Analysis of Point Cloud Cropping Strategies in Large-Scale 3D Environments

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 19:09 UTC · model grok-4.3

classification 💻 cs.CV
keywords point cloud cropping · semantic segmentation · 3D deep learning · large-scale environments · outdoor scenes · Gaussian cropping · spherical cropping

The pith

Switching from spherical to Gaussian cropping improves 3D semantic segmentation on large outdoor point clouds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large 3D point clouds must be split into smaller subclouds before modern neural networks can process them. Spherical crops are the usual choice, yet they discard geometric context around each piece. The paper tests three alternatives—Gaussian, exponential, and linear cropping—that cover bigger spatial areas while keeping roughly the same number of points inside each subcloud. When the same two network architectures are trained on indoor and outdoor datasets, the non-spherical methods raise accuracy, with the largest gains appearing in outdoor scenes and new state-of-the-art numbers reported. The work therefore shows that the geometry of the crop itself is a controllable lever for better scene understanding.

Core claim

Replacing spherical cropping with Gaussian, exponential, or linear functions produces subclouds whose spatial extent is larger for any fixed point budget, and this change raises semantic segmentation accuracy on large-scale outdoor datasets while leaving indoor results largely unchanged.

What carries the argument

Cropping functions (Gaussian, exponential, linear) that map a target point count to a larger spatial radius than the conventional spherical crop.
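
As a concrete illustration, the sketch below implements distance-weighted cropping under the assumption that each point is kept with a probability that decays with its distance from the crop center. The function names and parameterizations are illustrative, not the paper's; its actual definitions live in the released code (https://github.com/mvg-inatech/point_cloud_cropping).

```python
import numpy as np

def crop_weights(dists, kind="gaussian", scale=1.0):
    """Selection weight per point given its distance to the crop center.
    'spherical' is a hard cutoff; the smooth alternatives decay gradually,
    so a crop with the same expected point count reaches farther out.
    Illustrative forms only, not the paper's exact definitions."""
    d = np.asarray(dists, dtype=float)
    if kind == "spherical":
        return (d <= scale).astype(float)          # 1 inside radius, 0 outside
    if kind == "gaussian":
        return np.exp(-0.5 * (d / scale) ** 2)     # smooth Gaussian falloff
    if kind == "exponential":
        return np.exp(-d / scale)                  # heavier tail than Gaussian
    if kind == "linear":
        return np.clip(1.0 - d / scale, 0.0, 1.0)  # linear ramp down to zero
    raise ValueError(f"unknown cropping kind: {kind}")

def crop_subcloud(points, center, kind, scale, rng):
    """Keep each point independently with probability crop_weights(...)."""
    d = np.linalg.norm(points - center, axis=1)
    return points[rng.random(len(points)) < crop_weights(d, kind, scale)]
```

Because the smooth variants assign nonzero weight beyond any fixed radius, tuning `scale` so the expected point count matches the spherical crop necessarily admits points from a wider area, which is the geometric trade the paper studies.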

If this is right

  • Performance gains appear most clearly in large-scale outdoor scenes.
  • New state-of-the-art segmentation scores are reached on the tested outdoor datasets.
  • Both evaluated network architectures benefit from the change.
  • The released code makes the cropping variants directly reusable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same cropping adjustment could be tested on related tasks such as object detection or instance segmentation in point clouds.
  • Combining the larger-radius crops with existing downsampling or voxelization pipelines might reduce memory use further.
  • In LiDAR-based mapping applications, wider context per subcloud could improve consistency across scene boundaries.

Load-bearing premise

Observed accuracy differences are caused by the shape of the crop rather than by unmeasured differences in point density, training schedule, or implementation details.

What would settle it

Re-run the exact same models and datasets with identical hyperparameters and random seeds while forcing every crop to contain the same number of points; if the accuracy gap between spherical and Gaussian crops disappears, the central claim is falsified.
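
A minimal sketch of that matched-cardinality control, assuming crops are drawn by weighted sampling without replacement (the paper may equalize counts differently):

```python
import numpy as np

def crop_fixed_count(points, center, k, sigma, rng):
    """Draw exactly k points with Gaussian distance weighting so that every
    strategy yields identical cardinality; only the crop geometry varies.
    Swap the weight line to test the other cropping shapes."""
    d = np.linalg.norm(points - center, axis=1)
    w = np.exp(-0.5 * (d / sigma) ** 2)            # Gaussian falloff weights
    idx = rng.choice(len(points), size=k, replace=False, p=w / w.sum())
    return points[idx]

rng = np.random.default_rng(0)                     # fixed seed, as the test demands
cloud = rng.uniform(-10.0, 10.0, size=(50_000, 3))
sub = crop_fixed_count(cloud, center=np.zeros(3), k=4_096, sigma=2.0, rng=rng)
assert len(sub) == 4_096                           # same point budget for every strategy
```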

Figures

Figures reproduced from arXiv:2605.02098 by Alexander Reiterer, Dominik Merkle, Maximilian Kellner, Michael Brunklaus.

Figure 1: Different architectures tested on S3DIS [2] Area …
Figure 2: Probability of a point being selected depending on …
Figure 4: Point cardinality for subclouds using a voxel size of 2 cm on the S3DIS dataset. Shaded regions represent the min-max …
Figure 5: Influence on training using different point cropping …
Figure 6: Performance validation using different cropping …
Figure 8: The influence of the probability between all points …
Original abstract

Large-scale 3D point clouds can consist of billions of points. Even after downsampling, these point clouds are too large for modern 3D neural networks. In order to develop a semantic understanding of the scene, the point clouds are divided into smaller subclouds that can be processed. Typically, this division is done using spherical crops, resulting in a loss of surrounding geometric context. To address this issue, we propose alternative methods that produce subclouds with larger crop sizes while maintaining a similar number of points. Specifically, we compare exponential, Gaussian, and linear cropping methods with the spherical method. We evaluated two 3D deep learning model architectures using multiple indoor and outdoor environment datasets. Our results demonstrate that altering the cropping strategy can enhance model performance, especially for large-scale outdoor scenes, yielding new state-of-the-art results. Code is available at https://github.com/mvg-inatech/point_cloud_cropping

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes alternative point cloud cropping strategies (exponential, Gaussian, and linear) to the standard spherical cropping for processing large-scale 3D point clouds in semantic segmentation tasks. These methods aim to provide larger crop sizes while maintaining a similar number of points, thereby preserving more geometric context. The approaches are evaluated using two 3D deep learning model architectures on multiple indoor and outdoor datasets, with claims of enhanced performance, particularly in large-scale outdoor scenes, and achievement of new state-of-the-art results.

Significance. If the results hold and are properly controlled, this could offer a practical improvement for handling large 3D environments without modifying network architectures, potentially benefiting applications in robotics and autonomous systems. The provision of code supports reproducibility and further research.

major comments (2)
  1. Abstract: the claim of performance gains and new state-of-the-art results is asserted without any numerical scores, statistical significance tests, or details on how point counts were equalized across methods, leaving the central claim unsupported by visible evidence.
  2. §4 (Experimental Evaluation): the attribution of gains to crop geometry is insecure without explicit controls confirming identical point sampling, density, and training protocols across strategies. The abstract states alternatives maintain 'a similar number of points' but does not specify the exact sampling procedure inside each crop region; non-uniform outdoor densities could therefore produce systematically different local statistics even at matched cardinality.
minor comments (2)
  1. Abstract: consider including at least one key quantitative result (e.g., mIoU delta on an outdoor dataset) to make the summary of findings more informative.
  2. Methods section: provide explicit equations or pseudocode for the exponential, Gaussian, and linear cropping functions to ensure precise reproducibility beyond the high-level description; one plausible form is sketched below.
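
For concreteness, selection probabilities of the kind the referee requests could take the following form, where d is the distance from the crop center. This is a hedged sketch of plausible parameterizations, not the paper's published definitions.

```latex
% Illustrative distance-based selection probabilities (plausible forms only):
% hard spherical cutoff at radius r, and three smooth alternatives.
p_{\mathrm{sph}}(d)   = \mathbf{1}\!\left[d \le r\right], \qquad
p_{\mathrm{gauss}}(d) = \exp\!\left(-\frac{d^{2}}{2\sigma^{2}}\right), \qquad
p_{\mathrm{exp}}(d)   = \exp\!\left(-\frac{d}{\lambda}\right), \qquad
p_{\mathrm{lin}}(d)   = \max\!\left(0,\ 1 - \frac{d}{r}\right)
```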

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of our work's potential impact and for the detailed comments. We address each major comment below and indicate the revisions we will make to the manuscript.

Point-by-point responses
  1. Referee: Abstract: the claim of performance gains and new state-of-the-art results is asserted without any numerical scores, statistical significance tests, or details on how point counts were equalized across methods, leaving the central claim unsupported by visible evidence.

    Authors: We agree that the abstract would be strengthened by including quantitative support for the claims. In the revised version we will incorporate specific performance metrics (such as mIoU improvements on the outdoor datasets) drawn from the experimental results already reported in §4, and we will explicitly state that point counts were equalized to the same cardinality across all cropping strategies. We will also add a brief reference to variance across runs to address statistical significance. revision: yes

  2. Referee: §4 (Experimental Evaluation): the attribution of gains to crop geometry is insecure without explicit controls confirming identical point sampling, density, and training protocols across strategies. The abstract states alternatives maintain 'a similar number of points' but does not specify the exact sampling procedure inside each crop region; non-uniform outdoor densities could therefore produce systematically different local statistics even at matched cardinality.

    Authors: We thank the referee for highlighting this point. The manuscript already uses identical training protocols (same architectures, hyperparameters, and optimization settings) for all cropping strategies. To make the controls fully explicit, we will expand §4 with a precise description of the point-sampling procedure applied inside each crop region and will add a short analysis confirming that the resulting local point-density distributions remain comparable across methods. These additions will better isolate the contribution of crop geometry. revision: yes
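
A quick audit of that density claim could look like the sketch below, which assumes each crop is available as an array of points. It is illustrative, not the authors' analysis script, and the crop variables in the commented loop are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def local_density_proxy(subcloud, k=16):
    """Mean distance to the k nearest neighbours for each point in a crop:
    a cheap proxy for local point density."""
    tree = cKDTree(subcloud)
    dists, _ = tree.query(subcloud, k=k + 1)  # column 0 is the point itself
    return dists[:, 1:].mean(axis=1)

# If these quantiles diverge systematically between strategies, matched
# cardinality alone has not controlled for local density.
# for name, crop in {"spherical": c_sph, "gaussian": c_gauss}.items():
#     print(name, np.quantile(local_density_proxy(crop), [0.1, 0.5, 0.9]))
```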

Circularity Check

0 steps flagged

No circularity: purely empirical comparison with no derivations or self-referential reductions

Full rationale

The manuscript is an experimental study that evaluates four cropping geometries (spherical, exponential, Gaussian, linear) on public indoor/outdoor point-cloud datasets using two standard 3D network architectures. No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the abstract or described content. Performance differences are reported as measured outcomes on fixed benchmarks; they do not reduce by construction to quantities defined inside the paper. Any self-citations are incidental and non-load-bearing for the central empirical claim.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical comparative study; the central claim rests entirely on experimental outcomes rather than any mathematical derivation or new theoretical construct.

pith-pipeline@v0.9.0 · 5469 in / 1133 out tokens · 40247 ms · 2026-05-08T19:09:35.751395+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

78 extracted references · 71 canonical work pages · 4 internal anchors

  1. [1] C. R. Qi, H. Su, K. Mo, L. J. Guibas, PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, arXiv:1612.00593 [cs] (Feb. 2016). doi:10.48550/arXiv.1612.00593. http://arxiv.org/abs/1612.00593
  2. [2] I. Armeni, S. Sax, A. R. Zamir, S. Savarese, Joint 2D-3D-Semantic Data for Indoor Scene Understanding, arXiv:1702.01105 [cs] (Apr. 2017). doi:10.48550/arXiv.1702.01105. http://arxiv.org/abs/1702.01105
  3. [3] I. Armeni, O. Sener, A. R. Zamir, H. Jiang, I. Brilakis, M. Fischer, S. Savarese, 3D Semantic Parsing of Large-Scale Indoor Spaces, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Las Vegas, NV, USA, 2016, pp. 1534-1543. doi:10.1109/CVPR.2016.170. http://ieeexplore.ieee.org/document/7780539/
  4. [4] G. Qian, Y. Li, H. Peng, J. Mai, H. A. A. K. Hammoud, M. Elhoseiny, B. Ghanem, PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies, arXiv:2206.04670 [cs] (Oct. 2022). doi:10.48550/arXiv.2206.04670. http://arxiv.org/abs/2206.04670
  5. [5] H. Thomas, Y.-H. H. Tsai, T. D. Barfoot, J. Zhang, KPConvX: Modernizing Kernel Point Convolution with Kernel Attention, arXiv:2405.13194 [cs] (May 2024). doi:10.48550/arXiv.2405.13194. http://arxiv.org/abs/2405.13194
  6. [6] Y. Guo, H. Wang, Q. Hu, H. Liu, L. Liu, M. Bennamoun, Deep Learning for 3D Point Clouds: A Survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (12) (2021) 4338-4364. doi:10.1109/TPAMI.2020.3005434. https://ieeexplore.ieee.org/document/9127813/
  7. [7] R. Zhang, Y. Wu, W. Jin, X. Meng, Deep-Learning-Based Point Cloud Semantic Segmentation: A Survey, Electronics 12 (17) (2023) 3642. doi:10.3390/electronics12173642. https://www.mdpi.com/2079-9292/12/17/3642
  8. [8] A. Milioto, I. Vizzo, J. Behley, C. Stachniss, RangeNet++: Fast and Accurate LiDAR Semantic Segmentation, in: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, Macau, China, 2019, pp. 4213-4220. doi:10.1109/IROS40897.2019.8967762. https://ieeexplore.ieee.org/document/8967762/
  9. [9] T. Cortinhal, G. Tzelepis, E. Erdal Aksoy, SalsaNext: Fast, Uncertainty-Aware Semantic Segmentation of LiDAR Point Clouds, in: G. Bebis, Z. Yin, E. Kim, J. Bender, K. Subr, B. C. Kwon, J. Zhao, D. Kalkofen, G. Baciu (Eds.), Advances in Visual Computing, Vol. 12510, Springer International Publishing, Cham, 2020, pp. 207-222, Series Title: Lecture Notes in Computer Science.
  10. [10] B. Wu, A. Wan, X. Yue, K. Keutzer, SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud, arXiv:1710.07368 [cs] (Oct. 2017). doi:10.48550/arXiv.1710.07368. http://arxiv.org/abs/1710.07368
  11. [11] B. Wu, X. Zhou, S. Zhao, X. Yue, K. Keutzer, SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud, in: 2019 International Conference on Robotics and Automation (ICRA), IEEE, Montreal, QC, Canada, 2019, pp. 4376-4382. doi:10.1109/ICRA.2019.8793495.
  12. [12] E. E. Aksoy, S. Baci, S. Cavdar, SalsaNet: Fast Road and Vehicle Segmentation in LiDAR Point Clouds for Autonomous Driving, arXiv:1909.08291 [cs] (Sep. 2019). doi:10.48550/arXiv.1909.08291. http://arxiv.org/abs/1909.08291
  13. [13] Y. Zhang, Z. Zhou, P. David, X. Yue, Z. Xi, B. Gong, H. Foroosh, PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2020, pp. 9598-9607. doi:10.1109/CVPR42600.2020.00962.
  14. [14] G. Shi, R. Li, C. Ma, PillarNet: Real-Time and High-Performance Pillar-Based 3D Object Detection, in: S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, T. Hassner (Eds.), Computer Vision – ECCV 2022, Vol. 13670, Springer Nature Switzerland, Cham, 2022, pp. 35-52, Series Title: Lecture Notes in Computer Science. doi:10.1007/978-3-031-20080-9_3.
  15. [15] M. Gerdzhev, R. Razani, E. Taghavi, L. Bingbing, TORNADO-Net: mulTi-view tOtal vaRiatioN semAntic segmentation with Diamond inceptiOn module, in: 2021 IEEE International Conference on Robotics and Automation (ICRA), IEEE, Xi'an, China, 2021, pp. 9543-9549. doi:10.1109/ICRA48506.2021.9562041. https://ieeexplore.ieee.org/document/9562041/
  16. [16] K. Chen, R. Oldja, N. Smolyanskiy, S. Birchfield, A. Popov, D. Wehr, I. Eden, J. Pehserl, MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous Driving Using Multiple Views, arXiv:2006.05518 [cs] (Aug. 2020). doi:10.48550/arXiv.2006.05518. http://arxiv.org/abs/2006.05518
  17. [17] Y. A. Alnaggar, M. Afifi, K. Amer, M. Elhelw, Multi Projection Fusion for Real-time Semantic Segmentation of 3D LiDAR Point Clouds, arXiv:2011.01974 [cs] (Nov. 2020). doi:10.48550/arXiv.2011.01974. http://arxiv.org/abs/2011.01974
  18. [18] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, O. Ronneberger, 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation, in: S. Ourselin, L. Joskowicz, M. R. Sabuncu, G. Unal, W. Wells (Eds.), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, Springer International Publishing, Cham, 2016, pp. 424-432.
  19. [19] L. P. Tchapmi, C. B. Choy, I. Armeni, J. Gwak, S. Savarese, SEGCloud: Semantic Segmentation of 3D Point Clouds, arXiv:1710.07563 [cs] (Oct. 2017). doi:10.48550/arXiv.1710.07563. http://arxiv.org/abs/1710.07563
  20. [20] B. Graham, M. Engelcke, L. V. D. Maaten, 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, UT, USA, 2018, pp. 9224-9232. doi:10.1109/CVPR.2018.00961. https://ieeexplore.ieee.org/document/8579059/
  21. [21] C. Choy, J. Gwak, S. Savarese, 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Long Beach, CA, USA, 2019, pp. 3070-3079. doi:10.1109/CVPR.2019.00319. https://ieeexplore.ieee.org/document/8953494/
  22. [22] S. Contributors, Spconv: Spatially Sparse Convolution Library (2022). https://github.com/traveller59/spconv
  23. [23] X. Ding, X. Zhang, J. Han, G. Ding, Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs, in: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, New Orleans, LA, USA, 2022, pp. 11953-11965. doi:10.1109/CVPR52688.2022.01166. https://ieeexplore.ieee.org/document/9880273/
  24. [24] Y. Chen, J. Liu, X. Zhang, X. Qi, J. Jia, LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs, in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Vancouver, BC, Canada, 2023, pp. 13488-13498. doi:10.1109/CVPR52729.2023.01296. https://ieeexplore.ieee.org/document/10203060/
  25. [25] T. Feng, W. Wang, F. Ma, Y. Yang, LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels, in: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2024, pp. 14916-14927. doi:10.1109/CVPR52733.2024.01413. https://ieeexplore.ieee.org/document/10656196/
  26. [26] B. Peng, X. Wu, L. Jiang, Y. Chen, H. Zhao, Z. Tian, J. Jia, OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation, in: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2024, pp. 21305-21315. doi:10.1109/CVPR52733.2024.02013. https://ieeexplore.ieee.org/document/10655421/
  27. [27] C. R. Qi, L. Yi, H. Su, L. J. Guibas, PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space, arXiv:1706.02413 [cs] (Jun. 2017). doi:10.48550/arXiv.1706.02413. http://arxiv.org/abs/1706.02413
  28. [28] Q. Hu, B. Yang, L. Xie, S. Rosa, Y. Guo, Z. Wang, N. Trigoni, A. Markham, RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2020, pp. 11105-11114. doi:10.1109/CVPR42600.2020.01112. https://ieeexplore.ieee.org/document/9156466/
  29. [29] L. Landrieu, M. Simonovsky, Large-scale Point Cloud Semantic Segmentation with Superpoint Graphs, arXiv:1711.09869 [cs] (Mar. 2018). doi:10.48550/arXiv.1711.09869. http://arxiv.org/abs/1711.09869
  30. [30] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, J. M. Solomon, Dynamic Graph CNN for Learning on Point Clouds, ACM Transactions on Graphics 38 (5) (2019) 1-12. doi:10.1145/3326362. https://dl.acm.org/doi/10.1145/3326362
  31. [31] H. Lei, N. Akhtar, A. Mian, SegGCN: Efficient 3D Point Cloud Segmentation With Fuzzy Spherical Kernel, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2020, pp. 11608-11617. doi:10.1109/CVPR42600.2020.01163. https://ieeexplore.ieee.org/document/9157177/
  32. [32] M. Tatarchenko, J. Park, V. Koltun, Q.-Y. Zhou, Tangent Convolutions for Dense Prediction in 3D, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, UT, USA, 2018, pp. 3887-3896. doi:10.1109/CVPR.2018.00409. https://ieeexplore.ieee.org/document/8578507/
  33. [33] W. Wu, Z. Qi, L. Fuxin, PointConv: Deep Convolutional Networks on 3D Point Clouds, arXiv:1811.07246 [cs] (Nov. 2020). doi:10.48550/arXiv.1811.07246. http://arxiv.org/abs/1811.07246
  34. [34] H. Thomas, C. R. Qi, J.-E. Deschaud, B. Marcotegui, F. Goulette, L. J. Guibas, KPConv: Flexible and Deformable Convolution for Point Clouds, arXiv:1904.08889 [cs] (Aug. 2019). doi:10.48550/arXiv.1904.08889. http://arxiv.org/abs/1904.08889
  35. [35] X. Li, Z. Zhang, Y. Li, M. Huang, J. Zhang, SFL-NET: Slight Filter Learning Network for Point Cloud Semantic Segmentation, IEEE Transactions on Geoscience and Remote Sensing 61 (2023) 1-14. doi:10.1109/TGRS.2023.3313876. https://ieeexplore.ieee.org/document/10250869/
  36. [36] H. Zhao, L. Jiang, J. Jia, P. Torr, V. Koltun, Point Transformer, in: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, Montreal, QC, Canada, 2021, pp. 16239-16248. doi:10.1109/ICCV48922.2021.01595. https://ieeexplore.ieee.org/document/9710703/
  37. [37] M.-H. Guo, J.-X. Cai, Z.-N. Liu, T.-J. Mu, R. R. Martin, S.-M. Hu, PCT: Point cloud transformer, Computational Visual Media 7 (2) (2021) 187-199. doi:10.1007/s41095-021-0229-5.
  38. [38] X. Wu, Y. Lao, L. Jiang, X. Liu, H. Zhao, Point Transformer V2: Grouped Vector Attention and Partition-based Pooling, in: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh (Eds.), Advances in Neural Information Processing Systems, Vol. 35, Curran Associates, Inc., 2022, pp. 33330-33342.
  39. [39] X. Wu, L. Jiang, P.-S. Wang, Z. Liu, X. Liu, Y. Qiao, W. Ouyang, T. He, H. Zhao, Point Transformer V3: Simpler, Faster, Stronger, in: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2024, pp. 4840-4851. doi:10.1109/CVPR52733.2024.00463. https://ieeexplore.ieee.org/document/10658198/
  40. [40] Z. Liu, H. Hu, Y. Lin, Z. Yao, Z. Xie, Y. Wei, J. Ning, Y. Cao, Z. Zhang, L. Dong, F. Wei, B. Guo, Swin Transformer V2: Scaling Up Capacity and Resolution, in: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, New Orleans, LA, USA, 2022, pp. 11999-12009. doi:10.1109/CVPR52688.2022.01170.
  41. [41] Y.-Q. Yang, Y.-X. Guo, J.-Y. Xiong, Y. Liu, H. Pan, P.-S. Wang, X. Tong, B. Guo, Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding, Computational Visual Media 11 (1) (2025) 83-101. doi:10.26599/CVM.2025.9450383.
  42. [42] M. Kellner, B. Stahl, A. Reiterer, Fused Projection-Based Point Cloud Segmentation, Sensors 22 (3) (2022) 1139. doi:10.3390/s22031139. https://www.mdpi.com/1424-8220/22/3/1139
  43. [43] Y. Hou, X. Zhu, Y. Ma, C. C. Loy, Y. Li, Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation, arXiv:2206.02099 [cs] (Jun. 2022). doi:10.48550/arXiv.2206.02099. http://arxiv.org/abs/2206.02099
  44. [44] J. Xu, R. Zhang, J. Dou, Y. Zhu, J. Sun, S. Pu, RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation, in: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, Montreal, QC, Canada, 2021, pp. 16004-16013. doi:10.1109/ICCV48922.2021.01572. https://ieeexplore.ieee.org/document/9709941/
  45. [45] M. Caron, H. Touvron, I. Misra, H. Jegou, J. Mairal, P. Bojanowski, A. Joulin, Emerging Properties in Self-Supervised Vision Transformers, in: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, Montreal, QC, Canada, 2021, pp. 9630-9640. doi:10.1109/ICCV48922.2021.00951. https://ieeexplore.ieee.org/document/9709990/
  46. [46] K. He, X. Chen, S. Xie, Y. Li, P. Dollar, R. Girshick, Masked Autoencoders Are Scalable Vision Learners, in: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, New Orleans, LA, USA, 2022, pp. 15979-15988. doi:10.1109/CVPR52688.2022.01553. https://ieeexplore.ieee.org/document/9879206/
  47. [47] M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, M. Assran, N. Ballas, W. Galuba, R. Howes, P.-Y. Huang, S.-W. Li, I. Misra, M. Rabbat, V. Sharma, G. Synnaeve, H. Xu, H. Jegou, J. Mairal, P. Labatut, A. Joulin, P. Bojanowski, DINOv2: Learning Robust Visual Features without Supervision, arXiv:2304.07193 [cs].
  48. [48] O. Siméoni, H. V. Vo, M. Seitzer, F. Baldassarre, M. Oquab, C. Jose, V. Khalidov, M. Szafraniec, S. Yi, M. Ramamonjisoa, F. Massa, D. Haziza, L. Wehrstedt, J. Wang, T. Darcet, T. Moutakanni, L. Sentana, C. Roberts, A. Vedaldi, J. Tolan, J. Brandt, C. Couprie, J. Mairal, H. Jégou, P. Labatut, P. Bojanowski, DINOv3, arXiv:2508.10104 [cs] (Aug. 2025).
  49. [49] S. Xie, J. Gu, D. Guo, C. R. Qi, L. Guibas, O. Litany, PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding, in: A. Vedaldi, H. Bischof, T. Brox, J.-M. Frahm (Eds.), Computer Vision – ECCV 2020, Springer International Publishing, Cham, 2020, pp. 574-591.
  50. [50] Y. Pang, W. Wang, F. E. H. Tay, W. Liu, Y. Tian, L. Yuan, Masked Autoencoders for Point Cloud Self-supervised Learning, in: S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, T. Hassner (Eds.), Computer Vision – ECCV 2022, Vol. 13662, Springer Nature Switzerland, Cham, 2022, pp. 604-621, Series Title: Lecture Notes in Computer Science.
  51. [51] X. Wu, X. Wen, X. Liu, H. Zhao, Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning, in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Vancouver, BC, Canada, 2023, pp. 9415-9424. doi:10.1109/CVPR52729.2023.00908. https://ieeexplore.ieee.org/document/10203752/
  52. [52] X. Wu, D. DeTone, D. Frost, T. Shen, C. Xie, N. Yang, J. Engel, R. Newcombe, H. Zhao, J. Straub, Sonata: Self-Supervised Learning of Reliable Point Representations, arXiv:2503.16429 [cs] (Mar. 2025). doi:10.48550/arXiv.2503.16429. http://arxiv.org/abs/2503.16429
  53. [53] Y. Zhang, X. Wu, Y. Lao, C. Wang, Z. Tian, N. Wang, H. Zhao, Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations, arXiv:2510.23607 [cs] (Oct. 2025). doi:10.48550/arXiv.2510.23607. http://arxiv.org/abs/2510.23607
  54. [54] H. Zhu, H. Yang, X. Wu, D. Huang, S. Zhang, X. He, H. Zhao, C. Shen, Y. Qiao, T. He, W. Ouyang, PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm, arXiv:2310.08586 [cs] (Apr. 2025). doi:10.48550/arXiv.2310.08586. http://arxiv.org/abs/2310.08586
  55. [55] J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, J. Gall, SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences, arXiv:1904.01416 [cs] (Aug. 2019). doi:10.48550/arXiv.1904.01416. http://arxiv.org/abs/1904.01416
  56. [56] P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine, V. Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, S. Zhao, S. Cheng, Y. Zhang, J. Shlens, Z. Chen, D. Anguelov, Scalability in Perception for Autonomous Driving: Waymo Open Dataset, arXiv:1912.04838 [cs].
  57. [57] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, O. Beijbom, nuScenes: A multimodal dataset for autonomous driving, arXiv:1903.11027 [cs] (May 2020). doi:10.48550/arXiv.1903.11027. http://arxiv.org/abs/1903.11027
  58. [58] X. Roynard, J.-E. Deschaud, F. Goulette, Paris-Lille-3D: A large and high-quality ground-truth urban point cloud dataset for automatic segmentation and classification, The International Journal of Robotics Research 37 (6) (2018) 545-557. doi:10.1177/0278364918767506. https://journals.sagepub.com/doi/10.1177/0278364918767506
  59. [59] W. Tan, N. Qin, L. Ma, Y. Li, J. Du, G. Cai, K. Yang, J. Li, Toronto-3D: A Large-scale Mobile LiDAR Dataset for Semantic Segmentation of Urban Roadways, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, Seattle, WA, USA, 2020, pp. 797-806. doi:10.1109/CVPRW50498.2020.00109.
  60. [61] H. Thomas, F. Goulette, J.-E. Deschaud, B. Marcotegui, Y. LeGall, Semantic Classification of 3D Point Clouds with Multiscale Spherical Neighborhoods, in: 2018 International Conference on 3D Vision (3DV), IEEE, Verona, 2018, pp. 390-398. doi:10.1109/3DV.2018.00052. https://ieeexplore.ieee.org/document/8490990/
  61. [62] N. Varney, V. K. Asari, Q. Graehling, Pyramid Point: A Multi-Level Focusing Network for Revisiting Feature Layers (2020). doi:10.48550/ARXIV.2011.08692. https://arxiv.org/abs/2011.08692
  62. [63] S. Yoo, Y. Jeong, M. Jameela, G. Sohn, Human Vision Based 3D Point Cloud Semantic Segmentation of Large-Scale Outdoor Scenes, in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, Vancouver, BC, Canada, 2023, pp. 6577-6586. doi:10.1109/CVPRW59228.2023.00699. https://ieeexplore.ieee.org/document/10208664/
  63. [64] S. Yoo, Y. Jeong, M. M. Sheikholeslami, G. Sohn, EyeNet++: A Multiscale and Multidensity Approach for Outdoor 3-D Semantic Segmentation Inspired by the Human Visual Field, IEEE Transactions on Geoscience and Remote Sensing 63 (2025) 1-19. doi:10.1109/TGRS.2025.3589287. https://ieeexplore.ieee.org/document/11080501/
  64. [65] S. Contributors, Spconv: Spatially sparse convolution library, https://github.com/traveller59/spconv (2022).
  65. [66] M. Fey, J. E. Lenssen, Fast Graph Representation Learning with PyTorch Geometric, arXiv:1903.02428 [cs] (Apr. 2019). doi:10.48550/arXiv.1903.02428. http://arxiv.org/abs/1903.02428
  66. [67] T. Dao, D. Y. Fu, S. Ermon, A. Rudra, C. Ré, FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (2022). doi:10.48550/ARXIV.2205.14135. https://arxiv.org/abs/2205.14135
  67. [68] T. Dao, FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning (2023). doi:10.48550/ARXIV.2307.08691. https://arxiv.org/abs/2307.08691
  68. [69] S. Ioffe, C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, in: F. Bach, D. Blei (Eds.), Proceedings of the 32nd International Conference on Machine Learning, Vol. 37 of Proceedings of Machine Learning Research, PMLR, Lille, France, 2015, pp. 448-456. https://proceedings.mlr.press/v37/ioffe15.html
  69. [70] J. L. Ba, J. R. Kiros, G. E. Hinton, Layer Normalization (2016). doi:10.48550/ARXIV.1607.06450. https://arxiv.org/abs/1607.06450
  70. [71] C. R. Harris, K. J. Millman, S. J. Van Der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. Van Kerkwijk, M. Brett, A. Haldane, J. F. Del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, T. E. Oliphant, Array programming with NumPy, Nature 585 (2020) 357-362.
  71. [72] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An Imperative Style, High-Performance Deep Learning Library, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems, Vol. 32, Curran Associates, Inc., 2019.
  72. [73] S. K. Lam, A. Pitrou, S. Seibert, Numba: a LLVM-based Python JIT compiler, in: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, ACM, Austin, Texas, 2015, pp. 1-6. doi:10.1145/2833157.2833162. https://dl.acm.org/doi/10.1145/2833157.2833162
  73. [74] I. Loshchilov, F. Hutter, Decoupled Weight Decay Regularization (2017). doi:10.48550/ARXIV.1711.05101. https://arxiv.org/abs/1711.05101
  74. [75] L. N. Smith, N. Topin, Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates (2017). doi:10.48550/ARXIV.1708.07120. https://arxiv.org/abs/1708.07120
  75. [76] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the Inception Architecture for Computer Vision (2015). doi:10.48550/ARXIV.1512.00567. https://arxiv.org/abs/1512.00567
  76. [77] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (2020). doi:10.48550/ARXIV.2010.11929. https://arxiv.org/abs/2010.11929
  77. [78] X. Jiao, C. Lv, J. Zhao, R. Yi, Y.-H. Wen, Z. Pan, Z. Wu, Y.-J. Liu, Weighted Poisson-disk Resampling on Large-Scale Point Clouds, Proceedings of the AAAI Conference on Artificial Intelligence 39 (4) (2025) 4084-4092. doi:10.1609/aaai.v39i4.32428. https://ojs.aaai.org/index.php/AAAI/article/view/32428
  78. [79] M. Kellner, A. Schmitt, A. Reiterer, Automatic Generation of 3D Bridge Models from 3D Point Clouds, Results in Engineering (2026) 109532. doi:10.1016/j.rineng.2026.109532. https://linkinghub.elsevier.com/retrieve/pii/S2590123026005724