pith. machine review for the scientific record.

arXiv:2605.02098 · v1 · submitted 2026-05-03 · 💻 cs.CV

Recognition: 2 Lean theorem links

From Spherical to Gaussian: A Comparative Analysis of Point Cloud Cropping Strategies in Large-Scale 3D Environments

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 19:09 UTC · model grok-4.3

classification 💻 cs.CV
keywords point cloud cropping · semantic segmentation · 3D deep learning · large-scale environments · outdoor scenes · Gaussian cropping · spherical cropping

The pith

Switching from spherical to Gaussian cropping improves 3D semantic segmentation on large outdoor point clouds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large 3D point clouds must be split into smaller subclouds before modern neural networks can process them. Spherical crops are the usual choice, yet they discard geometric context around each piece. The paper tests three alternatives—Gaussian, exponential, and linear cropping—that cover bigger spatial areas while keeping roughly the same number of points inside each subcloud. When the same two network architectures are trained on indoor and outdoor datasets, the non-spherical methods raise accuracy, with the largest gains appearing in outdoor scenes and new state-of-the-art numbers reported. The work therefore shows that the geometry of the crop itself is a controllable lever for better scene understanding.

Core claim

Replacing spherical cropping with Gaussian, exponential, or linear functions produces subclouds whose spatial extent is larger for any fixed point budget, and this change raises semantic segmentation accuracy on large-scale outdoor datasets while leaving indoor results largely unchanged.

What carries the argument

Cropping functions (Gaussian, exponential, linear) that map a target point count to a larger spatial radius than the conventional spherical crop.
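
As a concrete illustration, the sketch below implements distance-weighted cropping under the assumption that each point is kept with a probability that decays with its distance from the crop center. The function names and parameterizations are illustrative, not the paper's; its actual definitions live in the released code (https://github.com/mvg-inatech/point_cloud_cropping).

```python
import numpy as np

def crop_weights(dists, kind="gaussian", scale=1.0):
    """Selection weight per point given its distance to the crop center.
    'spherical' is a hard cutoff; the smooth alternatives decay gradually,
    so a crop with the same expected point count reaches farther out.
    Illustrative forms only, not the paper's exact definitions."""
    d = np.asarray(dists, dtype=float)
    if kind == "spherical":
        return (d <= scale).astype(float)          # 1 inside radius, 0 outside
    if kind == "gaussian":
        return np.exp(-0.5 * (d / scale) ** 2)     # smooth Gaussian falloff
    if kind == "exponential":
        return np.exp(-d / scale)                  # heavier tail than Gaussian
    if kind == "linear":
        return np.clip(1.0 - d / scale, 0.0, 1.0)  # linear ramp down to zero
    raise ValueError(f"unknown cropping kind: {kind}")

def crop_subcloud(points, center, kind, scale, rng):
    """Keep each point independently with probability crop_weights(...)."""
    d = np.linalg.norm(points - center, axis=1)
    return points[rng.random(len(points)) < crop_weights(d, kind, scale)]
```

Because the smooth variants assign nonzero weight beyond any fixed radius, tuning `scale` so the expected point count matches the spherical crop necessarily admits points from a wider area, which is the geometric trade the paper studies.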

If this is right

  • Performance gains appear most clearly in large-scale outdoor scenes.
  • New state-of-the-art segmentation scores are reached on the tested outdoor datasets.
  • Both evaluated network architectures benefit from the change.
  • The released code makes the cropping variants directly reusable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same cropping adjustment could be tested on related tasks such as object detection or instance segmentation in point clouds.
  • Combining the larger-radius crops with existing downsampling or voxelization pipelines might reduce memory use further.
  • In LiDAR-based mapping applications, wider context per subcloud could improve consistency across scene boundaries.

Load-bearing premise

Observed accuracy differences are caused by the shape of the crop rather than by unmeasured differences in point density, training schedule, or implementation details.

What would settle it

Re-run the exact same models and datasets with identical hyperparameters and random seeds while forcing every crop to contain the same number of points; if the accuracy gap between spherical and Gaussian crops disappears, the central claim is falsified.
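
A minimal sketch of that matched-cardinality control, assuming crops are drawn by weighted sampling without replacement (the paper may equalize counts differently):

```python
import numpy as np

def crop_fixed_count(points, center, k, sigma, rng):
    """Draw exactly k points with Gaussian distance weighting so that every
    strategy yields identical cardinality; only the crop geometry varies.
    Swap the weight line to test the other cropping shapes."""
    d = np.linalg.norm(points - center, axis=1)
    w = np.exp(-0.5 * (d / sigma) ** 2)            # Gaussian falloff weights
    idx = rng.choice(len(points), size=k, replace=False, p=w / w.sum())
    return points[idx]

rng = np.random.default_rng(0)                     # fixed seed, as the test demands
cloud = rng.uniform(-10.0, 10.0, size=(50_000, 3))
sub = crop_fixed_count(cloud, center=np.zeros(3), k=4_096, sigma=2.0, rng=rng)
assert len(sub) == 4_096                           # same point budget for every strategy
```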

Figures

Figures reproduced from arXiv:2605.02098 by Alexander Reiterer, Dominik Merkle, Maximilian Kellner, Michael Brunklaus.

Figure 1: Different architectures tested on S3DIS [2] Area …
Figure 2: Probability of a point being selected depending on …
Figure 4: Point cardinality for subclouds using a voxel size of 2 cm on the S3DIS dataset. Shaded regions represent the min-max …
Figure 5: Influence on training using different point cropping …
Figure 6: Performance validation using different cropping …
Figure 8: The influence of the probability between all points …
Original abstract

Large-scale 3D point clouds can consist of billions of points. Even after downsampling, these point clouds are too large for modern 3D neural networks. In order to develop a semantic understanding of the scene, the point clouds are divided into smaller subclouds that can be processed. Typically, this division is done using spherical crops, resulting in a loss of surrounding geometric context. To address this issue, we propose alternative methods that produce subclouds with larger crop sizes while maintaining a similar number of points. Specifically, we compare exponential, Gaussian, and linear cropping methods with the spherical method. We evaluated two 3D deep learning model architectures using multiple indoor and outdoor environment datasets. Our results demonstrate that altering the cropping strategy can enhance model performance, especially for large-scale outdoor scenes, yielding new state-of-the-art results. Code is available at https://github.com/mvg-inatech/point_cloud_cropping

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes alternative point cloud cropping strategies (exponential, Gaussian, and linear) to the standard spherical cropping for processing large-scale 3D point clouds in semantic segmentation tasks. These methods aim to provide larger crop sizes while maintaining a similar number of points, thereby preserving more geometric context. The approaches are evaluated using two 3D deep learning model architectures on multiple indoor and outdoor datasets, with claims of enhanced performance, particularly in large-scale outdoor scenes, and achievement of new state-of-the-art results.

Significance. If the results hold and are properly controlled, this could offer a practical improvement for handling large 3D environments without modifying network architectures, potentially benefiting applications in robotics and autonomous systems. The provision of code supports reproducibility and further research.

major comments (2)
  1. Abstract: the claim of performance gains and new state-of-the-art results is asserted without any numerical scores, statistical significance tests, or details on how point counts were equalized across methods, leaving the central claim unsupported by visible evidence.
  2. §4 (Experimental Evaluation): the attribution of gains to crop geometry is insecure without explicit controls confirming identical point sampling, density, and training protocols across strategies. The abstract states alternatives maintain 'a similar number of points' but does not specify the exact sampling procedure inside each crop region; non-uniform outdoor densities could therefore produce systematically different local statistics even at matched cardinality.
minor comments (2)
  1. Abstract: consider including at least one key quantitative result (e.g., mIoU delta on an outdoor dataset) to make the summary of findings more informative.
  2. Methods section: provide explicit equations or pseudocode for the exponential, Gaussian, and linear cropping functions to ensure precise reproducibility beyond the high-level description; one plausible form is sketched below.
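
For concreteness, selection probabilities of the kind the referee requests could take the following form, where d is the distance from the crop center. This is a hedged sketch of plausible parameterizations, not the paper's published definitions.

```latex
% Illustrative distance-based selection probabilities (plausible forms only):
% hard spherical cutoff at radius r, and three smooth alternatives.
p_{\mathrm{sph}}(d)   = \mathbf{1}\!\left[d \le r\right], \qquad
p_{\mathrm{gauss}}(d) = \exp\!\left(-\frac{d^{2}}{2\sigma^{2}}\right), \qquad
p_{\mathrm{exp}}(d)   = \exp\!\left(-\frac{d}{\lambda}\right), \qquad
p_{\mathrm{lin}}(d)   = \max\!\left(0,\ 1 - \frac{d}{r}\right)
```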

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of our work's potential impact and for the detailed comments. We address each major comment below and indicate the revisions we will make to the manuscript.

Point-by-point responses
  1. Referee: Abstract: the claim of performance gains and new state-of-the-art results is asserted without any numerical scores, statistical significance tests, or details on how point counts were equalized across methods, leaving the central claim unsupported by visible evidence.

    Authors: We agree that the abstract would be strengthened by including quantitative support for the claims. In the revised version we will incorporate specific performance metrics (such as mIoU improvements on the outdoor datasets) drawn from the experimental results already reported in §4, and we will explicitly state that point counts were equalized to the same cardinality across all cropping strategies. We will also add a brief reference to variance across runs to address statistical significance. revision: yes

  2. Referee: §4 (Experimental Evaluation): the attribution of gains to crop geometry is insecure without explicit controls confirming identical point sampling, density, and training protocols across strategies. The abstract states alternatives maintain 'a similar number of points' but does not specify the exact sampling procedure inside each crop region; non-uniform outdoor densities could therefore produce systematically different local statistics even at matched cardinality.

    Authors: We thank the referee for highlighting this point. The manuscript already uses identical training protocols (same architectures, hyperparameters, and optimization settings) for all cropping strategies. To make the controls fully explicit, we will expand §4 with a precise description of the point-sampling procedure applied inside each crop region and will add a short analysis confirming that the resulting local point-density distributions remain comparable across methods. These additions will better isolate the contribution of crop geometry. revision: yes
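
A quick audit of that density claim could look like the sketch below, which assumes each crop is available as an array of points. It is illustrative, not the authors' analysis script, and the crop variables in the commented loop are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def local_density_proxy(subcloud, k=16):
    """Mean distance to the k nearest neighbours for each point in a crop:
    a cheap proxy for local point density."""
    tree = cKDTree(subcloud)
    dists, _ = tree.query(subcloud, k=k + 1)  # column 0 is the point itself
    return dists[:, 1:].mean(axis=1)

# If these quantiles diverge systematically between strategies, matched
# cardinality alone has not controlled for local density.
# for name, crop in {"spherical": c_sph, "gaussian": c_gauss}.items():
#     print(name, np.quantile(local_density_proxy(crop), [0.1, 0.5, 0.9]))
```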

Circularity Check

0 steps flagged

No circularity: purely empirical comparison with no derivations or self-referential reductions

Full rationale

The manuscript is an experimental study that evaluates four cropping geometries (spherical, exponential, Gaussian, linear) on public indoor/outdoor point-cloud datasets using two standard 3D network architectures. No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the abstract or described content. Performance differences are reported as measured outcomes on fixed benchmarks; they do not reduce by construction to quantities defined inside the paper. Any self-citations are incidental and non-load-bearing for the central empirical claim.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical comparative study; the central claim rests entirely on experimental outcomes rather than any mathematical derivation or new theoretical construct.

pith-pipeline@v0.9.0 · 5469 in / 1133 out tokens · 40247 ms · 2026-05-08T19:09:35.751395+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

78 extracted references · 71 canonical work pages · 4 internal anchors

  1. [1] C. R. Qi, H. Su, K. Mo, L. J. Guibas, PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, arXiv:1612.00593 [cs] (Feb. 2016). doi:10.48550/arXiv.1612.00593. http://arxiv.org/abs/1612.00593
  2. [2] I. Armeni, S. Sax, A. R. Zamir, S. Savarese, Joint 2D-3D-Semantic Data for Indoor Scene Understanding, arXiv:1702.01105 [cs] (Apr. 2017). doi:10.48550/arXiv.1702.01105. http://arxiv.org/abs/1702.01105
  3. [3] I. Armeni, O. Sener, A. R. Zamir, H. Jiang, I. Brilakis, M. Fischer, S. Savarese, 3D Semantic Parsing of Large-Scale Indoor Spaces, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Las Vegas, NV, USA, 2016, pp. 1534-1543. doi:10.1109/CVPR.2016.170. http://ieeexplore.ieee.org/document/7780539/
  4. [4] G. Qian, Y. Li, H. Peng, J. Mai, H. A. A. K. Hammoud, M. Elhoseiny, B. Ghanem, PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies, arXiv:2206.04670 [cs] (Oct. 2022). doi:10.48550/arXiv.2206.04670. http://arxiv.org/abs/2206.04670
  5. [5] H. Thomas, Y.-H. H. Tsai, T. D. Barfoot, J. Zhang, KPConvX: Modernizing Kernel Point Convolution with Kernel Attention, arXiv:2405.13194 [cs] (May 2024). doi:10.48550/arXiv.2405.13194. http://arxiv.org/abs/2405.13194
  6. [6] Y. Guo, H. Wang, Q. Hu, H. Liu, L. Liu, M. Bennamoun, Deep Learning for 3D Point Clouds: A Survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (12) (2021) 4338-4364. doi:10.1109/TPAMI.2020.3005434. https://ieeexplore.ieee.org/document/9127813/
  7. [7] R. Zhang, Y. Wu, W. Jin, X. Meng, Deep-Learning-Based Point Cloud Semantic Segmentation: A Survey, Electronics 12 (17) (2023) 3642. doi:10.3390/electronics12173642. https://www.mdpi.com/2079-9292/12/17/3642
  8. [8] A. Milioto, I. Vizzo, J. Behley, C. Stachniss, RangeNet++: Fast and Accurate LiDAR Semantic Segmentation, in: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, Macau, China, 2019, pp. 4213-4220. doi:10.1109/IROS40897.2019.8967762. https://ieeexplore.ieee.org/document/8967762/
  9. [9] T. Cortinhal, G. Tzelepis, E. Erdal Aksoy, SalsaNext: Fast, Uncertainty-Aware Semantic Segmentation of LiDAR Point Clouds, in: G. Bebis, Z. Yin, E. Kim, J. Bender, K. Subr, B. C. Kwon, J. Zhao, D. Kalkofen, G. Baciu (Eds.), Advances in Visual Computing, Vol. 12510, Springer International Publishing, Cham, 2020, pp. 207-222, Series Title: Lecture Notes in Computer Science.
  10. [10] B. Wu, A. Wan, X. Yue, K. Keutzer, SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud, arXiv:1710.07368 [cs] (Oct. 2017). doi:10.48550/arXiv.1710.07368. http://arxiv.org/abs/1710.07368
  11. [11] B. Wu, X. Zhou, S. Zhao, X. Yue, K. Keutzer, SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud, in: 2019 International Conference on Robotics and Automation (ICRA), IEEE, Montreal, QC, Canada, 2019, pp. 4376-4382. doi:10.1109/ICRA.2019.8793495.
  12. [12] E. E. Aksoy, S. Baci, S. Cavdar, SalsaNet: Fast Road and Vehicle Segmentation in LiDAR Point Clouds for Autonomous Driving, arXiv:1909.08291 [cs] (Sep. 2019). doi:10.48550/arXiv.1909.08291. http://arxiv.org/abs/1909.08291
  13. [13] Y. Zhang, Z. Zhou, P. David, X. Yue, Z. Xi, B. Gong, H. Foroosh, PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2020, pp. 9598-9607. doi:10.1109/CVPR42600.2020.00962.
  14. [14] G. Shi, R. Li, C. Ma, PillarNet: Real-Time and High-Performance Pillar-Based 3D Object Detection, in: S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, T. Hassner (Eds.), Computer Vision – ECCV 2022, Vol. 13670, Springer Nature Switzerland, Cham, 2022, pp. 35-52, Series Title: Lecture Notes in Computer Science. doi:10.1007/978-3-031-20080-9_3.
  15. [15] M. Gerdzhev, R. Razani, E. Taghavi, L. Bingbing, TORNADO-Net: mulTi-view tOtal vaRiatioN semAntic segmentation with Diamond inceptiOn module, in: 2021 IEEE International Conference on Robotics and Automation (ICRA), IEEE, Xi'an, China, 2021, pp. 9543-9549. doi:10.1109/ICRA48506.2021.9562041. https://ieeexplore.ieee.org/document/9562041/
  16. [16] K. Chen, R. Oldja, N. Smolyanskiy, S. Birchfield, A. Popov, D. Wehr, I. Eden, J. Pehserl, MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous Driving Using Multiple Views, arXiv:2006.05518 [cs] (Aug. 2020). doi:10.48550/arXiv.2006.05518. http://arxiv.org/abs/2006.05518
  17. [17] Y. A. Alnaggar, M. Afifi, K. Amer, M. Elhelw, Multi Projection Fusion for Real-time Semantic Segmentation of 3D LiDAR Point Clouds, arXiv:2011.01974 [cs] (Nov. 2020). doi:10.48550/arXiv.2011.01974. http://arxiv.org/abs/2011.01974
  18. [18] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, O. Ronneberger, 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation, in: S. Ourselin, L. Joskowicz, M. R. Sabuncu, G. Unal, W. Wells (Eds.), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, Springer International Publishing, Cham, 2016, pp. 424-432.
  19. [19] L. P. Tchapmi, C. B. Choy, I. Armeni, J. Gwak, S. Savarese, SEGCloud: Semantic Segmentation of 3D Point Clouds, arXiv:1710.07563 [cs] (Oct. 2017). doi:10.48550/arXiv.1710.07563. http://arxiv.org/abs/1710.07563
  20. [20] B. Graham, M. Engelcke, L. V. D. Maaten, 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, UT, USA, 2018, pp. 9224-9232. doi:10.1109/CVPR.2018.00961. https://ieeexplore.ieee.org/document/8579059/
  21. [21] C. Choy, J. Gwak, S. Savarese, 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Long Beach, CA, USA, 2019, pp. 3070-3079. doi:10.1109/CVPR.2019.00319. https://ieeexplore.ieee.org/document/8953494/
  22. [22] S. Contributors, Spconv: Spatially Sparse Convolution Library (2022). https://github.com/traveller59/spconv
  23. [23] X. Ding, X. Zhang, J. Han, G. Ding, Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs, in: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, New Orleans, LA, USA, 2022, pp. 11953-11965. doi:10.1109/CVPR52688.2022.01166. https://ieeexplore.ieee.org/document/9880273/
  24. [24] Y. Chen, J. Liu, X. Zhang, X. Qi, J. Jia, LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs, in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Vancouver, BC, Canada, 2023, pp. 13488-13498. doi:10.1109/CVPR52729.2023.01296. https://ieeexplore.ieee.org/document/10203060/
  25. [25] T. Feng, W. Wang, F. Ma, Y. Yang, LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels, in: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2024, pp. 14916-14927. doi:10.1109/CVPR52733.2024.01413. https://ieeexplore.ieee.org/document/10656196/
  26. [26] B. Peng, X. Wu, L. Jiang, Y. Chen, H. Zhao, Z. Tian, J. Jia, OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation, in: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2024, pp. 21305-21315. doi:10.1109/CVPR52733.2024.02013. https://ieeexplore.ieee.org/document/10655421/
  27. [27] C. R. Qi, L. Yi, H. Su, L. J. Guibas, PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space, arXiv:1706.02413 [cs] (Jun. 2017). doi:10.48550/arXiv.1706.02413. http://arxiv.org/abs/1706.02413
  28. [28] Q. Hu, B. Yang, L. Xie, S. Rosa, Y. Guo, Z. Wang, N. Trigoni, A. Markham, RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2020, pp. 11105-11114. doi:10.1109/CVPR42600.2020.01112. https://ieeexplore.ieee.org/document/9156466/
  29. [29] L. Landrieu, M. Simonovsky, Large-scale Point Cloud Semantic Segmentation with Superpoint Graphs, arXiv:1711.09869 [cs] (Mar. 2018). doi:10.48550/arXiv.1711.09869. http://arxiv.org/abs/1711.09869
  30. [30] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, J. M. Solomon, Dynamic Graph CNN for Learning on Point Clouds, ACM Transactions on Graphics 38 (5) (2019) 1-12. doi:10.1145/3326362. https://dl.acm.org/doi/10.1145/3326362
  31. [31] H. Lei, N. Akhtar, A. Mian, SegGCN: Efficient 3D Point Cloud Segmentation With Fuzzy Spherical Kernel, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2020, pp. 11608-11617. doi:10.1109/CVPR42600.2020.01163. https://ieeexplore.ieee.org/document/9157177/
  32. [32] M. Tatarchenko, J. Park, V. Koltun, Q.-Y. Zhou, Tangent Convolutions for Dense Prediction in 3D, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, UT, USA, 2018, pp. 3887-3896. doi:10.1109/CVPR.2018.00409. https://ieeexplore.ieee.org/document/8578507/
  33. [33] W. Wu, Z. Qi, L. Fuxin, PointConv: Deep Convolutional Networks on 3D Point Clouds, arXiv:1811.07246 [cs] (Nov. 2020). doi:10.48550/arXiv.1811.07246. http://arxiv.org/abs/1811.07246
  34. [34] H. Thomas, C. R. Qi, J.-E. Deschaud, B. Marcotegui, F. Goulette, L. J. Guibas, KPConv: Flexible and Deformable Convolution for Point Clouds, arXiv:1904.08889 [cs] (Aug. 2019). doi:10.48550/arXiv.1904.08889. http://arxiv.org/abs/1904.08889
  35. [35] X. Li, Z. Zhang, Y. Li, M. Huang, J. Zhang, SFL-NET: Slight Filter Learning Network for Point Cloud Semantic Segmentation, IEEE Transactions on Geoscience and Remote Sensing 61 (2023) 1-14. doi:10.1109/TGRS.2023.3313876. https://ieeexplore.ieee.org/document/10250869/
  36. [36] H. Zhao, L. Jiang, J. Jia, P. Torr, V. Koltun, Point Transformer, in: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, Montreal, QC, Canada, 2021, pp. 16239-16248. doi:10.1109/ICCV48922.2021.01595. https://ieeexplore.ieee.org/document/9710703/
  37. [37] M.-H. Guo, J.-X. Cai, Z.-N. Liu, T.-J. Mu, R. R. Martin, S.-M. Hu, PCT: Point cloud transformer, Computational Visual Media 7 (2) (2021) 187-199. doi:10.1007/s41095-021-0229-5.
  38. [38] X. Wu, Y. Lao, L. Jiang, X. Liu, H. Zhao, Point Transformer V2: Grouped Vector Attention and Partition-based Pooling, in: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh (Eds.), Advances in Neural Information Processing Systems, Vol. 35, Curran Associates, Inc., 2022, pp. 33330-33342.
  39. [39] X. Wu, L. Jiang, P.-S. Wang, Z. Liu, X. Liu, Y. Qiao, W. Ouyang, T. He, H. Zhao, Point Transformer V3: Simpler, Faster, Stronger, in: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle, WA, USA, 2024, pp. 4840-4851. doi:10.1109/CVPR52733.2024.00463. https://ieeexplore.ieee.org/document/10658198/
  40. [40] Z. Liu, H. Hu, Y. Lin, Z. Yao, Z. Xie, Y. Wei, J. Ning, Y. Cao, Z. Zhang, L. Dong, F. Wei, B. Guo, Swin Transformer V2: Scaling Up Capacity and Resolution, in: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, New Orleans, LA, USA, 2022, pp. 11999-12009. doi:10.1109/CVPR52688.2022.01170.
  41. [41] Y.-Q. Yang, Y.-X. Guo, J.-Y. Xiong, Y. Liu, H. Pan, P.-S. Wang, X. Tong, B. Guo, Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding, Computational Visual Media 11 (1) (2025) 83-101. doi:10.26599/CVM.2025.9450383.
  42. [42] M. Kellner, B. Stahl, A. Reiterer, Fused Projection-Based Point Cloud Segmentation, Sensors 22 (3) (2022) 1139. doi:10.3390/s22031139. https://www.mdpi.com/1424-8220/22/3/1139
  43. [43] Y. Hou, X. Zhu, Y. Ma, C. C. Loy, Y. Li, Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation, arXiv:2206.02099 [cs] (Jun. 2022). doi:10.48550/arXiv.2206.02099. http://arxiv.org/abs/2206.02099
  44. [44] J. Xu, R. Zhang, J. Dou, Y. Zhu, J. Sun, S. Pu, RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation, in: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, Montreal, QC, Canada, 2021, pp. 16004-16013. doi:10.1109/ICCV48922.2021.01572. https://ieeexplore.ieee.org/document/9709941/
  45. [45] M. Caron, H. Touvron, I. Misra, H. Jegou, J. Mairal, P. Bojanowski, A. Joulin, Emerging Properties in Self-Supervised Vision Transformers, in: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, Montreal, QC, Canada, 2021, pp. 9630-9640. doi:10.1109/ICCV48922.2021.00951. https://ieeexplore.ieee.org/document/9709990/
  46. [46] K. He, X. Chen, S. Xie, Y. Li, P. Dollar, R. Girshick, Masked Autoencoders Are Scalable Vision Learners, in: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, New Orleans, LA, USA, 2022, pp. 15979-15988. doi:10.1109/CVPR52688.2022.01553. https://ieeexplore.ieee.org/document/9879206/
  47. [47] M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, M. Assran, N. Ballas, W. Galuba, R. Howes, P.-Y. Huang, S.-W. Li, I. Misra, M. Rabbat, V. Sharma, G. Synnaeve, H. Xu, H. Jegou, J. Mairal, P. Labatut, A. Joulin, P. Bojanowski, DINOv2: Learning Robust Visual Features without Supervision, arXiv:2304.07193 [cs].
  48. [48] O. Siméoni, H. V. Vo, M. Seitzer, F. Baldassarre, M. Oquab, C. Jose, V. Khalidov, M. Szafraniec, S. Yi, M. Ramamonjisoa, F. Massa, D. Haziza, L. Wehrstedt, J. Wang, T. Darcet, T. Moutakanni, L. Sentana, C. Roberts, A. Vedaldi, J. Tolan, J. Brandt, C. Couprie, J. Mairal, H. Jégou, P. Labatut, P. Bojanowski, DINOv3, arXiv:2508.10104 [cs] (Aug. 2025).
  49. [49] S. Xie, J. Gu, D. Guo, C. R. Qi, L. Guibas, O. Litany, PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding, in: A. Vedaldi, H. Bischof, T. Brox, J.-M. Frahm (Eds.), Computer Vision – ECCV 2020, Springer International Publishing, Cham, 2020, pp. 574-591.
  50. [50] Y. Pang, W. Wang, F. E. H. Tay, W. Liu, Y. Tian, L. Yuan, Masked Autoencoders for Point Cloud Self-supervised Learning, in: S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, T. Hassner (Eds.), Computer Vision – ECCV 2022, Vol. 13662, Springer Nature Switzerland, Cham, 2022, pp. 604-621, Series Title: Lecture Notes in Computer Science.
  51. [51] X. Wu, X. Wen, X. Liu, H. Zhao, Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning, in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Vancouver, BC, Canada, 2023, pp. 9415-9424. doi:10.1109/CVPR52729.2023.00908. https://ieeexplore.ieee.org/document/10203752/
  52. [52] X. Wu, D. DeTone, D. Frost, T. Shen, C. Xie, N. Yang, J. Engel, R. Newcombe, H. Zhao, J. Straub, Sonata: Self-Supervised Learning of Reliable Point Representations, arXiv:2503.16429 [cs] (Mar. 2025). doi:10.48550/arXiv.2503.16429. http://arxiv.org/abs/2503.16429
  53. [53] Y. Zhang, X. Wu, Y. Lao, C. Wang, Z. Tian, N. Wang, H. Zhao, Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations, arXiv:2510.23607 [cs] (Oct. 2025). doi:10.48550/arXiv.2510.23607. http://arxiv.org/abs/2510.23607
  54. [54] H. Zhu, H. Yang, X. Wu, D. Huang, S. Zhang, X. He, H. Zhao, C. Shen, Y. Qiao, T. He, W. Ouyang, PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm, arXiv:2310.08586 [cs] (Apr. 2025). doi:10.48550/arXiv.2310.08586. http://arxiv.org/abs/2310.08586
  55. [55] J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, J. Gall, SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences, arXiv:1904.01416 [cs] (Aug. 2019). doi:10.48550/arXiv.1904.01416. http://arxiv.org/abs/1904.01416
  56. [56] P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine, V. Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, S. Zhao, S. Cheng, Y. Zhang, J. Shlens, Z. Chen, D. Anguelov, Scalability in Perception for Autonomous Driving: Waymo Open Dataset, arXiv:1912.04838 [cs].
  57. [57] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, O. Beijbom, nuScenes: A multimodal dataset for autonomous driving, arXiv:1903.11027 [cs] (May 2020). doi:10.48550/arXiv.1903.11027. http://arxiv.org/abs/1903.11027
  58. [58] X. Roynard, J.-E. Deschaud, F. Goulette, Paris-Lille-3D: A large and high-quality ground-truth urban point cloud dataset for automatic segmentation and classification, The International Journal of Robotics Research 37 (6) (2018) 545-557. doi:10.1177/0278364918767506. https://journals.sagepub.com/doi/10.1177/0278364918767506
  59. [59] W. Tan, N. Qin, L. Ma, Y. Li, J. Du, G. Cai, K. Yang, J. Li, Toronto-3D: A Large-scale Mobile LiDAR Dataset for Semantic Segmentation of Urban Roadways, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, Seattle, WA, USA, 2020, pp. 797-806. doi:10.1109/CVPRW50498.2020.00109.
  60. [61] H. Thomas, F. Goulette, J.-E. Deschaud, B. Marcotegui, Y. LeGall, Semantic Classification of 3D Point Clouds with Multiscale Spherical Neighborhoods, in: 2018 International Conference on 3D Vision (3DV), IEEE, Verona, 2018, pp. 390-398. doi:10.1109/3DV.2018.00052. https://ieeexplore.ieee.org/document/8490990/
  61. [62] N. Varney, V. K. Asari, Q. Graehling, Pyramid Point: A Multi-Level Focusing Network for Revisiting Feature Layers (2020). doi:10.48550/ARXIV.2011.08692. https://arxiv.org/abs/2011.08692
  62. [63] S. Yoo, Y. Jeong, M. Jameela, G. Sohn, Human Vision Based 3D Point Cloud Semantic Segmentation of Large-Scale Outdoor Scenes, in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, Vancouver, BC, Canada, 2023, pp. 6577-6586. doi:10.1109/CVPRW59228.2023.00699. https://ieeexplore.ieee.org/document/10208664/
  63. [64] S. Yoo, Y. Jeong, M. M. Sheikholeslami, G. Sohn, EyeNet++: A Multiscale and Multidensity Approach for Outdoor 3-D Semantic Segmentation Inspired by the Human Visual Field, IEEE Transactions on Geoscience and Remote Sensing 63 (2025) 1-19. doi:10.1109/TGRS.2025.3589287. https://ieeexplore.ieee.org/document/11080501/
  64. [65] S. Contributors, Spconv: Spatially sparse convolution library, https://github.com/traveller59/spconv (2022).
  65. [66] M. Fey, J. E. Lenssen, Fast Graph Representation Learning with PyTorch Geometric, arXiv:1903.02428 [cs] (Apr. 2019). doi:10.48550/arXiv.1903.02428. http://arxiv.org/abs/1903.02428
  66. [67] T. Dao, D. Y. Fu, S. Ermon, A. Rudra, C. Ré, FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (2022). doi:10.48550/ARXIV.2205.14135. https://arxiv.org/abs/2205.14135
  67. [68] T. Dao, FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning (2023). doi:10.48550/ARXIV.2307.08691. https://arxiv.org/abs/2307.08691
  68. [69] S. Ioffe, C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, in: F. Bach, D. Blei (Eds.), Proceedings of the 32nd International Conference on Machine Learning, Vol. 37 of Proceedings of Machine Learning Research, PMLR, Lille, France, 2015, pp. 448-456. https://proceedings.mlr.press/v37/ioffe15.html
  69. [70] J. L. Ba, J. R. Kiros, G. E. Hinton, Layer Normalization (2016). doi:10.48550/ARXIV.1607.06450. https://arxiv.org/abs/1607.06450
  70. [71] C. R. Harris, K. J. Millman, S. J. Van Der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. Van Kerkwijk, M. Brett, A. Haldane, J. F. Del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, T. E. Oliphant, Array programming with NumPy, Nature 585 (2020) 357-362.
  71. [72] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An Imperative Style, High-Performance Deep Learning Library, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems, Vol. 32, Curran Associates, Inc., 2019.
  72. [73] S. K. Lam, A. Pitrou, S. Seibert, Numba: a LLVM-based Python JIT compiler, in: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, ACM, Austin, Texas, 2015, pp. 1-6. doi:10.1145/2833157.2833162. https://dl.acm.org/doi/10.1145/2833157.2833162
  73. [74] I. Loshchilov, F. Hutter, Decoupled Weight Decay Regularization (2017). doi:10.48550/ARXIV.1711.05101. https://arxiv.org/abs/1711.05101
  74. [75] L. N. Smith, N. Topin, Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates (2017). doi:10.48550/ARXIV.1708.07120. https://arxiv.org/abs/1708.07120
  75. [76] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the Inception Architecture for Computer Vision (2015). doi:10.48550/ARXIV.1512.00567. https://arxiv.org/abs/1512.00567
  76. [77] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (2020). doi:10.48550/ARXIV.2010.11929. https://arxiv.org/abs/2010.11929
  77. [78] X. Jiao, C. Lv, J. Zhao, R. Yi, Y.-H. Wen, Z. Pan, Z. Wu, Y.-J. Liu, Weighted Poisson-disk Resampling on Large-Scale Point Clouds, Proceedings of the AAAI Conference on Artificial Intelligence 39 (4) (2025) 4084-4092. doi:10.1609/aaai.v39i4.32428. https://ojs.aaai.org/index.php/AAAI/article/view/32428
  78. [79] M. Kellner, A. Schmitt, A. Reiterer, Automatic Generation of 3D Bridge Models from 3D Point Clouds, Results in Engineering (2026) 109532. doi:10.1016/j.rineng.2026.109532. https://linkinghub.elsevier.com/retrieve/pii/S2590123026005724