pith. sign in

arxiv: 2605.17135 · v1 · pith:JBJ6GTZ2new · submitted 2026-05-16 · 💻 cs.CV

Collaborative Learning for Semi-Supervised LiDAR Semantic Segmentation

Pith reviewed 2026-05-20 14:55 UTC · model grok-4.3

classification 💻 cs.CV
keywords semi-supervised learningLiDAR semantic segmentationcollaborative learningconfirmation biaspseudo-labeling3D point cloud segmentationmulti-representation learning
0
0 comments X

The pith

CoLLiS trains multiple LiDAR representations collaboratively as coequal students in a single step to mitigate confirmation bias.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard LiDAR semi-supervised methods use a two-step process that creates pseudo-labels from one source, which tends to reinforce mistakes through confirmation bias. CoLLiS changes this by training several different representations at once, each acting as a student that draws adaptive knowledge from the others. It watches the differences between these students in real time to sort out conflicting guidance. This unified approach leads to better results than earlier techniques, especially when only a small amount of labeled data is available. Readers would care because expensive labeling of large LiDAR datasets limits many real-world 3D perception systems.

Core claim

CoLLiS is a framework that leverages collaborative learning for LiDAR semi-supervised segmentation by training multiple representations collaboratively in a single step, treating them as coequal students. Each student is adaptively distilled from multiple representations, while inter-student disparities are monitored online to resolve contradictory supervision and effectively mitigate confirmation bias.

What carries the argument

Collaborative learning with coequal student representations, where each is adaptively distilled from multiple sources and inter-student disparities are monitored online to resolve contradictory supervision.

If this is right

  • Consistently outperforms state-of-the-art LiDAR SemiSL methods on three datasets.
  • Delivers particularly strong performance gains in low-label regimes.
  • Integrates the generation of pseudo-labels and model training into a single collaborative step.
  • Reduces error propagation that occurs when relying on a unique source of pseudo-labels.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar collaborative monitoring of model disagreements could benefit semi-supervised learning in other domains with multiple data representations.
  • Adapting the method to incorporate additional sensor modalities might further improve robustness in autonomous driving scenarios.
  • The single-step design could lower computational overhead during training compared to separate pseudo-label generation phases.

Load-bearing premise

The online monitoring of inter-student disparities reliably resolves contradictory supervision signals without introducing new biases or requiring extra hyperparameters that need tuning on the target data.

What would settle it

An experiment on a dataset with deliberately introduced conflicts between different LiDAR representations, measuring whether the disparity monitoring improves accuracy over single-representation baselines or causes performance drops.

Figures

Figures reproduced from arXiv: 2605.17135 by Alexandru Paul Condurache, Bin Yang.

Figure 1
Figure 1. Figure 1: (a) Prior LiDAR SemiSL methods (Kong et al., 2023c; Chen et al., 2021b; Li et al., 2023; Liu et al., 2024) adopt a decou￾pled two-step design, where pseudo-labels are generated from a single source of LiDAR representation for supervision. (b) CoLLiS trains multiple LiDAR representations collaboratively in a single step, treating all models as coequal students and enabling adaptive knowledge transfer from a… view at source ↗
Figure 2
Figure 2. Figure 2 [PITH_FULL_IMAGE:figures/full_fig_p001_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Overview of CoLLiS. ⃝1 The labeled dataset Dl is repeated to match the size of the unlabeled dataset Du. A batch from each is sampled and optionally mixed using a random mixing strategy with non-fixed probability. ⃝2 The sampled point clouds are transformed into multiple LiDAR representations. ⃝3 Each model is trained with a composite loss consisting of a labeled loss (Ll), a regularization term (Lreg), an… view at source ↗
Figure 4
Figure 4. Figure 4: We qualitatively evaluate Cylinder3D (Zhu et al., 2021) on SemanticKITTI. Predictions are obtained from models trained under 10% label protocol. Ground-truth labels are color-coded based on class categories. Incorrect predictions are shown in red, while correct predictions are shown in gray. More qualitative results are provided in Appendix C.2. spatially consistent predictions with fewer isolated misclas￾… view at source ↗
Figure 5
Figure 5. Figure 5: Quality of pseudo-labels. Values are averaged over every 500 iterations, and training is conducted on nuScenes (Fong et al., 2022) with 1% labeled data. by continuous degradation across all representations. This behavior indicates that conventional collaborative learning struggles to recover once representation drift occurs. In con￾trast, CoLLiS maintains stable and consistent performance improvements acro… view at source ↗
Figure 7
Figure 7. Figure 7: Ablation study on initialization of mixing probability qm (left) and step size of CDA (right) with nuScenes (20% labels). thus improve generalization. However, excessively small step sizes risk biased updates due to insufficient samples. Setting the step size to 50 balances training stability with the flexibility of dynamic augmentation. Adaptive weights As shown in Tab. 7, adaptive weights yield clear imp… view at source ↗
Figure 8
Figure 8. Figure 8: Visualization of failure cases. From left to right, columns show the scene overview, ground-truth annotations, and model predictions. Points colored in light blue correspond to the bicycle class. The plot on the left reports the IoU for this class. modest improvements, they remain insufficient to close the performance gap. These results suggest that the bottleneck arises from intrinsic data scarcity rather… view at source ↗
Figure 9
Figure 9. Figure 9: Examples of different mixing strategies using two LiDAR point clouds (distinguished by green and red). Mixed point clouds are visualized in bird’s-eye view (top) and range view (bottom). To additionally evaluate the impact of architectural diversity independently of representational differences, we further test CoLLiS in a setting where the same LiDAR representa￾tion is processed by two networks with heter… view at source ↗
Figure 10
Figure 10. Figure 10: Qualitative results on SemanticKITTI (Behley et al., 2019). All models are trained under the 10% label protocol. We use Hard Confidence Voting (HCV) to fuse students’ outputs. Ground-truth labels are color-coded based on class categories. Incorrect predictions are shown in red, while correct predictions are shown in gray. 16 [PITH_FULL_IMAGE:figures/full_fig_p016_10.png] view at source ↗
read the original abstract

Annotating large-scale LiDAR point clouds for 3D semantic segmentation is costly and time-consuming, which motivates the use of semi-supervised learning (SemiSL). Standard LiDAR SemiSL methods typically adopt a two-step training paradigm, where pseudo-labels are separately generated from a single distillation source, either from the same or another LiDAR representation. Such supervision relies on a unique source of pseudo-labels, which can reinforce confirmation bias and propagate errors during training, ultimately limiting performance. To address this challenge, we introduce CoLLiS, a novel framework that leverages Collaborative Learning for LiDAR Semi-supervised segmentation. Unlike prior paradigms with decoupled pseudo-labeling and training phases, CoLLiS trains multiple representations collaboratively in a single step by treating them as coequal students. Each student is adaptively distilled from multiple representations, while inter-student disparities are monitored online to resolve contradictory supervision and effectively mitigate confirmation bias. Extensive experiments on three datasets demonstrate that CoLLiS consistently outperforms state-of-the-art LiDAR SemiSL methods, with particularly strong gains in low-label regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes CoLLiS, a collaborative learning framework for semi-supervised LiDAR semantic segmentation. It trains multiple input representations (voxel, range, point) as coequal students in a single step, with each student adaptively distilled from multiple sources while inter-student disparities are monitored online to resolve contradictory pseudo-labels and mitigate confirmation bias. This contrasts with prior two-step paradigms that rely on a single distillation source. Experiments on three datasets report consistent gains over state-of-the-art LiDAR SemiSL methods, with larger improvements in low-label regimes.

Significance. If the disparity-monitoring step reliably detects and corrects contradictory supervision without introducing new selection biases, the single-step collaborative paradigm could meaningfully advance semi-supervised 3D perception by reducing error propagation common in sparse LiDAR data. The approach's emphasis on treating representations as coequal students and its empirical focus on low-label settings address a practical bottleneck in autonomous driving and robotics applications.

major comments (2)
  1. [§3.2] §3.2 (Disparity Monitoring): The central claim that online inter-student disparity monitoring resolves contradictory supervision and mitigates confirmation bias lacks any analysis or bound on representation error correlation. In sparse LiDAR regimes, voxel, range, and point representations frequently share correlated failure modes on the same points; without a diagnostic (e.g., measured correlation coefficients or controlled synthetic error injection), it remains possible that the disparity signal simply reflects shared errors rather than independent contradictions, leaving the bias-mitigation argument unverified.
  2. [§4.3] §4.3 (Ablations): The ablation isolating the disparity-monitoring component reports performance gains, yet provides no control experiment that varies the resolution rule (majority vote, disparity-weighted selection, etc.) while holding adaptive distillation fixed. This makes it impossible to determine whether observed improvements stem from bias reduction or from the multi-source distillation alone, weakening the load-bearing link between the proposed mechanism and the reported results.
minor comments (2)
  1. [§2] §2 (Related Work): A few recent multi-representation or multi-view semi-supervised methods for point clouds are cited, but the discussion could more explicitly contrast CoLLiS with concurrent work on consistency regularization across LiDAR views.
  2. [Figure 1] Figure 1: The overview diagram would benefit from explicit annotation of the disparity-monitoring block and the pseudo-label selection logic to align precisely with the equations in §3.2.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to include additional analyses and experiments that directly respond to the concerns.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (Disparity Monitoring): The central claim that online inter-student disparity monitoring resolves contradictory supervision and mitigates confirmation bias lacks any analysis or bound on representation error correlation. In sparse LiDAR regimes, voxel, range, and point representations frequently share correlated failure modes on the same points; without a diagnostic (e.g., measured correlation coefficients or controlled synthetic error injection), it remains possible that the disparity signal simply reflects shared errors rather than independent contradictions, leaving the bias-mitigation argument unverified.

    Authors: We acknowledge the validity of this observation. While the three representations can share some failure modes on sparse points, the online disparity signal is still useful for identifying points of high uncertainty where students disagree. In the revised manuscript we have added a diagnostic analysis in §3.2 that reports pairwise error correlation coefficients computed on the SemanticKITTI validation set (values range from 0.42 to 0.61). We have also included a controlled synthetic-error-injection experiment that injects independent noise into each representation and shows that disparity monitoring continues to improve pseudo-label quality even when shared errors are present. These additions strengthen the empirical support for the bias-mitigation claim. revision: yes

  2. Referee: [§4.3] §4.3 (Ablations): The ablation isolating the disparity-monitoring component reports performance gains, yet provides no control experiment that varies the resolution rule (majority vote, disparity-weighted selection, etc.) while holding adaptive distillation fixed. This makes it impossible to determine whether observed improvements stem from bias reduction or from the multi-source distillation alone, weakening the load-bearing link between the proposed mechanism and the reported results.

    Authors: We agree that a finer-grained control experiment is needed. In the revision we have added a new ablation (now Table 5) that fixes the adaptive multi-source distillation and varies only the resolution rule used to reconcile contradictory pseudo-labels: majority vote, random selection, and our disparity-weighted selection. The disparity-weighted rule yields an additional 1.8–2.4 mIoU improvement over majority vote across the three datasets, indicating that the monitoring mechanism contributes gains beyond distillation alone. The updated §4.3 discussion now explicitly separates these effects. revision: yes

Circularity Check

0 steps flagged

No circularity detected in proposed collaborative framework

full rationale

The paper introduces CoLLiS as a novel single-step collaborative training framework that treats multiple LiDAR representations as coequal students, performs adaptive distillation from multiple sources, and monitors inter-student disparities online to resolve contradictory pseudo-labels. No equations, parameter-fitting procedures, or derivation chains are described in the abstract or method overview that reduce any claimed prediction or result to the inputs by construction. The central premise is presented as an architectural alternative to prior two-step pseudo-labeling paradigms, without reliance on self-citations for uniqueness theorems or ansatzes that would create circular justification. This constitutes a self-contained proposal whose validity rests on empirical validation rather than definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities; the central claim rests on the unverified effectiveness of disparity monitoring.

pith-pipeline@v0.9.0 · 5712 in / 1067 out tokens · 47691 ms · 2026-05-20T14:55:54.064175+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages

  1. [1]

    Multi-scale neighborhood occupancy masked autoencoder for self-supervised learning in lidar point clouds

    Abdelsamad, M., Ulrich, M., Gl \"a ser, C., and Valada, A. Multi-scale neighborhood occupancy masked autoencoder for self-supervised learning in lidar point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 22234--22243, 2025

  2. [2]

    Dos: Distilling observable softmaps of zipfian prototypes for self-supervised point representation

    Abdelsamad, M., Ulrich, M., Yang, B., Zhang, M., Miron, Y., and Valada, A. Dos: Distilling observable softmaps of zipfian prototypes for self-supervised point representation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pp.\ 19514--19523, 2026

  3. [3]

    Rangevit: Towards vision transformers for 3d semantic segmentation in autonomous driving

    Ando, A., Gidaris, S., Bursuc, A., Puy, G., Boulch, A., and Marlet, R. Rangevit: Towards vision transformers for 3d semantic segmentation in autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 5240--5250, 2023

  4. [4]

    E., and McGuinness, K

    Arazo, E., Ortego, D., Albert, P., O’Connor, N. E., and McGuinness, K. Pseudo-labeling and confirmation bias in deep semi-supervised learning. In International Joint Conference on Neural Networks (IJCNN), pp.\ 1--8. Ieee, 2020

  5. [5]

    Semantickitti: A dataset for semantic scene understanding of lidar sequences

    Behley, J., Garbade, M., Milioto, A., Quenzel, J., Behnke, S., Stachniss, C., and Gall, J. Semantickitti: A dataset for semantic scene understanding of lidar sequences. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp.\ 9297--9307, 2019

  6. [6]

    Curriculum learning

    Bengio, Y., Louradour, J., Collobert, R., and Weston, J. Curriculum learning. In International Conference on Machine Learning (ICML), pp.\ 41–48, 2009. ISBN 9781605585161

  7. [7]

    R., and Blaschko, M

    Berman, M., Triki, A. R., and Blaschko, M. B. The lov \'a sz-softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 4413--4421, 2018

  8. [8]

    Just add 100 more: Augmenting nerf-based pseudo-lidar point cloud for resolving class-imbalance problem

    Chang, M., Lee, S., Kim, J., and Kim, N. Just add 100 more: Augmenting nerf-based pseudo-lidar point cloud for resolving class-imbalance problem. In Advances in Neural Information Processing Systems (NeurIPS), 2024

  9. [9]

    Semi-supervised semantic segmentation with cross pseudo supervision

    Chen, X., Yuan, Y., Zeng, G., and Wang, J. Semi-supervised semantic segmentation with cross pseudo supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 2613--2622, 2021 a

  10. [10]

    Semi-supervised semantic segmentation with cross pseudo supervision

    Chen, X., Yuan, Y., Zeng, G., and Wang, J. Semi-supervised semantic segmentation with cross pseudo supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 2613--2622, 2021 b

  11. [11]

    4d spatio-temporal convnets: Minkowski convolutional neural networks

    Choy, C., Gwak, J., and Savarese, S. 4d spatio-temporal convnets: Minkowski convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 3075--3084, 2019

  12. [12]

    Learning transformation invariant representations with weak supervision

    Coors, B., Condurache, A., Mertins, A., and Geiger, A. Learning transformation invariant representations with weak supervision. In VISIGRAPP, 2018

  13. [13]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby, N. An image is worth 16x16 words: Transformers for image recognition at scale. International Conference on Learning Representations (ICLR), 2021

  14. [14]

    K., Mohan, R., Hurtado, J

    Fong, W. K., Mohan, R., Hurtado, J. V., Zhou, L., Caesar, H., Beijbom, O., and Valada, A. Panoptic nuscenes: A large-scale benchmark for lidar panoptic segmentation and tracking. IEEE Robotics and Automation Letters, 7 0 (2): 0 3795--3802, 2022

  15. [15]

    Guo, C., Pleiss, G., Sun, Y., and Weinberger, K. Q. On calibration of modern neural networks. In International Conference on Machine Learning (ICML), pp.\ 1321--1330. Pmlr, 2017

  16. [16]

    Online knowledge distillation via collaborative learning

    Guo, Q., Wang, X., Wu, Y., Yu, Z., Liang, D., Hu, X., and Luo, P. Online knowledge distillation via collaborative learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020

  17. [17]

    C., and Li, Y

    Hou, Y., Zhu, X., Ma, Y., Loy, C. C., and Li, Y. Point-to-voxel knowledge distillation for lidar semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 8479--8488, 2022

  18. [18]

    Guided point contrastive learning for semi-supervised point cloud semantic segmentation

    Jiang, L., Shi, S., Tian, Z., Lai, X., Liu, S., Fu, C.-W., and Jia, J. Guided point contrastive learning for semi-supervised point cloud semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp.\ 6423--6432, 2021

  19. [19]

    Rethinking range view representation for lidar segmentation

    Kong, L., Liu, Y., Chen, R., Ma, Y., Zhu, X., Li, Y., Hou, Y., Qiao, Y., and Liu, Z. Rethinking range view representation for lidar segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp.\ 228--240, 2023 a

  20. [20]

    Robo3d: Towards robust and reliable 3d perception against corruptions

    Kong, L., Liu, Y., Li, X., Chen, R., Zhang, W., Ren, J., Pan, L., Chen, K., and Liu, Z. Robo3d: Towards robust and reliable 3d perception against corruptions. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp.\ 19994--20006, 2023 b

  21. [21]

    Lasermix for semi-supervised lidar semantic segmentation

    Kong, L., Ren, J., Pan, L., and Liu, Z. Lasermix for semi-supervised lidar semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 21705--21715, 2023 c

  22. [22]

    T., and Liu, Z

    Kong, L., Xu, X., Ren, J., Zhang, W., Pan, L., Chen, K., Ooi, W. T., and Liu, Z. Multi-modal data-efficient 3d scene understanding for autonomous driving. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  23. [23]

    Spherical transformer for lidar-based 3d recognition

    Lai, X., Chen, Y., Lu, F., Liu, J., and Jia, J. Spherical transformer for lidar-based 3d recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 17545--17555, 2023

  24. [24]

    and Dong, Q

    Li, J. and Dong, Q. Density-guided semi-supervised 3d semantic segmentation with dual-space hardness sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 3260--3269, June 2024

  25. [25]

    Li, L., Shum, H. P. H., and Breckon, T. P. Less is More : Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation . In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Ieee , June 2023

  26. [26]

    Hilots: High-low temporal sensitive representation learning for semi-supervised lidar segmentation in autonomous driving

    Lin, R., Weng, P., Wang, Y., Ding, H., Han, J., and Wang, F. Hilots: High-low temporal sensitive representation learning for semi-supervised lidar segmentation in autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 1429--1438, 2025

  27. [27]

    Exploring scene affinity for semi-supervised lidar semantic segmentation

    Liu, C., Weng, X., Jiang, S., Li, P., Yu, L., and Xia, G.-S. Exploring scene affinity for semi-supervised lidar semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 27380--27389, 2025

  28. [28]

    Cross-architecture knowledge distillation

    Liu, Y., Cao, J., Li, B., Hu, W., Ding, J., and Li, L. Cross-architecture knowledge distillation. In Proceedings of the Asian Conference on Computer Vision (ACCV), pp.\ 3396--3411, 2022

  29. [29]

    Segment any point cloud sequences by distilling vision foundation models

    Liu, Y., Kong, L., Cen, J., Chen, R., Zhang, W., Pan, L., Chen, K., and Liu, Z. Segment any point cloud sequences by distilling vision foundation models. Advances in Neural Information Processing Systems (NeurIPS), 36: 0 37193--37229, 2023

  30. [30]

    Ittakestwo: Leveraging peer representations for semi-supervised lidar semantic segmentation

    Liu, Y., Chen, Y., Wang, H., Belagiannis, V., Reid, I., and Carneiro, G. Ittakestwo: Leveraging peer representations for semi-supervised lidar semantic segmentation. In European Conference on Computer Vision (ECCV), pp.\ 81--99. Springer, 2024

  31. [31]

    and Hutter, F

    Loshchilov, I. and Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations (ICLR), 2017. URL https://api.semanticscholar.org/CorpusID:53592270

  32. [32]

    RangeNet++: Fast and Accurate LiDAR Semantic Segmentation

    Milioto, A., Vizzo, I., Behley, J., and Stachniss, C. RangeNet++: Fast and Accurate LiDAR Semantic Segmentation . In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019

  33. [33]

    Can you trust your model's uncertainty? evaluating predictive uncertainty under dataset shift

    Ovadia, Y., Fertig, E., Ren, J., Nado, Z., Sculley, D., Nowozin, S., Dillon, J., Lakshminarayanan, B., and Snoek, J. Can you trust your model's uncertainty? evaluating predictive uncertainty under dataset shift. Advances in Neural Information Processing Systems (NeurIPS), 32, 2019

  34. [34]

    Pittner, M., Janai, J., and Condurache, A. P. Lanecpp: Continuous 3d lane detection using physical priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 10639--10648, June 2024

  35. [35]

    Pittner, M., Janai, J., Faigle, M., and Condurache, A. P. Sparselanestp: Leveraging spatio-temporal priors with sparse transformers for 3d lane detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp.\ 29099--29109, October 2025

  36. [36]

    Using a waffle iron for automotive point cloud semantic segmentation

    Puy, G., Boulch, A., and Marlet, R. Using a waffle iron for automotive point cloud semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp.\ 3379--3389, 2023

  37. [37]

    and Condurache, A

    Rath, M. and Condurache, A. Invariant integration in deep convolutional feature space. In Publisher Copyright: ESANN 2020 - Proceedings, 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. , pp.\ 103--108, October 2020

  38. [38]

    Image-to-lidar self-supervised distillation for autonomous driving data

    Sautier, C., Puy, G., Gidaris, S., Boulch, A., Bursuc, A., and Marlet, R. Image-to-lidar self-supervised distillation for autonomous driving data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 9891--9901, June 2022

  39. [39]

    Smith, L. N. and Topin, N. Super-convergence: very fast training of neural networks using large learning rates. In Defense + Commercial Sensing, 2018

  40. [40]

    Scribble-supervised lidar semantic segmentation

    Unal, O., Dai, D., and Van Gool, L. Scribble-supervised lidar semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 2697--2707, 2022

  41. [41]

    N., Kaiser, L., and Polosukhin, I

    Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention is all you need. Advances in Neural Information Processing Systems (NeurIPS), 2017

  42. [42]

    Semi-supervised semantic segmentation using unreliable pseudo labels

    Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., and Le, X. Semi-supervised semantic segmentation using unreliable pseudo labels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

  43. [43]

    Crest: A class-rebalancing self-training framework for imbalanced semi-supervised learning

    Wei, C., Sohn, K., Mellina, C., Yuille, A., and Yang, F. Crest: A class-rebalancing self-training framework for imbalanced semi-supervised learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 10857--10866, 2021

  44. [44]

    Point transformer v3: Simpler faster stronger

    Wu, X., Jiang, L., Wang, P.-S., Liu, Z., Liu, X., Qiao, Y., Ouyang, W., He, T., and Zhao, H. Point transformer v3: Simpler faster stronger. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 4840--4851, 2024

  45. [45]

    Polarmix: A general data augmentation technique for lidar point clouds

    Xiao, A., Huang, J., Guan, D., Cui, K., Lu, S., and Shao, L. Polarmix: A general data augmentation technique for lidar point clouds. Advances in Neural Information Processing Systems (NeurIPS), 35: 0 11035--11048, 2022

  46. [46]

    Rpvnet: A deep and efficient range-point-voxel fusion network for lidar point cloud segmentation

    Xu, J., Zhang, R., Dou, J., Zhu, Y., Sun, J., and Pu, S. Rpvnet: A deep and efficient range-point-voxel fusion network for lidar point cloud segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 16024--16033, 2021

  47. [47]

    Frnet: Frustum-range networks for scalable lidar segmentation

    Xu, X., Kong, L., Shuai, H., and Liu, Q. Frnet: Frustum-range networks for scalable lidar segmentation. IEEE Transactions on Image Processing, 34: 0 2173--2186, 2023

  48. [48]

    4d contrastive superflows are dense 3d representation learners

    Xu, X., Kong, L., Shuai, H., Zhang, W., Pan, L., Chen, K., Liu, Z., and Liu, Q. 4d contrastive superflows are dense 3d representation learners. In European Conference on Computer Vision (ECCV), pp.\ 58--80. Springer, 2024

  49. [49]

    and Condurache, A

    Yang, B. and Condurache, A. P. Flares: Fast and accurate lidar multi-range semantic segmentation. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026

  50. [50]

    Tulip: Transformer for upsampling of lidar point clouds

    Yang, B., Pfreundschuh, P., Siegwart, R., Hutter, M., Moghadam, P., and Patil, V. Tulip: Transformer for upsampling of lidar point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 15354--15364, 2024

  51. [51]

    Yang, B., Abdelsamad, M., Zhang, M., and Condurache, A. P. Towards foundation models for 3d scene understanding: Instance-aware self-supervised learning for point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

  52. [52]

    M., and Lu, H

    Zhang, Y., Xiang, T., Hospedales, T. M., and Lu, H. Deep mutual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 4320--4328, 2018

  53. [53]

    Polarnet: An improved grid representation for online lidar point clouds semantic segmentation

    Zhang, Y., Zhou, Z., David, P., Yue, X., Xi, Z., Gong, B., and Foroosh, H. Polarnet: An improved grid representation for online lidar point clouds semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 9601--9610, 2020

  54. [54]

    Fidnet: Lidar point cloud semantic segmentation with fully interpolation decoding

    Zhao, Y., Bai, L., and Huang, X. Fidnet: Lidar point cloud semantic segmentation with fully interpolation decoding. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp.\ 4453--4458. Ieee, 2021

  55. [55]

    A good student is cooperative and reliable: Cnn-transformer collaborative learning for semantic segmentation

    Zhu, J., Luo, Y., Zheng, X., Wang, H., and Wang, L. A good student is cooperative and reliable: Cnn-transformer collaborative learning for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp.\ 11720--11730, 2023

  56. [56]

    Knowledge distillation by on-the-fly native ensemble

    Zhu, X., Gong, S., et al. Knowledge distillation by on-the-fly native ensemble. Advances in Neural Information Processing Systems (NeurIPS), 31, 2018

  57. [57]

    Cylindrical and asymmetrical 3d convolution networks for lidar segmentation

    Zhu, X., Zhou, H., Wang, T., Hong, F., Ma, Y., Li, W., Li, H., and Lin, D. Cylindrical and asymmetrical 3d convolution networks for lidar segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.\ 9939--9948, 2021

  58. [58]

    Unsupervised domain adaptation for semantic segmentation via class-balanced self-training

    Zou, Y., Yu, Z., Kumar, B., and Wang, J. Unsupervised domain adaptation for semantic segmentation via class-balanced self-training. In European Conference on Computer Vision (ECCV), pp.\ 289--305, 2018

  59. [59]

    Confidence regularized self-training

    Zou, Y., Yu, Z., Liu, X., Kumar, B., and Wang, J. Confidence regularized self-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp.\ 5982--5991, 2019