pith. machine review for the scientific record. sign in

arxiv: 2601.14477 · v2 · submitted 2026-01-20 · 💻 cs.CV · cs.AI· eess.IV

Recognition: 3 theorem links

· Lean Theorem

XD-MAP: Cross-Modal Domain Adaptation via Semantic Parametric Maps for Scalable Training Data Generation

Authors on Pith no claims yet

Pith reviewed 2026-05-16 12:12 UTC · model grok-4.3

classification 💻 cs.CV cs.AIeess.IV
keywords domain adaptationsemantic segmentationLiDARparametric mapscross-modalpseudo labelingpanoptic segmentationroad scenes
0
0 comments X

The pith

XD-MAP builds semantic parametric maps from camera detections to create reliable pseudo labels for LiDAR segmentation without manual annotation or sensor overlap.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes XD-MAP to transfer knowledge from image datasets to LiDAR by constructing semantic parametric maps that model scene elements detected in camera views. These maps generate pseudo labels directly in the LiDAR domain, removing the need for manual labeling or any direct overlap between sensors. The approach also extends perception from a limited front-view camera to a complete 360-degree field. On a large-scale road feature dataset, the resulting labels produce measurable gains in both 2D and 3D segmentation tasks over single-shot baselines.

Core claim

XD-MAP transfers sensor-specific knowledge from an image dataset to LiDAR by leveraging detections on camera images to create a semantic parametric map. The map elements are modeled to produce pseudo labels in the target domain without any manual annotation effort. Unlike previous domain transfer approaches, the method does not require direct overlap between sensors and enables extending the angular perception range from a front-view camera to a full 360-degree view.

What carries the argument

The semantic parametric map, which models scene elements from camera detections to generate pseudo labels for the LiDAR domain.

If this is right

  • LiDAR segmentation models can be trained using only image detections without additional labeling or sensor alignment.
  • Perception range extends from front-view cameras to full 360-degree coverage in the target domain.
  • 2D semantic segmentation gains +19.5 mIoU and panoptic segmentation gains +19.5 PQth over single-shot baselines.
  • 3D semantic segmentation gains +32.3 mIoU on the same road feature dataset.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same map-based transfer could be tested between other mismatched sensor pairs such as radar and images.
  • The method might allow rapid generation of labeled data for new geographic regions or rare driving scenarios without fresh annotation campaigns.
  • Combining the parametric map with existing foundation models could further reduce dependence on task-specific labeled datasets.

Load-bearing premise

The semantic parametric map accurately models scene elements from camera detections to produce reliable pseudo labels in the LiDAR domain without any manual annotation or direct sensor overlap.

What would settle it

Running the reported experiments on the large-scale road feature dataset and finding that XD-MAP does not exceed the single-shot baseline by +19.5 mIoU in 2D semantic segmentation or +32.3 mIoU in 3D semantic segmentation would falsify the central performance claim.

Figures

Figures reproduced from arXiv: 2601.14477 by Christoph Stiller, Fabian Immel, Frank Bieder, Haohao Hu, Hendrik K\"onigshof, Jan-Hendrik Pauls, Yinzhe Shen.

Figure 1
Figure 1. Figure 1: Overview of the proposed XD-MAP: We generate pseudo labels from RGB camera images from a single camera using a neural [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Exemplary results of the semantic parametric mapping. Depicted are parametric primitives of three semantic classes (poles, [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Illustration of measurement artifacts affecting our [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Spatial distribution of the sequences in Karlsruhe, Ger [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Qualitative results of 2D perception. w.r.t. the reference pose. Assuming static objects, this shifts all lidar points as if they were measured from the reference pose. Since our labels are not pixels in an individual im￾age, but generated from mapped geometric primitives in 3D space, motion compensation impacts the congruence of the labels with the range image. We examine its influence com￾paring training… view at source ↗
Figure 6
Figure 6. Figure 6: Qualitative results of 3D segmentation. Comparison of pseudo labels and predictions of XD-Map and XD-B2. [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Boxplot of instance height and width in the field of view [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗
read the original abstract

Until open-world foundation models match the performance of specialized approaches, deep learning systems remain dependent on task- and sensor-specific data availability. To bridge the gap between available datasets and deployment domains, domain adaptation strategies are widely used. In this work, we propose XD-MAP, a novel approach to transfer sensor-specific knowledge from an image dataset to LiDAR, an entirely different sensing domain. Our method leverages detections on camera images to create a semantic parametric map. The map elements are modeled to produce pseudo labels in the target domain without any manual annotation effort. Unlike previous domain transfer approaches, our method does not require direct overlap between sensors and enables extending the angular perception range from a front-view camera to a full 360{\deg} view. On our large-scale road feature dataset, XD-MAP outperforms single shot baseline approaches by +19.5 mIoU for 2D semantic segmentation, +19.5 PQth for 2D panoptic segmentation, and +32.3 mIoU in 3D semantic segmentation. The results demonstrate the effectiveness of our approach achieving strong performance on LiDAR data without any manual labeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes XD-MAP, a cross-modal domain adaptation technique that builds a semantic parametric map from 2D camera detections to synthesize pseudo-labels for LiDAR point clouds. The map is intended to enable training of 2D semantic segmentation, 2D panoptic segmentation, and 3D semantic segmentation models in the target LiDAR domain without manual annotation or direct sensor overlap, extending perception from front-view camera to full 360°. On a large-scale road feature dataset the method reports gains of +19.5 mIoU (2D semantic), +19.5 PQth (2D panoptic), and +32.3 mIoU (3D semantic) over single-shot baselines.

Significance. If the parametric map reliably converts camera detections into accurate full-surround LiDAR labels, the approach would offer a scalable route to labeled LiDAR training data from existing camera corpora, lowering annotation costs in autonomous-driving and robotics applications. The magnitude of the reported margins is noteworthy and, if reproducible, would constitute a practical contribution to cross-modal data generation.

major comments (3)
  1. [§3] §3 (Method): the construction of the semantic parametric map is described only at a high level; no equations or algorithmic steps are given for (i) lifting 2D detections to 3D, (ii) fitting category-specific shape parameters, or (iii) handling objects outside the camera frustum. Without these details the central claim that the map produces reliable pseudo-labels cannot be evaluated.
  2. [§4] §4 (Experiments): no quantitative validation or error analysis of the generated pseudo-labels is presented (e.g., comparison against any held-out LiDAR ground truth, precision-recall of projected labels, or sensitivity to detection noise). Consequently the +19.5 mIoU and +32.3 mIoU gains cannot be attributed unambiguously to successful domain transfer rather than to weak baselines or dataset peculiarities.
  3. [§4.2] §4.2 (Baselines): the single-shot baselines are not specified in sufficient detail (architecture, training protocol, data augmentation). If they are substantially weaker than current camera-to-LiDAR transfer methods, the reported margins do not demonstrate the superiority of the parametric-map approach.
minor comments (2)
  1. [§1, §4.1] The abstract and §1 refer to “our large-scale road feature dataset” without providing size, sensor configuration, or a citation; this information should appear in §4.1.
  2. [§3] Notation for the parametric map elements (e.g., what variables represent extent, orientation, or semantic class) is introduced inconsistently between text and any accompanying figures.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments and recommendation for major revision. We address each point below and will revise the manuscript to incorporate additional details and analysis where feasible.

read point-by-point responses
  1. Referee: [§3] §3 (Method): the construction of the semantic parametric map is described only at a high level; no equations or algorithmic steps are given for (i) lifting 2D detections to 3D, (ii) fitting category-specific shape parameters, or (iii) handling objects outside the camera frustum. Without these details the central claim that the map produces reliable pseudo-labels cannot be evaluated.

    Authors: We agree that more technical detail is needed. In the revised manuscript we will expand §3 with explicit equations and steps: (i) lifting via monocular depth estimation and camera-to-LiDAR projection, (ii) category-specific parametric fitting formulated as constrained optimization using class priors, and (iii) frustum handling by accumulating the parametric map over time to cover the full 360° surround. These additions will make the pseudo-label generation process fully evaluable. revision: yes

  2. Referee: [§4] §4 (Experiments): no quantitative validation or error analysis of the generated pseudo-labels is presented (e.g., comparison against any held-out LiDAR ground truth, precision-recall of projected labels, or sensitivity to detection noise). Consequently the +19.5 mIoU and +32.3 mIoU gains cannot be attributed unambiguously to successful domain transfer rather than to weak baselines or dataset peculiarities.

    Authors: We acknowledge that direct pseudo-label validation would strengthen attribution of the gains. While downstream task performance serves as the primary evidence, the revision will add a quantitative error analysis subsection, including precision-recall on a held-out subset with available LiDAR annotations and sensitivity tests to detection noise. This should help isolate the contribution of the parametric map. revision: partial

  3. Referee: [§4.2] §4.2 (Baselines): the single-shot baselines are not specified in sufficient detail (architecture, training protocol, data augmentation). If they are substantially weaker than current camera-to-LiDAR transfer methods, the reported margins do not demonstrate the superiority of the parametric-map approach.

    Authors: We apologize for the omitted details. The revised §4.2 will fully specify the baselines, including architectures (e.g., RangeNet++ for 3D, DeepLabv3 for 2D), training protocols (optimizer, learning-rate schedule, epoch count), and all augmentations. This transparency will confirm that the reported margins reflect the strength of XD-MAP rather than weak baselines. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The XD-MAP method constructs semantic parametric maps from external camera detections and applies geometric/semantic modeling to generate LiDAR pseudo-labels. This process does not reduce any claimed prediction or performance gain to a fitted parameter or input quantity by construction, nor does it rely on self-citations for uniqueness or load-bearing premises. The reported mIoU and PQ gains are measured against independent baselines on a held-out dataset, leaving the derivation self-contained with external content from the modeling steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the assumption that camera detections can be reliably turned into a parametric map whose elements produce accurate LiDAR pseudo labels; no explicit free parameters or invented physical entities are named in the abstract.

axioms (1)
  • domain assumption Camera detections provide sufficient semantic information to model scene elements for cross-modal label transfer
    Invoked to justify creating the semantic parametric map from image detections alone
invented entities (1)
  • semantic parametric map no independent evidence
    purpose: Model scene elements to generate pseudo labels in the target LiDAR domain
    Core new construct introduced to bridge camera and LiDAR without direct overlap

pith-pipeline@v0.9.0 · 5534 in / 1213 out tokens · 27616 ms · 2026-05-16T12:12:18.421641+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages

  1. [1]

    Ceres Solver, 2023

    Sameer Agarwal, Keir Mierle, and The Ceres Solver Team. Ceres Solver, 2023. 3

  2. [2]

    A survey: object detection methods from cnn to transformer.Multimedia Tools and Applications, 82 (14):21353–21383, 2023

    Ershat Arkin, Nurbiya Yadikar, Xuebin Xu, Alimjan Aysa, and Kurban Ubul. A survey: object detection methods from cnn to transformer.Multimedia Tools and Applications, 82 (14):21353–21383, 2023. 2

  3. [3]

    Generalized B-spline Camera Model

    Johannes Beck and Christoph Stiller. Generalized B-spline Camera Model. In2018 IEEE Intelligent Vehicles Sympo- sium (IV), pages 2137–2142, 2018. 5

  4. [4]

    Behley, M

    J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall. SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. In IEEE/CVF International Conference on Computer Vision (ICCV), 2019. 5

  5. [5]

    Boosting multi-modal unsupervised domain adaptation for lidar semantic segmentation by self-supervised depth com- pletion.IEEE Access, 2023

    Adriano Cardace, Andrea Conti, Pierluigi Zama Ramirez, Riccardo Spezialetti, Samuele Salti, and Luigi Di Stefano. Boosting multi-modal unsupervised domain adaptation for lidar semantic segmentation by self-supervised depth com- pletion.IEEE Access, 2023. 3

  6. [6]

    Schwing, Alexan- der Kirillov, and Rohit Girdhar

    Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexan- der Kirillov, and Rohit Girdhar. Masked-attention mask transformer for universal image segmentation. In2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1280–1289, 2022. 2, 5

  7. [7]

    Mark Everingham, S. M. Ali Eslami, Luc Van Gool, Christo- pher K. I. Williams, John M. Winn, and Andrew Zisser- man. The pascal visual object classes challenge: A retro- spective.International Journal of Computer Vision, 111:98 – 136, 2014. 5

  8. [8]

    Traffic light mapping and detection

    Nathaniel Fairfield and Chris Urmson. Traffic light mapping and detection. In2011 IEEE International Conference on Robotics and Automation, pages 5421–5426, 2011. 2

  9. [9]

    Embracing single stride 3d object detector with sparse trans- former

    Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-Xiong Wang, Hang Zhao, Feng Wang, Naiyan Wang, and Zhaoxiang Zhang. Embracing single stride 3d object detector with sparse trans- former. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8458–8468,

  10. [10]

    A brief survey on semantic segmentation with deep learning.Neurocomputing, 406:302–321, 2020

    Shijie Hao, Yuan Zhou, and Yanrong Guo. A brief survey on semantic segmentation with deep learning.Neurocomputing, 406:302–321, 2020. 2

  11. [11]

    Soap: Cross-sensor domain adapta- tion for 3d object detection using stationary object aggrega- tion pseudo-labelling

    Chengjie Huang, Vahdat Abdelzad, Sean Sedwards, and Krzysztof Czarnecki. Soap: Cross-sensor domain adapta- tion for 3d object detection using stationary object aggrega- tion pseudo-labelling. InProceedings of the IEEE/CVF Win- ter Conference on Applications of Computer Vision, pages 3352–3361, 2024. 2

  12. [12]

    Oneformer: One transformer to rule universal image segmentation

    Jitesh Jain, Jiachen Li, Mang Tik Chiu, Ali Hassani, Nikita Orlov, and Humphrey Shi. Oneformer: One transformer to rule universal image segmentation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2989–2998, 2023. 2

  13. [13]

    Panoptic Segmentation

    Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollar. Panoptic Segmentation. In2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9396–9405, Long Beach, CA, USA, 2019. 5

  14. [14]

    Auto- matic Calibration of Multiple Cameras and Depth Sensors with a Spherical Target

    Julius K ¨ummerle, Tilman K¨uhner, and Martin Lauer. Auto- matic Calibration of Multiple Cameras and Depth Sensors with a Spherical Target. In2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1–8, 2018. 5

  15. [15]

    Accurate and Efficient Self-Localization on Roads using Basic Geometric Primitives.2019 International Conference on Robotics and Automation (ICRA), pages 5965–5971, 2019

    Julius K ¨ummerle, Marc Sons, Fabian Poggenhans, Tilman K¨uhner, Martin Lauer, and Christoph Stiller. Accurate and Efficient Self-Localization on Roads using Basic Geometric Primitives.2019 International Conference on Robotics and Automation (ICRA), pages 5965–5971, 2019. 2

  16. [16]

    Domain transfer for seman- tic segmentation of lidar data using deep neural networks

    Ferdinand Langer, Andres Milioto, Alexandre Haag, Jens Behley, and Cyrill Stachniss. Domain transfer for seman- tic segmentation of lidar data using deep neural networks. In2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8263–8270. IEEE, 2020. 3

  17. [17]

    Traffic light mapping, localization, and state de- tection for autonomous vehicles

    Jesse Levinson, Jake Askeland, Jennifer Dolson, and Sebas- tian Thrun. Traffic light mapping, localization, and state de- tection for autonomous vehicles. In2011 IEEE International Conference on Robotics and Automation, pages 5784–5791,

  18. [18]

    Localization is all you evaluate: Data leakage in online mapping datasets and how to fix it

    Adam Lilja, Junsheng Fu, Erik Stenborg, and Lars Ham- marstrand. Localization is all you evaluate: Data leakage in online mapping datasets and how to fix it. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22150–22159, 2024. 5

  19. [19]

    Lawrence Zitnick

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll´ar, and C. Lawrence Zitnick. Microsoft coco: Common objects in context. In Computer Vision – ECCV 2014, pages 740–755, Cham,

  20. [20]

    Springer International Publishing. 5

  21. [21]

    Du- alCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception

    Yunze Man, Liang-Yan Gui, and Yu-Xiong Wang. Du- alCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception. In2023 IEEE/RSJ Interna- tional Conference on Intelligent Robots and Systems (IROS),

  22. [22]

    V oxel transformer for 3d object detection

    Jiageng Mao, Yujing Xue, Minzhe Niu, Haoyue Bai, Ji- ashi Feng, Xiaodan Liang, Hang Xu, and Chunjing Xu. V oxel transformer for 3d object detection. InProceedings of the IEEE/CVF international conference on computer vision, pages 3164–3173, 2021. 2

  23. [23]

    Gonzalo Mateo-Garc ´ıa, Valero Laparra, Dan L ´opez- Puigdollers, and Luis G´omez-Chova. Cross-sensor adversar- ial domain adaptation of landsat-8 and proba-v images for cloud detection.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14:747–761, 2020. 3

  24. [24]

    Rangenet ++: Fast and accurate lidar semantic segmentation

    Andres Milioto, Ignacio Vizzo, Jens Behley, and Cyrill Stachniss. Rangenet ++: Fast and accurate lidar semantic segmentation. In2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4213–4220,

  25. [25]

    The mapillary vistas dataset for semantic understanding of street scenes

    Gerhard Neuhold, Tobias Ollmann, Samuel Rota Bulo, and Peter Kontschieder. The mapillary vistas dataset for semantic understanding of street scenes. InProceedings of the IEEE International Conference on Computer Vision (ICCV), 2017. 2

  26. [26]

    Jan-Hendrik Pauls, Benjamin Schmidt, and Christoph Stiller. Automatic Mapping of Tailored Landmark Representations for Automated Driving and Map Learning.2021 IEEE In- ternational Conference on Robotics and Automation (ICRA), pages 6725–6731, 2021. 2, 3

  27. [27]

    Seamless Scene Segmentation.2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8269–8278, 2019

    Lorenzo Porzi, Samuel Rota Bul `o, Aleksander Colovic, and Peter Kontschieder. Seamless Scene Segmentation.2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8269–8278, 2019. 3

  28. [28]

    Few-shot cross-sensor domain adaptation between sar and multispec- tral data

    K Ram Prabhakar, Veera Harikrishna Nukala, Jayavard- hana Gubbi, Arpan Pal, and Balamuralidhar P. Few-shot cross-sensor domain adaptation between sar and multispec- tral data. InIGARSS 2022 - 2022 IEEE International Geo- science and Remote Sensing Symposium, pages 763–766,

  29. [29]

    Rist, Markus Enzweiler, and Dariu M

    Christoph B. Rist, Markus Enzweiler, and Dariu M. Gavrila. Cross-sensor deep domain adaptation for lidar detection and segmentation. In2019 IEEE Intelligent Vehicles Symposium (IV), pages 1535–1542, 2019. 3

  30. [30]

    Sefati, M

    M. Sefati, M. Daum, B. Sondermann, K. D. Kreisk ¨other, and A. Kampker. Improving vehicle localization using seman- tic and pole-like landmarks.2017 IEEE Intelligent Vehicles Symposium (IV), pages 13–19, 2017. 2

  31. [31]

    Detection and 3d reconstruction of traffic signs from multiple view color images.ISPRS Journal of Photogrammetry and Remote Sensing, 77:1–20, 2013

    Bahman Soheilian, Nicolas Paparoditis, and Bruno Vallet. Detection and 3d reconstruction of traffic signs from multiple view color images.ISPRS Journal of Photogrammetry and Remote Sensing, 77:1–20, 2013. 2

  32. [32]

    Efficient multi-drive map optimization towards life-long localization using surround view

    Marc Sons and Christoph Stiller. Efficient multi-drive map optimization towards life-long localization using surround view. In2018 21st International Conference on Intelligent Transportation Systems (ITSC), pages 2671–2677, 2018. 3

  33. [33]

    Keller, and Christoph Stiller

    Marc Sons, Martin Lauer, Christoph G. Keller, and Christoph Stiller. Mapping and localization using surround view. In 2017 IEEE Intelligent Vehicles Symposium (IV), pages 1158– 1163, 2017. 3

  34. [34]

    Cali- brating multiple cameras with non-overlapping views using coded checkerboard targets

    Tobias Strauß, Julius Ziegler, and Johannes Beck. Cali- brating multiple cameras with non-overlapping views using coded checkerboard targets. In17th International IEEE Con- ference on Intelligent Transportation Systems (ITSC), pages 2623–2628, 2014. 5

  35. [35]

    Semantic segmentation using vision transformers: A survey.Engineering Applications of Artificial Intelligence, 126:106669, 2023

    Hans Thisanke, Chamli Deshan, Kavindu Chamith, Sa- chith Seneviratne, Rajith Vidanaarachchi, and Damayanthi Herath. Semantic segmentation using vision transformers: A survey.Engineering Applications of Artificial Intelligence, 126:106669, 2023. 2, 5

  36. [36]

    A survey on deep domain adaptation for li- dar perception

    Larissa T Triess, Mariella Dreissig, Christoph B Rist, and J Marius Z¨ollner. A survey on deep domain adaptation for li- dar perception. In2021 IEEE intelligent vehicles symposium workshops (IV workshops), pages 350–357. IEEE, 2021. 3

  37. [37]

    Attention is all you need.Advances in Neural Information Processing Systems, 2017

    A Vaswani. Attention is all you need.Advances in Neural Information Processing Systems, 2017. 2

  38. [38]

    Farrell, and Matthew Barth

    Anh Vu, Qichi Yang, Jay A. Farrell, and Matthew Barth. Traffic sign detection, state estimation, and identification us- ing onboard sensors. In16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), pages 875–880, 2013. 2

  39. [39]

    Dsvt: Dy- namic sparse voxel transformer with rotated sets

    Haiyang Wang, Chen Shi, Shaoshuai Shi, Meng Lei, Sen Wang, Di He, Bernt Schiele, and Liwei Wang. Dsvt: Dy- namic sparse voxel transformer with rotated sets. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13520–13529, 2023. 2

  40. [40]

    Cross-sensor domain adaptation for high spatial resolution urban land-cover mapping: From airborne to spaceborne imagery.Remote Sensing of Environment, 277: 113058, 2022

    Junjue Wang, Ailong Ma, Yanfei Zhong, Zhuo Zheng, and Liangpei Zhang. Cross-sensor domain adaptation for high spatial resolution urban land-cover mapping: From airborne to spaceborne imagery.Remote Sensing of Environment, 277: 113058, 2022. 3

  41. [41]

    Loam: Lidar odometry and map- ping in real-time

    Ji Zhang and Sanjiv Singh. Loam: Lidar odometry and map- ping in real-time. InProceedings of Robotics: Science and Systems, 2014. 2

  42. [42]

    Srdan: Scale- aware and range-aware domain adaptation network for cross- dataset 3d object detection

    Weichen Zhang, Wen Li, and Dong Xu. Srdan: Scale- aware and range-aware domain adaptation network for cross- dataset 3d object detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6769–6779, 2021. 3

  43. [43]

    Cylindrical and asymmetrical 3d convolution networks for lidar seg- mentation

    Xinge Zhu, Hui Zhou, Tai Wang, Fangzhou Hong, Yuexin Ma, Wei Li, Hongsheng Li, and Dahua Lin. Cylindrical and asymmetrical 3d convolution networks for lidar seg- mentation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9939–9948,

  44. [44]

    Object detection in 20 years: A survey.Proceed- ings of the IEEE, 111(3):257–276, 2023

    Zhengxia Zou, Keyan Chen, Zhenwei Shi, Yuhong Guo, and Jieping Ye. Object detection in 20 years: A survey.Proceed- ings of the IEEE, 111(3):257–276, 2023. 2