arxiv: 2601.14477 · v2 · submitted 2026-01-20 · 💻 cs.CV · cs.AI· eess.IV

Recognition: 3 theorem links

· Lean Theorem

XD-MAP: Cross-Modal Domain Adaptation via Semantic Parametric Maps for Scalable Training Data Generation

Frank Bieder , Hendrik K\"onigshof , Haohao Hu , Fabian Immel , Yinzhe Shen , Jan-Hendrik Pauls , Christoph Stiller

Authors on Pith no claims yet

Pith reviewed 2026-05-16 12:12 UTC · model grok-4.3

classification 💻 cs.CV cs.AIeess.IV

keywords domain adaptationsemantic segmentationLiDARparametric mapscross-modalpseudo labelingpanoptic segmentationroad scenes

0 comments

The pith

XD-MAP builds semantic parametric maps from camera detections to create reliable pseudo labels for LiDAR segmentation without manual annotation or sensor overlap.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes XD-MAP to transfer knowledge from image datasets to LiDAR by constructing semantic parametric maps that model scene elements detected in camera views. These maps generate pseudo labels directly in the LiDAR domain, removing the need for manual labeling or any direct overlap between sensors. The approach also extends perception from a limited front-view camera to a complete 360-degree field. On a large-scale road feature dataset, the resulting labels produce measurable gains in both 2D and 3D segmentation tasks over single-shot baselines.

Core claim

XD-MAP transfers sensor-specific knowledge from an image dataset to LiDAR by leveraging detections on camera images to create a semantic parametric map. The map elements are modeled to produce pseudo labels in the target domain without any manual annotation effort. Unlike previous domain transfer approaches, the method does not require direct overlap between sensors and enables extending the angular perception range from a front-view camera to a full 360-degree view.

What carries the argument

The semantic parametric map, which models scene elements from camera detections to generate pseudo labels for the LiDAR domain.

If this is right

LiDAR segmentation models can be trained using only image detections without additional labeling or sensor alignment.
Perception range extends from front-view cameras to full 360-degree coverage in the target domain.
2D semantic segmentation gains +19.5 mIoU and panoptic segmentation gains +19.5 PQth over single-shot baselines.
3D semantic segmentation gains +32.3 mIoU on the same road feature dataset.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same map-based transfer could be tested between other mismatched sensor pairs such as radar and images.
The method might allow rapid generation of labeled data for new geographic regions or rare driving scenarios without fresh annotation campaigns.
Combining the parametric map with existing foundation models could further reduce dependence on task-specific labeled datasets.

Load-bearing premise

The semantic parametric map accurately models scene elements from camera detections to produce reliable pseudo labels in the LiDAR domain without any manual annotation or direct sensor overlap.

What would settle it

Running the reported experiments on the large-scale road feature dataset and finding that XD-MAP does not exceed the single-shot baseline by +19.5 mIoU in 2D semantic segmentation or +32.3 mIoU in 3D semantic segmentation would falsify the central performance claim.

Figures

Figures reproduced from arXiv: 2601.14477 by Christoph Stiller, Fabian Immel, Frank Bieder, Haohao Hu, Hendrik K\"onigshof, Jan-Hendrik Pauls, Yinzhe Shen.

**Figure 1.** Figure 1: Overview of the proposed XD-MAP: We generate pseudo labels from RGB camera images from a single camera using a neural [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: Exemplary results of the semantic parametric mapping. Depicted are parametric primitives of three semantic classes (poles, [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Illustration of measurement artifacts affecting our [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Spatial distribution of the sequences in Karlsruhe, Ger [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Qualitative results of 2D perception. w.r.t. the reference pose. Assuming static objects, this shifts all lidar points as if they were measured from the reference pose. Since our labels are not pixels in an individual image, but generated from mapped geometric primitives in 3D space, motion compensation impacts the congruence of the labels with the range image. We examine its influence comparing training… view at source ↗

**Figure 6.** Figure 6: Qualitative results of 3D segmentation. Comparison of pseudo labels and predictions of XD-Map and XD-B2. [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 7.** Figure 7: Boxplot of instance height and width in the field of view [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

read the original abstract

Until open-world foundation models match the performance of specialized approaches, deep learning systems remain dependent on task- and sensor-specific data availability. To bridge the gap between available datasets and deployment domains, domain adaptation strategies are widely used. In this work, we propose XD-MAP, a novel approach to transfer sensor-specific knowledge from an image dataset to LiDAR, an entirely different sensing domain. Our method leverages detections on camera images to create a semantic parametric map. The map elements are modeled to produce pseudo labels in the target domain without any manual annotation effort. Unlike previous domain transfer approaches, our method does not require direct overlap between sensors and enables extending the angular perception range from a front-view camera to a full 360{\deg} view. On our large-scale road feature dataset, XD-MAP outperforms single shot baseline approaches by +19.5 mIoU for 2D semantic segmentation, +19.5 PQth for 2D panoptic segmentation, and +32.3 mIoU in 3D semantic segmentation. The results demonstrate the effectiveness of our approach achieving strong performance on LiDAR data without any manual labeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

XD-MAP turns camera detections into 360-degree LiDAR pseudo-labels via semantic parametric maps, delivering clear gains on their dataset but leaving the map construction and error propagation details thin.

read the letter

The main point is that this paper shows how to build a semantic parametric map from front-camera detections and then project it into full 360 LiDAR labels without any sensor overlap or manual annotation. That mechanism is the actual new piece, and it sidesteps the paired-data requirement that limits most cross-modal adaptation work. On their large road-feature dataset the numbers look useful: +19.5 mIoU on 2D semantic segmentation, +19.5 PQth on panoptic, and +32.3 mIoU on 3D semantic segmentation over single-shot baselines. Those margins matter for anyone who needs more labeled LiDAR without extra labeling cost. The approach is straightforward enough that a practitioner could try to reproduce the pipeline once the modeling steps are written down. The soft spots sit in the map itself. The abstract and stress-test note both flag that inferring 3D extents, positions, and classes for objects outside the camera frustum rests on road-planarity and shape priors that are not yet stress-tested in the provided summary. If detection errors or depth mis-estimates get amplified during the projection, the pseudo-labels could be systematically biased rather than merely noisy, which would shrink the real advantage over baselines. I would want to see an ablation on how sensitive the final mIoU is to those priors and some quantitative check on pseudo-label accuracy against held-out ground truth. This is aimed at teams building perception stacks for autonomous driving who are short on LiDAR labels. A reader already working on domain adaptation or synthetic data generation will get the most out of it. The idea is grounded enough and the reported gains are large enough that it deserves a serious referee rather than a desk reject; the central claim can be checked once the map-construction details and validation experiments are in front of reviewers.

Referee Report

3 major / 2 minor

Summary. The paper proposes XD-MAP, a cross-modal domain adaptation technique that builds a semantic parametric map from 2D camera detections to synthesize pseudo-labels for LiDAR point clouds. The map is intended to enable training of 2D semantic segmentation, 2D panoptic segmentation, and 3D semantic segmentation models in the target LiDAR domain without manual annotation or direct sensor overlap, extending perception from front-view camera to full 360°. On a large-scale road feature dataset the method reports gains of +19.5 mIoU (2D semantic), +19.5 PQth (2D panoptic), and +32.3 mIoU (3D semantic) over single-shot baselines.

Significance. If the parametric map reliably converts camera detections into accurate full-surround LiDAR labels, the approach would offer a scalable route to labeled LiDAR training data from existing camera corpora, lowering annotation costs in autonomous-driving and robotics applications. The magnitude of the reported margins is noteworthy and, if reproducible, would constitute a practical contribution to cross-modal data generation.

major comments (3)

[§3] §3 (Method): the construction of the semantic parametric map is described only at a high level; no equations or algorithmic steps are given for (i) lifting 2D detections to 3D, (ii) fitting category-specific shape parameters, or (iii) handling objects outside the camera frustum. Without these details the central claim that the map produces reliable pseudo-labels cannot be evaluated.
[§4] §4 (Experiments): no quantitative validation or error analysis of the generated pseudo-labels is presented (e.g., comparison against any held-out LiDAR ground truth, precision-recall of projected labels, or sensitivity to detection noise). Consequently the +19.5 mIoU and +32.3 mIoU gains cannot be attributed unambiguously to successful domain transfer rather than to weak baselines or dataset peculiarities.
[§4.2] §4.2 (Baselines): the single-shot baselines are not specified in sufficient detail (architecture, training protocol, data augmentation). If they are substantially weaker than current camera-to-LiDAR transfer methods, the reported margins do not demonstrate the superiority of the parametric-map approach.

minor comments (2)

[§1, §4.1] The abstract and §1 refer to “our large-scale road feature dataset” without providing size, sensor configuration, or a citation; this information should appear in §4.1.
[§3] Notation for the parametric map elements (e.g., what variables represent extent, orientation, or semantic class) is introduced inconsistently between text and any accompanying figures.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments and recommendation for major revision. We address each point below and will revise the manuscript to incorporate additional details and analysis where feasible.

read point-by-point responses

Referee: [§3] §3 (Method): the construction of the semantic parametric map is described only at a high level; no equations or algorithmic steps are given for (i) lifting 2D detections to 3D, (ii) fitting category-specific shape parameters, or (iii) handling objects outside the camera frustum. Without these details the central claim that the map produces reliable pseudo-labels cannot be evaluated.

Authors: We agree that more technical detail is needed. In the revised manuscript we will expand §3 with explicit equations and steps: (i) lifting via monocular depth estimation and camera-to-LiDAR projection, (ii) category-specific parametric fitting formulated as constrained optimization using class priors, and (iii) frustum handling by accumulating the parametric map over time to cover the full 360° surround. These additions will make the pseudo-label generation process fully evaluable. revision: yes
Referee: [§4] §4 (Experiments): no quantitative validation or error analysis of the generated pseudo-labels is presented (e.g., comparison against any held-out LiDAR ground truth, precision-recall of projected labels, or sensitivity to detection noise). Consequently the +19.5 mIoU and +32.3 mIoU gains cannot be attributed unambiguously to successful domain transfer rather than to weak baselines or dataset peculiarities.

Authors: We acknowledge that direct pseudo-label validation would strengthen attribution of the gains. While downstream task performance serves as the primary evidence, the revision will add a quantitative error analysis subsection, including precision-recall on a held-out subset with available LiDAR annotations and sensitivity tests to detection noise. This should help isolate the contribution of the parametric map. revision: partial
Referee: [§4.2] §4.2 (Baselines): the single-shot baselines are not specified in sufficient detail (architecture, training protocol, data augmentation). If they are substantially weaker than current camera-to-LiDAR transfer methods, the reported margins do not demonstrate the superiority of the parametric-map approach.

Authors: We apologize for the omitted details. The revised §4.2 will fully specify the baselines, including architectures (e.g., RangeNet++ for 3D, DeepLabv3 for 2D), training protocols (optimizer, learning-rate schedule, epoch count), and all augmentations. This transparency will confirm that the reported margins reflect the strength of XD-MAP rather than weak baselines. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The XD-MAP method constructs semantic parametric maps from external camera detections and applies geometric/semantic modeling to generate LiDAR pseudo-labels. This process does not reduce any claimed prediction or performance gain to a fitted parameter or input quantity by construction, nor does it rely on self-citations for uniqueness or load-bearing premises. The reported mIoU and PQ gains are measured against independent baselines on a held-out dataset, leaving the derivation self-contained with external content from the modeling steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the assumption that camera detections can be reliably turned into a parametric map whose elements produce accurate LiDAR pseudo labels; no explicit free parameters or invented physical entities are named in the abstract.

axioms (1)

domain assumption Camera detections provide sufficient semantic information to model scene elements for cross-modal label transfer
Invoked to justify creating the semantic parametric map from image detections alone

invented entities (1)

semantic parametric map no independent evidence
purpose: Model scene elements to generate pseudo labels in the target LiDAR domain
Core new construct introduced to bridge camera and LiDAR without direct overlap

pith-pipeline@v0.9.0 · 5534 in / 1213 out tokens · 27616 ms · 2026-05-16T12:12:18.421641+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Poles and traffic lights are modeled as cylinders while road signs are upright planes... estimate a single map element... non-linear optimizer
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Spherical Camera Model... frustum that encapsulates the corresponding map element
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

XD-MAP outperforms single shot baseline approaches by +19.5 mIoU...

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages

[1]

Ceres Solver, 2023

Sameer Agarwal, Keir Mierle, and The Ceres Solver Team. Ceres Solver, 2023. 3

work page 2023
[2]

A survey: object detection methods from cnn to transformer.Multimedia Tools and Applications, 82 (14):21353–21383, 2023

Ershat Arkin, Nurbiya Yadikar, Xuebin Xu, Alimjan Aysa, and Kurban Ubul. A survey: object detection methods from cnn to transformer.Multimedia Tools and Applications, 82 (14):21353–21383, 2023. 2

work page 2023
[3]

Generalized B-spline Camera Model

Johannes Beck and Christoph Stiller. Generalized B-spline Camera Model. In2018 IEEE Intelligent Vehicles Sympo- sium (IV), pages 2137–2142, 2018. 5

work page 2018
[4]

Behley, M

J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall. SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. In IEEE/CVF International Conference on Computer Vision (ICCV), 2019. 5

work page 2019
[5]

Boosting multi-modal unsupervised domain adaptation for lidar semantic segmentation by self-supervised depth com- pletion.IEEE Access, 2023

Adriano Cardace, Andrea Conti, Pierluigi Zama Ramirez, Riccardo Spezialetti, Samuele Salti, and Luigi Di Stefano. Boosting multi-modal unsupervised domain adaptation for lidar semantic segmentation by self-supervised depth com- pletion.IEEE Access, 2023. 3

work page 2023
[6]

Schwing, Alexan- der Kirillov, and Rohit Girdhar

Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexan- der Kirillov, and Rohit Girdhar. Masked-attention mask transformer for universal image segmentation. In2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1280–1289, 2022. 2, 5

work page 2022
[7]

Mark Everingham, S. M. Ali Eslami, Luc Van Gool, Christo- pher K. I. Williams, John M. Winn, and Andrew Zisser- man. The pascal visual object classes challenge: A retro- spective.International Journal of Computer Vision, 111:98 – 136, 2014. 5

work page 2014
[8]

Traffic light mapping and detection

Nathaniel Fairfield and Chris Urmson. Traffic light mapping and detection. In2011 IEEE International Conference on Robotics and Automation, pages 5421–5426, 2011. 2

work page 2011
[9]

Embracing single stride 3d object detector with sparse trans- former

Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-Xiong Wang, Hang Zhao, Feng Wang, Naiyan Wang, and Zhaoxiang Zhang. Embracing single stride 3d object detector with sparse trans- former. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8458–8468,

work page
[10]

A brief survey on semantic segmentation with deep learning.Neurocomputing, 406:302–321, 2020

Shijie Hao, Yuan Zhou, and Yanrong Guo. A brief survey on semantic segmentation with deep learning.Neurocomputing, 406:302–321, 2020. 2

work page 2020
[11]

Soap: Cross-sensor domain adapta- tion for 3d object detection using stationary object aggrega- tion pseudo-labelling

Chengjie Huang, Vahdat Abdelzad, Sean Sedwards, and Krzysztof Czarnecki. Soap: Cross-sensor domain adapta- tion for 3d object detection using stationary object aggrega- tion pseudo-labelling. InProceedings of the IEEE/CVF Win- ter Conference on Applications of Computer Vision, pages 3352–3361, 2024. 2

work page 2024
[12]

Oneformer: One transformer to rule universal image segmentation

Jitesh Jain, Jiachen Li, Mang Tik Chiu, Ali Hassani, Nikita Orlov, and Humphrey Shi. Oneformer: One transformer to rule universal image segmentation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2989–2998, 2023. 2

work page 2023
[13]

Panoptic Segmentation

Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollar. Panoptic Segmentation. In2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9396–9405, Long Beach, CA, USA, 2019. 5

work page 2019
[14]

Auto- matic Calibration of Multiple Cameras and Depth Sensors with a Spherical Target

Julius K ¨ummerle, Tilman K¨uhner, and Martin Lauer. Auto- matic Calibration of Multiple Cameras and Depth Sensors with a Spherical Target. In2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1–8, 2018. 5

work page 2018
[15]

Accurate and Efficient Self-Localization on Roads using Basic Geometric Primitives.2019 International Conference on Robotics and Automation (ICRA), pages 5965–5971, 2019

Julius K ¨ummerle, Marc Sons, Fabian Poggenhans, Tilman K¨uhner, Martin Lauer, and Christoph Stiller. Accurate and Efficient Self-Localization on Roads using Basic Geometric Primitives.2019 International Conference on Robotics and Automation (ICRA), pages 5965–5971, 2019. 2

work page 2019
[16]

Domain transfer for seman- tic segmentation of lidar data using deep neural networks

Ferdinand Langer, Andres Milioto, Alexandre Haag, Jens Behley, and Cyrill Stachniss. Domain transfer for seman- tic segmentation of lidar data using deep neural networks. In2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8263–8270. IEEE, 2020. 3

work page 2020
[17]

Traffic light mapping, localization, and state de- tection for autonomous vehicles

Jesse Levinson, Jake Askeland, Jennifer Dolson, and Sebas- tian Thrun. Traffic light mapping, localization, and state de- tection for autonomous vehicles. In2011 IEEE International Conference on Robotics and Automation, pages 5784–5791,

work page
[18]

Localization is all you evaluate: Data leakage in online mapping datasets and how to fix it

Adam Lilja, Junsheng Fu, Erik Stenborg, and Lars Ham- marstrand. Localization is all you evaluate: Data leakage in online mapping datasets and how to fix it. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22150–22159, 2024. 5

work page 2024
[19]

Lawrence Zitnick

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll´ar, and C. Lawrence Zitnick. Microsoft coco: Common objects in context. In Computer Vision – ECCV 2014, pages 740–755, Cham,

work page 2014
[20]

Springer International Publishing. 5

work page
[21]

Du- alCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception

Yunze Man, Liang-Yan Gui, and Yu-Xiong Wang. Du- alCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception. In2023 IEEE/RSJ Interna- tional Conference on Intelligent Robots and Systems (IROS),

work page
[22]

V oxel transformer for 3d object detection

Jiageng Mao, Yujing Xue, Minzhe Niu, Haoyue Bai, Ji- ashi Feng, Xiaodan Liang, Hang Xu, and Chunjing Xu. V oxel transformer for 3d object detection. InProceedings of the IEEE/CVF international conference on computer vision, pages 3164–3173, 2021. 2

work page 2021
[23]

Gonzalo Mateo-Garc ´ıa, Valero Laparra, Dan L ´opez- Puigdollers, and Luis G´omez-Chova. Cross-sensor adversar- ial domain adaptation of landsat-8 and proba-v images for cloud detection.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14:747–761, 2020. 3

work page 2020
[24]

Rangenet ++: Fast and accurate lidar semantic segmentation

Andres Milioto, Ignacio Vizzo, Jens Behley, and Cyrill Stachniss. Rangenet ++: Fast and accurate lidar semantic segmentation. In2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4213–4220,

work page
[25]

The mapillary vistas dataset for semantic understanding of street scenes

Gerhard Neuhold, Tobias Ollmann, Samuel Rota Bulo, and Peter Kontschieder. The mapillary vistas dataset for semantic understanding of street scenes. InProceedings of the IEEE International Conference on Computer Vision (ICCV), 2017. 2

work page 2017
[26]

Jan-Hendrik Pauls, Benjamin Schmidt, and Christoph Stiller. Automatic Mapping of Tailored Landmark Representations for Automated Driving and Map Learning.2021 IEEE In- ternational Conference on Robotics and Automation (ICRA), pages 6725–6731, 2021. 2, 3

work page 2021
[27]

Seamless Scene Segmentation.2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8269–8278, 2019

Lorenzo Porzi, Samuel Rota Bul `o, Aleksander Colovic, and Peter Kontschieder. Seamless Scene Segmentation.2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8269–8278, 2019. 3

work page 2019
[28]

Few-shot cross-sensor domain adaptation between sar and multispec- tral data

K Ram Prabhakar, Veera Harikrishna Nukala, Jayavard- hana Gubbi, Arpan Pal, and Balamuralidhar P. Few-shot cross-sensor domain adaptation between sar and multispec- tral data. InIGARSS 2022 - 2022 IEEE International Geo- science and Remote Sensing Symposium, pages 763–766,

work page 2022
[29]

Rist, Markus Enzweiler, and Dariu M

Christoph B. Rist, Markus Enzweiler, and Dariu M. Gavrila. Cross-sensor deep domain adaptation for lidar detection and segmentation. In2019 IEEE Intelligent Vehicles Symposium (IV), pages 1535–1542, 2019. 3

work page 2019
[30]

Sefati, M

M. Sefati, M. Daum, B. Sondermann, K. D. Kreisk ¨other, and A. Kampker. Improving vehicle localization using seman- tic and pole-like landmarks.2017 IEEE Intelligent Vehicles Symposium (IV), pages 13–19, 2017. 2

work page 2017
[31]

Detection and 3d reconstruction of traffic signs from multiple view color images.ISPRS Journal of Photogrammetry and Remote Sensing, 77:1–20, 2013

Bahman Soheilian, Nicolas Paparoditis, and Bruno Vallet. Detection and 3d reconstruction of traffic signs from multiple view color images.ISPRS Journal of Photogrammetry and Remote Sensing, 77:1–20, 2013. 2

work page 2013
[32]

Efficient multi-drive map optimization towards life-long localization using surround view

Marc Sons and Christoph Stiller. Efficient multi-drive map optimization towards life-long localization using surround view. In2018 21st International Conference on Intelligent Transportation Systems (ITSC), pages 2671–2677, 2018. 3

work page 2018
[33]

Keller, and Christoph Stiller

Marc Sons, Martin Lauer, Christoph G. Keller, and Christoph Stiller. Mapping and localization using surround view. In 2017 IEEE Intelligent Vehicles Symposium (IV), pages 1158– 1163, 2017. 3

work page 2017
[34]

Cali- brating multiple cameras with non-overlapping views using coded checkerboard targets

Tobias Strauß, Julius Ziegler, and Johannes Beck. Cali- brating multiple cameras with non-overlapping views using coded checkerboard targets. In17th International IEEE Con- ference on Intelligent Transportation Systems (ITSC), pages 2623–2628, 2014. 5

work page 2014
[35]

Semantic segmentation using vision transformers: A survey.Engineering Applications of Artificial Intelligence, 126:106669, 2023

Hans Thisanke, Chamli Deshan, Kavindu Chamith, Sa- chith Seneviratne, Rajith Vidanaarachchi, and Damayanthi Herath. Semantic segmentation using vision transformers: A survey.Engineering Applications of Artificial Intelligence, 126:106669, 2023. 2, 5

work page 2023
[36]

A survey on deep domain adaptation for li- dar perception

Larissa T Triess, Mariella Dreissig, Christoph B Rist, and J Marius Z¨ollner. A survey on deep domain adaptation for li- dar perception. In2021 IEEE intelligent vehicles symposium workshops (IV workshops), pages 350–357. IEEE, 2021. 3

work page 2021
[37]

Attention is all you need.Advances in Neural Information Processing Systems, 2017

A Vaswani. Attention is all you need.Advances in Neural Information Processing Systems, 2017. 2

work page 2017
[38]

Farrell, and Matthew Barth

Anh Vu, Qichi Yang, Jay A. Farrell, and Matthew Barth. Traffic sign detection, state estimation, and identification us- ing onboard sensors. In16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), pages 875–880, 2013. 2

work page 2013
[39]

Dsvt: Dy- namic sparse voxel transformer with rotated sets

Haiyang Wang, Chen Shi, Shaoshuai Shi, Meng Lei, Sen Wang, Di He, Bernt Schiele, and Liwei Wang. Dsvt: Dy- namic sparse voxel transformer with rotated sets. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13520–13529, 2023. 2

work page 2023
[40]

Cross-sensor domain adaptation for high spatial resolution urban land-cover mapping: From airborne to spaceborne imagery.Remote Sensing of Environment, 277: 113058, 2022

Junjue Wang, Ailong Ma, Yanfei Zhong, Zhuo Zheng, and Liangpei Zhang. Cross-sensor domain adaptation for high spatial resolution urban land-cover mapping: From airborne to spaceborne imagery.Remote Sensing of Environment, 277: 113058, 2022. 3

work page 2022
[41]

Loam: Lidar odometry and map- ping in real-time

Ji Zhang and Sanjiv Singh. Loam: Lidar odometry and map- ping in real-time. InProceedings of Robotics: Science and Systems, 2014. 2

work page 2014
[42]

Srdan: Scale- aware and range-aware domain adaptation network for cross- dataset 3d object detection

Weichen Zhang, Wen Li, and Dong Xu. Srdan: Scale- aware and range-aware domain adaptation network for cross- dataset 3d object detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6769–6779, 2021. 3

work page 2021
[43]

Cylindrical and asymmetrical 3d convolution networks for lidar seg- mentation

Xinge Zhu, Hui Zhou, Tai Wang, Fangzhou Hong, Yuexin Ma, Wei Li, Hongsheng Li, and Dahua Lin. Cylindrical and asymmetrical 3d convolution networks for lidar seg- mentation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9939–9948,

work page
[44]

Object detection in 20 years: A survey.Proceed- ings of the IEEE, 111(3):257–276, 2023

Zhengxia Zou, Keyan Chen, Zhenwei Shi, Yuhong Guo, and Jieping Ye. Object detection in 20 years: A survey.Proceed- ings of the IEEE, 111(3):257–276, 2023. 2

work page 2023