RS2AD-LiDAR: End-to-End Autonomous Driving LiDAR Data Generation from Roadside Sensor Observations

Keqiang Li; Lei He; Ni Ding; Ruidan Xing; Runyi Huang; Yuheng Shi

arxiv: 2605.23406 · v1 · pith:VOEDAPASnew · submitted 2026-05-22 · 💻 cs.CV

Runyi Huang, Ni Ding, Ruidan Xing, Yuheng Shi, Lei He, Keqiang Li

Pith reviewed 2026-05-25 04:50 UTC · model grok-4.3

classification 💻 cs.CV

keywords LiDAR data generation · roadside sensors · vehicle-mounted LiDAR · autonomous driving · point cloud resampling · object detection · R2V-LiDAR · virtual modeling


The pith

Roadside LiDAR observations can be converted into vehicle-mounted LiDAR data using coordinate transformation and virtual modeling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework called RS2AD-LiDAR that generates vehicle-mounted LiDAR point clouds from roadside sensor data. It targets the high cost and scarcity of vehicle-collected data for training end-to-end autonomous driving models. Because no public dataset offers highly overlapping roadside and vehicle-mounted LiDAR coverage, the authors construct a new dataset, R2V-LiDAR, used solely for evaluation. Experiments show that the generated data is semantically similar to real data and that adding it to real training data improves both BEV and 3D object detection accuracy.

Core claim

The central claim is that transforming roadside LiDAR point clouds into the vehicle coordinate system and then synthesizing high-fidelity vehicle-mounted data via virtual LiDAR modeling and point cloud resampling produces data that matches real vehicle LiDAR semantically and enhances detection model performance when incorporated into training.
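The first step of the claim, moving roadside points into the vehicle frame, can be sketched as a composition of two rigid transforms. This is a minimal illustration, not the authors' implementation: it assumes each sensor's pose is known as a 4x4 sensor-to-world matrix, and the function names (make_pose, invert_rigid, roadside_to_vehicle) and the yaw-only rotation are hypothetical simplifications.

```python
import math

def make_pose(yaw_rad, tx, ty, tz):
    """Hypothetical helper: 4x4 sensor-to-world pose with a yaw-only rotation."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def invert_rigid(T):
    """Invert a rigid transform: rotation becomes R^T, translation becomes -R^T t."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(R_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [R_t[0] + [t_inv[0]],
            R_t[1] + [t_inv[1]],
            R_t[2] + [t_inv[2]],
            [0.0, 0.0, 0.0, 1.0]]

def matmul4(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def apply_transform(T, p):
    """Apply a 4x4 rigid transform to a 3D point."""
    x, y, z = p
    return tuple(T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3]
                 for i in range(3))

def roadside_to_vehicle(points, T_world_from_roadside, T_world_from_vehicle):
    """Map roadside-frame points into the vehicle frame via the shared world frame."""
    T = matmul4(invert_rigid(T_world_from_vehicle), T_world_from_roadside)
    return [apply_transform(T, p) for p in points]
```

With a roadside pose of make_pose(0.0, 10.0, 0.0, 5.0) and a vehicle pose of make_pose(math.pi / 2, 0.0, 0.0, 2.0), the roadside point (0, 0, -5) lands at approximately (0, -10, -2) in the vehicle frame. Any calibration error in either pose propagates directly into the output, which is why the referee report below asks for calibration-error statistics.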

What carries the argument

The RS2AD-LiDAR framework that performs roadside-to-vehicle coordinate transformation followed by virtual LiDAR modeling and point cloud resampling to synthesize the data.
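One common way to realize "virtual LiDAR modeling and point cloud resampling" is to bin points by elevation and azimuth as seen from the virtual sensor and keep only the nearest return per beam direction. The sketch below follows that generic range-image idea; it is an assumption about what such a step could look like, not the paper's algorithm, and the beam count, field of view, and function name are hypothetical defaults.

```python
import math

def virtual_lidar_resample(points, n_beams=32, n_azimuth=360,
                           fov_down_deg=-25.0, fov_up_deg=15.0,
                           max_range=100.0):
    """Keep the nearest point per (beam, azimuth) cell of a virtual spinning LiDAR.

    `points` are (x, y, z) tuples already expressed in the virtual sensor frame.
    All sensor parameters here are illustrative, not taken from the paper.
    """
    fov_down = math.radians(fov_down_deg)
    fov_up = math.radians(fov_up_deg)
    nearest = {}  # (beam index, azimuth column) -> (range, point)
    for p in points:
        x, y, z = p
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0 or r > max_range:
            continue
        # Clamp guards asin against tiny floating-point overshoot.
        elev = math.asin(max(-1.0, min(1.0, z / r)))
        if elev < fov_down or elev > fov_up:
            continue  # outside the vertical field of view
        azim = math.atan2(y, x)
        beam = min(int((elev - fov_down) / (fov_up - fov_down) * n_beams),
                   n_beams - 1)
        col = min(int((azim + math.pi) / (2.0 * math.pi) * n_azimuth),
                  n_azimuth - 1)
        cell = (beam, col)
        # A nearer return occludes anything farther along the same direction.
        if cell not in nearest or r < nearest[cell][0]:
            nearest[cell] = (r, p)
    return [p for _, p in nearest.values()]
```

Keeping one return per cell approximates both the beam pattern and occlusion of a vehicle-mounted sensor. It cannot recover surfaces the roadside sensor never observed, which is one place a domain gap could enter.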

If this is right

Generated data can supplement scarce vehicle data and improve object detection accuracy in both BEV and 3D.
The approach reduces reliance on expensive single-vehicle data collection and annotation.
Existing roadside infrastructure becomes a source of training data.
To the best of the authors' knowledge, this is the first method to reconstruct vehicle-mounted LiDAR data from roadside sensor inputs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach might enable data sharing between roadside units and multiple vehicle platforms without direct collection.
The method could be extended to generate data for rare or dangerous scenarios observed from the roadside.
The method may reduce data silos by allowing roadside observations to contribute to vehicle model training across fleets.

Load-bearing premise

The R2V-LiDAR dataset's overlapping perception coverage is representative of real roadside and vehicle setups so that generated data transfers without significant domain gaps from the modeling steps.

What would settle it

Training an object detector on real data plus the generated data and observing no improvement or a decrease in BEV and 3D detection accuracy on a separate real-world test set compared to real data alone would falsify the effectiveness claim.

Original abstract

End-to-end autonomous driving solutions, which directly process multimodal sensory data and output fine-grained control commands, have gradually become a mainstream direction with the development of autonomous driving technology. However, current methods in this category rely on single-vehicle data collection for model training and optimization, which suffers from high acquisition and annotation costs, scarcity of valuable scenarios, and data silos. To address these challenges, we propose RS2AD-LiDAR, a novel framework for reconstructing and generating vehicle-mounted LiDAR data from roadside sensor observations. Since no public dataset currently provides highly overlapping perception coverage between roadside and vehicle-mounted LiDAR sensors, which is essential for studying roadside-to-vehicle data generation, we constructed a dedicated dataset named R2V-LiDAR which is used solely for evaluation in this work. Specifically, our method transforms roadside LiDAR point clouds into the vehicle-mounted LiDAR coordinate system, and synthesizes high-fidelity vehicle-mounted data via virtual LiDAR modeling and point cloud resampling techniques. To the best of our knowledge, this is the first approach to reconstruct vehicle-mounted LiDAR data from roadside sensor inputs. Extensive experimental comparisons demonstrate the semantic similarity between the generated data and real data. Furthermore, object detection experiments show that incorporating the generated data into real data for model training improves both Bird's Eye View (BEV) and 3D detection accuracy, thereby validating the effectiveness of the proposed method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is the first paper to frame roadside-to-vehicle LiDAR generation as a task, with a new paired dataset and some detection gains when mixing generated data into training, but the evidence for fidelity and generalization is thin.


The main takeaway is that the work introduces a new task, reconstructing vehicle-mounted LiDAR from roadside observations, along with a dedicated paired dataset (R2V-LiDAR) built because no public one has the required overlap. The authors transform coordinates, apply virtual LiDAR modeling, and resample points, then report semantic similarity plus modest gains in BEV and 3D detection when the generated data is added to real training sets. That direction addresses a genuine data-scarcity issue in end-to-end driving.

Referee Report

2 major / 2 minor

Summary. The paper proposes RS2AD-LiDAR, a framework that transforms roadside LiDAR point clouds into the vehicle coordinate frame and applies virtual LiDAR modeling plus resampling to synthesize vehicle-mounted LiDAR data. A new paired dataset (R2V-LiDAR) is introduced solely for evaluation because no public dataset supplies the required overlap. The central empirical claims are that the generated point clouds exhibit semantic similarity to real vehicle LiDAR and that augmenting real training data with the generated data measurably improves both BEV and 3D object detection accuracy.

Significance. If the generated data can be shown to transfer without substantial domain shift, the method would offer a practical route to enlarge training corpora for end-to-end driving models by exploiting existing roadside infrastructure, thereby lowering collection and annotation costs. The construction of R2V-LiDAR itself fills a documented gap in paired roadside-vehicle observations. The significance is currently limited by the absence of quantitative evidence that the coordinate transform, virtual modeling, and resampling steps preserve the statistical properties (density, noise, beam pattern) needed for the reported detection gains to generalize.

major comments (2)

[Abstract / Experiments] Abstract and Experiments section: the central claim that mixing generated data improves BEV/3D detection rests on the assumption that R2V-LiDAR supplies representative overlapping coverage and that the virtual modeling/resampling pipeline produces statistically faithful vehicle LiDAR; however, no coverage-completeness metrics, calibration-error statistics, or distributional comparisons (point-count histograms, range distributions, intensity statistics) before versus after resampling are supplied, leaving open the possibility that observed gains are dataset-specific artifacts.
[Dataset construction] Dataset description: the paper states R2V-LiDAR is used only for evaluation and is not released; without public access or at least tabulated overlap statistics, reviewers cannot independently verify that the paired observations are sufficiently dense and accurately calibrated to support the transfer claim.
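The distributional comparison requested in the first major comment could take the form of a range histogram per point cloud plus a divergence between the two histograms. The following is a minimal sketch under assumed choices: the bin width, the maximum range, the additive smoothing constant, and both function names are hypothetical, and KL divergence is only one of several reasonable measures.

```python
import math

def range_histogram(points, bin_width=10.0, max_range=100.0):
    """Count points per range bin, measured from the sensor origin."""
    n_bins = int(math.ceil(max_range / bin_width))
    counts = [0] * n_bins
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r < max_range:
            counts[min(int(r / bin_width), n_bins - 1)] += 1
    return counts

def kl_divergence(p_counts, q_counts, eps=1e-6):
    """KL(P || Q) between two histograms of equal length.

    Additive smoothing with `eps` avoids log(0) when a bin is empty.
    """
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must have the same number of bins")
    p_total = sum(p_counts) + eps * len(p_counts)
    q_total = sum(q_counts) + eps * len(q_counts)
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + eps) / p_total
        q = (qc + eps) / q_total
        kl += p * math.log(p / q)
    return kl
```

Computing kl_divergence(range_histogram(real), range_histogram(generated)) per frame, and reporting it before and after resampling, would show whether the pipeline moves the generated range distribution toward the real one. The same pattern applies to point counts or intensity values.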

minor comments (2)

[Experiments] Clarify whether any quantitative similarity metrics (Chamfer distance, Earth-mover’s distance, or point-density KL divergence) were computed in addition to the qualitative semantic-similarity statements.
[Introduction] The abstract asserts this is the “first approach”; a brief related-work paragraph distinguishing the method from prior roadside-to-vehicle projection or simulation techniques would strengthen the novelty claim.
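Of the similarity metrics named in the first minor comment, Chamfer distance is the simplest to state. The sketch below uses the squared-distance, mean-over-points convention and a brute-force nearest-neighbour search; other conventions (unsquared distances, sums instead of means) are also in use, and the paper does not say which, if any, it applies.

```python
def _nearest_sq_dist(p, cloud):
    """Squared Euclidean distance from p to its nearest neighbour in cloud."""
    return min((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
               for q in cloud)

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point clouds.

    Mean squared nearest-neighbour distance from a to b, plus the same
    from b to a. Brute force, O(len(a) * len(b)); fine for small clouds only.
    """
    if not a or not b:
        raise ValueError("both point clouds must be non-empty")
    a_to_b = sum(_nearest_sq_dist(p, b) for p in a) / len(a)
    b_to_a = sum(_nearest_sq_dist(q, a) for q in b) / len(b)
    return a_to_b + b_to_a
```

For full LiDAR frames a spatial index such as a k-d tree would replace the brute-force search, but the definition is unchanged.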

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's detailed and constructive feedback. We address each major comment below and will revise the manuscript accordingly to strengthen the empirical support for our claims.


Referee: [Abstract / Experiments] Abstract and Experiments section: the central claim that mixing generated data improves BEV/3D detection rests on the assumption that R2V-LiDAR supplies representative overlapping coverage and that the virtual modeling/resampling pipeline produces statistically faithful vehicle LiDAR; however, no coverage-completeness metrics, calibration-error statistics, or distributional comparisons (point-count histograms, range distributions, intensity statistics) before versus after resampling are supplied, leaving open the possibility that observed gains are dataset-specific artifacts.

Authors: We agree that these quantitative metrics are necessary to rule out dataset-specific artifacts and to confirm that the coordinate transform, virtual modeling, and resampling preserve the required statistical properties. In the revised manuscript we will add coverage-completeness metrics for the paired observations, calibration-error statistics for the coordinate transformation, and distributional comparisons (point-count histograms, range distributions, and intensity statistics) computed before versus after resampling. These additions will directly address the concern about generalization of the reported detection gains. revision: yes
Referee: [Dataset construction] Dataset description: the paper states R2V-LiDAR is used only for evaluation and is not released; without public access or at least tabulated overlap statistics, reviewers cannot independently verify that the paired observations are sufficiently dense and accurately calibrated to support the transfer claim.

Authors: We acknowledge that independent verification requires more information than is currently provided. Although the R2V-LiDAR dataset cannot be released publicly owing to the terms under which the roadside and vehicle data were collected, we will include a new table of overlap statistics in the revised dataset section. The table will report average overlap percentage, number of paired frames, point-density overlap, and calibration accuracy measures. These tabulated statistics will enable reviewers to assess whether the paired observations are sufficiently dense and accurately calibrated. revision: partial
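The rebuttal does not define "average overlap percentage". One plausible reading, offered here purely as an assumption, is the share of voxels occupied by the vehicle scan that the roadside scan also occupies once both clouds are in a common frame. The voxel size and function names below are hypothetical.

```python
import math

def voxel_keys(points, voxel_size=0.5):
    """Set of integer voxel indices occupied by a point cloud."""
    return {(math.floor(x / voxel_size),
             math.floor(y / voxel_size),
             math.floor(z / voxel_size))
            for x, y, z in points}

def overlap_percentage(vehicle_points, roadside_points, voxel_size=0.5):
    """Percentage of vehicle-occupied voxels also occupied by the roadside scan.

    Both clouds must already be expressed in the same coordinate frame.
    """
    veh = voxel_keys(vehicle_points, voxel_size)
    road = voxel_keys(roadside_points, voxel_size)
    if not veh:
        return 0.0
    return 100.0 * len(veh & road) / len(veh)
```

Averaging this value over all paired frames would give one tabulated overlap figure. The measure is asymmetric by design: it asks how much of what the vehicle sees is also covered from the roadside, which is the direction that matters for reconstruction.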

Circularity Check

0 steps flagged

No circularity; empirical claims rest on dataset construction and detection metrics without reduction to inputs or self-citations


The paper describes a pipeline of coordinate transformation, virtual LiDAR modeling, and resampling to generate vehicle-mounted point clouds from roadside inputs, then evaluates semantic similarity and downstream detection gains on the newly constructed R2V-LiDAR dataset. No equations, fitted parameters, or derivation steps appear in the provided text. The uniqueness claim is a standard 'to the best of our knowledge' statement rather than a load-bearing theorem imported from prior author work. All reported improvements are measured against external baselines (real data only) and are therefore falsifiable outside any self-referential loop. The method is self-contained against the stated empirical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the approach assumes accurate sensor coordinate alignment and that virtual modeling plus resampling produce transferable data; no explicit free parameters or invented entities are described.

axioms (1)

Domain assumption: roadside and vehicle LiDAR have sufficient overlapping coverage for meaningful coordinate transformation.
Described as essential because no public dataset provides it




[1] [1]

IEEE Trans

Chen, L., Wu, P., Chitta, K., Jaeger, B., Geiger, A., Li, H.: End-to-end autonomous driving: Challenges and frontiers. IEEE Trans. Pattern Anal. Mach. Intell.46(12), 10164–10183 (2024) https://doi.org/10.1109/TPAMI.2024. 3435937

work page doi:10.1109/tpami.2024 2024

[2] [2]

10610948

Wang, T.-H., Maalouf, A., Xiao, W., Ban, Y., Amini, A., Rosman, G., Kara- man, S., Rus, D.: Drive anywhere: Generalizable end-to-end autonomous driving with multi-modal foundation models. In: 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 6687–6694 (2024). https://doi.org/10. 1109/icra57147.2024.10611590 . IEEE

work page arXiv 2024

[3] [3]

In: 2025 IEEE International Conference on Robotics and Automation (ICRA), pp

Lu, S., He, L., Li, S.E., Luo, Y., Wang, J., Li, K.: Hierarchical end-to-end autonomous driving: Integrating bev perception with deep reinforcement learning. In: 2025 IEEE International Conference on Robotics and Automation (ICRA), pp. 8856–8863 (2025). https://doi.org/10.1109/icra55743.2025.11128829 . IEEE

work page doi:10.1109/icra55743.2025.11128829 2025

[4] [4]

The International Journal of Robotics Research40(4-5), 681–690 (2021) https://doi.org/10.1177/ 0278364920979368

Pitropov, M., Garcia, D.E., Rebello, J., Smart, M., Wang, C., Czarnecki, K., Waslander, S.: Canadian adverse driving conditions dataset. The International Journal of Robotics Research40(4-5), 681–690 (2021) https://doi.org/10.1177/ 0278364920979368

work page 2021

[5] [5]

IEEE Transactions on Intelligent Transportation Systems (2024) https://doi.org/10.1109/tits.2024

Gao, B., Liu, J., Zou, H., Chen, J., He, L., Li, K.: Vehicle-road-cloud collabo- rative perception framework and key technologies: A review. IEEE Transactions on Intelligent Transportation Systems (2024) https://doi.org/10.1109/tits.2024. 3459799

work page doi:10.1109/tits.2024 2024

[6] [6]

In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp

Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? the kitti vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361 (2012). https://doi.org/10.1109/cvpr.2012. 23 6248074 . IEEE

work page doi:10.1109/cvpr.2012 2012

[7] [7]

In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp

Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213–3223 (2016). https://doi.org/10.1109/cvpr.2016. 350

work page doi:10.1109/cvpr.2016 2016

[8] [8]

In: Proceedings of the IEEE/CVF Conference on Computer 25 Vision and Pattern Recognition, pp

Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krish- nan, A., Pan, Y., Baldan, G., Beijbom, O.: nuscenes: A multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pp. 11621–11631 (2020). https://doi.org/ 10.1109/cvpr42600.2020.01164

work page doi:10.1109/cvpr42600.2020.01164 2020

[9] [9]

In: Proceedings of the IEEE/CVF Conference on Computer 25 Vision and Pattern Recognition, pp

Sun, P., Kretzschmar, H., Dotiwalla, X., Chouard, A., Patnaik, V., Tsui, P., Guo, J., Zhou, Y., Chai, Y., Caine, B.,et al.: Scalability in perception for autonomous driving: Waymo open dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2446–2454 (2020). https://doi. org/10.1109/cvpr42600.2020.00252

work page doi:10.1109/cvpr42600.2020.00252 2020

[10] [10]

A ConvNet for the 2020s

Ye, X., Shu, M., Li, H., Shi, Y., Li, Y., Wang, G., Tan, X., Ding, E.: Rope3d: The roadside perception dataset for autonomous driving and monocular 3d object detection task. In: Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pp. 21341–21350 (2022). https://doi.org/ 10.1109/cvpr52688.2022.02065

work page doi:10.1109/cvpr52688.2022.02065 2022

[11] [11]

In: Proceedings of the 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), pp

Zimmer, W., Creß, C., Nguyen, H.T., Knoll, A.C.: Tumtraf intersection dataset: All you need for urban 3d camera-lidar roadside perception. In: Proceedings of the 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), pp. 1030–1037 (2023). https://doi.org/10.1109/itsc57777.2023.10422289

work page doi:10.1109/itsc57777.2023.10422289 2023

[12] [12]

A ConvNet for the 2020s

Yu, H., Luo, Y., Shu, M., Huo, Y., Yang, Z., Shi, Y., Guo, Z., Li, H., Hu, X., Yuan, J., Nie, Z.,et al.: Dair-v2x: A large-scale dataset for vehicle-infrastructure cooperative 3d object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21329–21338 (2022). https: //doi.org/10.1109/cvpr52688.2022.02067

work page doi:10.1109/cvpr52688.2022.02067 2022

[13] [13]

Emogen: Emotional image content generation with text-to-image diffusion models,

Ma, C., Qiao, L., Zhu, C., Liu, K., Kong, Z., Li, Q., Zhou, X., Kan, Y., Wu, W.,et al.: Holovic: Large-scale dataset and benchmark for multi-sensor holo- graphic intersection and vehicle-infrastructure cooperative. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22129–22138 (2024). https://doi.org/10.1109/cvpr527...

work page doi:10.1109/cvpr52733.2024.02089 2024

[14] [14]

In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp

Caccia, L., Van Hoof, H., Courville, A., Pineau, J.: Deep generative modeling of lidar data. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5034–5040 (2019). https://doi.org/10.1109/iros40897. 2019.8968535 . IEEE 24

[15]

Zyrianov, V., Zhu, X., Wang, S.: Learning to generate realistic lidar point clouds. In: European Conference on Computer Vision, pp. 17–35 (2022). https://doi.org/10.1007/978-3-031-20050-2_2 . Springer

[16]

Kong, L., Liu, Y., Chen, R., Ma, Y., Zhu, X., Li, Y., Hou, Y., Qiao, Y., Liu, Z.: Rethinking range view representation for lidar segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 228–240 (2023). https://doi.org/10.1109/iccv51070.2023.00028

[17]

Xiong, Y., Ma, W.-C., Wang, J., Urtasun, R.: Learning compact representations for lidar completion and generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1074–1083 (2023). https://doi.org/10.1109/cvpr52729.2023.00110

[18]

Zyrianov, V., Che, H., Liu, Z., Wang, S.: Lidardm: Generative lidar simulation in a generated world. In: 2025 IEEE International Conference on Robotics and Automation (ICRA), pp. 6055–6062 (2025). https://doi.org/10.1109/icra55743.2025.11128001 . IEEE

[19]

Ren, X., Lu, Y., Cao, T., Gao, R., Huang, S., Sabour, A., Shen, T., Pfaff, T., Wu, J.Z., Chen, R., Kim, S.W., Gao, J., Leal-Taixe, L., Chen, M., Fidler, S., Ling, H.: Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models (2025). https://doi.org/10.1109/iros60139.2025.11247690 . https://arxiv.org/abs/2506.09042

[20]

Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: Carla: An open urban driving simulator. In: Conference on Robot Learning, pp. 1–16 (2017). https://doi.org/10.1007/978-3-030-01267-0_15 . PMLR

[21]

Koenig, N., Howard, A.: Design and use paradigms for gazebo, an open-source multi-robot simulator. In: 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566), vol. 3, pp. 2149–2154 (2004). https://doi.org/10.1109/IROS.2004.1389727

[22]

Shah, S., Dey, D., Lovett, C., Kapoor, A.: Airsim: High-fidelity visual and physical simulation for autonomous vehicles. In: Field and Service Robotics: Results of the 11th International Conference, pp. 621–635 (2017). https://doi.org/10.1007/978-3-319-67361-5_40 . Springer

[23]

Ros, G., Sellart, L., Materzynska, J., Vazquez, D., Lopez, A.M.: The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3234–3243 (2016). https://doi.org/10.1109/cvpr.2016.352

[24]

Manivasagam, S., Wang, S., Wong, K., Zeng, W., Sazanovich, M., Tan, S., Yang, B., Ma, W.-C., Urtasun, R.: Lidarsim: Realistic lidar simulation by leveraging the real world. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11167–11176 (2020). https://doi.org/10.1109/cvpr42600.2020.01118

[25]

Huang, S., Gojcic, Z., Wang, Z., Williams, F., Kasten, Y., Fidler, S., Schindler, K., Litany, O.: Neural lidar fields for novel view synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 18236–18246 (2023). https://doi.org/10.1109/iccv51070.2023.01672

[26]

Tao, T., Gao, L., Wang, G., Lao, Y., Chen, P., Zhao, H., Hao, D., Liang, X., Salzmann, M., Yu, K.: Lidar-nerf: Novel lidar view synthesis via neural radiance fields. In: Proceedings of the 32nd ACM International Conference on Multimedia, pp. 390–398 (2024). https://doi.org/10.1145/3664647.3681482

[27]

Lee, S., Lim, H., Myung, H.: Patchwork++: Fast and robust ground segmentation solving partial under-segmentation using 3d point cloud. In: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 13276–13283 (2022). https://doi.org/10.1109/iros47612.2022.9981561 . IEEE

[28]

Wu, X., Jiang, L., Wang, P.-S., Liu, Z., Liu, X., Qiao, Y., Ouyang, W., He, T., Zhao, H.: Point transformer v3: Simpler, faster, stronger. In: CVPR (2024). https://doi.org/10.1109/cvpr52733.2024.00463

[29]

OpenPCDet Development Team: OpenPCDet: An Open-source Toolbox for 3D Object Detection from Point Clouds. https://github.com/open-mmlab/OpenPCDet (2020). https://doi.org/10.1051/eas:0833018

[30]

Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.: Pointpillars: Fast encoders for object detection from point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12697–12705 (2019). https://doi.org/10.1109/cvpr.2019.01298

[31]

Simonelli, A., Bulò, S.R., Porzi, L., Antequera, M.L., Kontschieder, P.: Disentangling monocular 3d object detection: From single to multi-class recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(3), 1219–1231 (2022) https://doi.org/10.1109/tpami.2020.3025077

[32]

Shi, S., Guo, C., Jiang, L., Wang, Z., Shi, J., Wang, X., Li, H.: Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10529–10538 (2020). https://doi.org/10.1109/cvpr42600.2020.01054

[33]

Yan, Y., Mao, Y., Li, B.: Second: Sparsely embedded convolutional detection. Sensors 18(10), 3337 (2018) https://doi.org/10.3390/s18103337
