pith. machine review for the scientific record.

arxiv: 2605.05014 · v2 · submitted 2026-05-06 · 💻 cs.CV

Recognition: 2 theorem links · Lean Theorem

CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:07 UTC · model grok-4.3

classification 💻 cs.CV
keywords automotive dataset · dense depth estimation · LiDAR fusion · road topography · autonomous driving · 3D reconstruction · challenging environments · multi-modal sensors

The pith

The CARD dataset supplies quasi-dense 3D ground truth for irregular road surfaces through multi-LiDAR fusion that produces about 500,000 valid depth pixels per frame.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CARD, a multi-modal automotive dataset recorded on challenging road topographies that include speed bumps, potholes, irregular surfaces, and off-road segments across Germany and Italy. Prior driving datasets concentrate on flat paved roads and supply only sparse LiDAR points as ground truth, which limits assessment of fine geometric details in depth estimation and completion. CARD includes synchronized global-shutter stereo cameras, front and rear LiDARs, 6-DoF poses from LiDAR-inertial odometry, per-wheel motion traces, and full calibration data. Its multi-LiDAR fusion process generates roughly 500,000 valid depth pixels per frame, stated as 6.5 times denser than KITTI Depth Completion and 10 times denser than other public driving datasets on average. The release adds 2D bounding boxes for road irregularities, a standardized evaluation protocol, and baseline results from state-of-the-art depth models.

Core claim

CARD delivers quasi-dense 3D ground truth across continuous sequences rich in speed bumps, potholes, irregular surfaces and off-road segments. The sensor suite comprises synchronized global-shutter stereo cameras, front and rear LiDARs, 6-DoF poses from LiDAR-inertial odometry, per-wheel motion traces, and full calibration. Multi-LiDAR fusion yields approximately 500K valid depth pixels per frame, about 6.5x more than KITTI Depth Completion and 10x more on average than other public driving datasets. The dataset spans about 110 km and 4.7 hours and supplies 2D bounding boxes targeting road-topography irregularities.
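Read literally, those ratios imply that KITTI Depth Completion supplies roughly 500,000 / 6.5 ≈ 77,000 valid depth pixels per frame, and the other public driving datasets roughly 50,000 on average; this back-calculation follows from the stated claims and is not a figure the paper reports.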

What carries the argument

Multi-LiDAR fusion of front and rear LiDAR scans, combined with LiDAR-inertial odometry and the provided calibration, generates the quasi-dense depth ground-truth maps.
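To make that machinery concrete, here is a minimal sketch (not the authors' pipeline) of the generic step that turns an accumulated, world-frame point cloud into a per-frame depth map whose valid pixels can be counted; the names `T_cam_from_world` and `K`, the z-buffer policy, and the near-plane cutoff are all illustrative assumptions.

```python
import numpy as np

def project_points_to_depth(points_world, T_cam_from_world, K, h, w):
    """Project an accumulated world-frame point cloud into one camera,
    keeping the nearest depth per pixel (a crude z-buffer)."""
    # Homogenize and move points into the camera frame
    # (calibration + odometry supply T_cam_from_world, a 4x4 rigid transform).
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (T_cam_from_world @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]       # drop points behind/near the camera

    # Pinhole projection with 3x3 intrinsics K.
    uv = (K @ pts_cam.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    z = pts_cam[:, 2]
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)

    depth = np.full((h, w), np.inf)
    np.minimum.at(depth, (v[ok], u[ok]), z[ok])  # nearest point wins per pixel
    depth[~np.isfinite(depth)] = 0.0             # 0 marks "no ground truth"
    return depth

# In these terms, the ~500K-valid-pixels-per-frame figure is (depth > 0).sum().
```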

Load-bearing premise

The multi-LiDAR fusion process combined with the provided calibration and LiDAR-inertial odometry produces accurate quasi-dense 3D ground truth without systematic errors or artifacts on irregular surfaces.

What would settle it

A side-by-side comparison of the fused depth values against independent high-precision measurements taken on the same speed bumps and potholes using a surveying instrument or calibrated stereo photogrammetry.
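If such reference measurements existed, scoring them would be routine; a minimal sketch, assuming a co-registered reference depth map and a boolean mask over the labeled bump/pothole pixels (both hypothetical):

```python
import numpy as np

def depth_error_stats(fused, reference, mask, outlier_thresh_m=0.05):
    """RMSE / MAE / outlier rate of fused depth against an independent
    reference (e.g., surveyed points or photogrammetry projected into
    the same view). Hypothetical evaluation helper, not released code."""
    d = fused[mask] - reference[mask]
    return {
        "rmse_m": float(np.sqrt(np.mean(d ** 2))),
        "mae_m": float(np.mean(np.abs(d))),
        # Fraction of pixels off by more than a few centimetres,
        # the scale that matters for speed bumps and potholes.
        "outlier_rate": float(np.mean(np.abs(d) > outlier_thresh_m)),
    }
```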

Figures

Figures reproduced from arXiv: 2605.05014 by Aditya Date, Frank Neuhaus, Gasser Elazab, Malte Splietker, Maximilian Jansen, Michael Unterreiner, Olaf Hellwich, Tilman Koß.

Figure 1: CARD example from Carmiano, Italy. Right: map with …
Figure 2: Ground truth points per image. CARD has more depth …
Figure 4: (a) Image-level distribution of road-topography labels …
Figure 5: Ground truth generation: motion-compensated LiDAR bursts are voxel-accumulated, dynamic-object filtered, then projected … (a voxel-accumulation sketch follows this figure list)
Figure 6: Qualitative ablation of voxel cleaning and adaptive vot…
Figure 7: Colored point cloud comparison. We show CARD alongside densified ground truth from KITTI-DC …
Figure 8: Height ground truth and predictions for a pothole: GT …
Figure 9: Overview of the rig and wheel calibration procedure.
Figure 10: Qualitative overview of the ground-truth generation …
Figure 11: Qualitative ablation of the ground-truth aggregation pipeline. (a)–(c) show the progressive cleanup using cropping and adaptive …
Figure 12: Qualitative YOLOv8 detections for potholes (negative …
Figure 13: Projected densified ground truth. We show input …
Figure 14: Diversity of the CARD dataset. We display random image crops sampled across the dataset to illustrate variability in geometry …
Figure 15: Qualitative comparison of a positive example of a speed bump. Top: input context and GT geometry. Bottom: model predictions. Monocular baselines [22, 40, 52] (rows 1–3); FoundationStereo [53] (bottom row) …
Figure 16: Qualitative comparison of a pothole example. Top: input context and GT geometry. Bottom: model predictions. Monocular baselines [22, 40, 52] (rows 1–3); FoundationStereo [53] (bottom row) accurately recovers the geometry …
Figure 17: Qualitative comparison of a positive road irregularity. Top: input context and GT geometry. Bottom: model predictions. Monocular baselines [6, 22, 40] (rows 1–3); FoundationStereo [53] (bottom row) …
Figure 18: Qualitative comparison of a positive road irregularity. Top: input context and GT geometry. Bottom: model predictions.
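Figure 5's caption names the core accumulation step; as a reading aid, here is a minimal sketch of voxel accumulation under stated assumptions: scans are already motion-compensated into a shared world frame, the voxel size and "first point wins" rule are illustrative choices, and the dynamic-object filter and adaptive cleanup of Figures 6 and 11 are omitted.

```python
import numpy as np

def voxel_accumulate(scans, voxel_size=0.05):
    """Accumulate motion-compensated LiDAR scans (already in a common
    world frame) and keep one representative point per occupied voxel.
    A schematic reading of Figure 5, not the released pipeline."""
    pts = np.vstack(scans)                              # (N, 3) world-frame points
    keys = np.floor(pts / voxel_size).astype(np.int64)  # integer voxel coordinates
    # One point per occupied voxel: unique voxel keys, first point wins.
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    return pts[np.sort(first_idx)]
```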
Original abstract

Autonomous driving must operate across diverse surfaces to enable safe mobility. However, most driving datasets are captured on well-paved flat roads. Moreover, recent driving datasets primarily provide sparse LiDAR ground truth for images, which is insufficient for assessing fine-grained geometry in depth estimation and completion. To address these gaps, we introduce CARD, a multi-modal driving dataset that delivers quasi-dense 3D ground truth across continuous sequences rich in speed bumps, potholes, irregular surfaces and off-road segments. Our sensor suite includes synchronized global-shutter stereo cameras, front and rear LiDARs, 6-DoF poses from LiDAR-inertial odometry, per-wheel motion traces, and full calibration. Notably, our multi-LiDAR fusion yields ~500K valid depth pixels per frame, about 6.5x more than KITTI Depth Completion and 10x more on average than other public driving datasets. The dataset spans ~110 km and 4.7 hours across Germany and Italy. In addition, CARD provides 2D bounding boxes targeting road-topography irregularities, enabling accurate benchmarking for both geometry and perception tasks. Furthermore, we establish a standardized evaluation protocol for road surface irregularities on CARD and benchmark state-of-the-art depth estimation models to provide strong baselines. The CARD dataset is hosted on https://huggingface.co/CARD-Data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces the CARD dataset, a multi-modal automotive collection for dense 3D reconstruction on challenging road topographies. It provides synchronized global-shutter stereo cameras, front/rear LiDARs, 6-DoF LiDAR-inertial odometry poses, per-wheel motion traces, full calibration, and 2D bounding boxes for road irregularities. The central claim is that multi-LiDAR fusion produces ~500K valid depth pixels per frame (6.5x denser than KITTI Depth Completion), yielding quasi-dense 3D ground truth across ~110 km / 4.7 hours of sequences rich in speed bumps, potholes, irregular surfaces, and off-road segments in Germany and Italy. The paper also defines a standardized evaluation protocol for road surface irregularities and reports baselines from state-of-the-art depth estimation models.

Significance. If the fused depth data prove geometrically accurate, CARD would fill a clear gap in public driving datasets by supplying dense, calibrated ground truth on non-flat and irregular terrains where existing collections (KITTI, etc.) are sparse and limited to paved roads. The combination of multi-modal sensors, long continuous sequences, and an explicit benchmarking protocol for topography irregularities would support more realistic evaluation of depth completion, surface reconstruction, and perception models for autonomous driving. The public release on Hugging Face and provision of calibration/poses are practical strengths.

major comments (1)
  1. [Dataset construction / multi-LiDAR fusion] The headline claim of accurate quasi-dense 3D ground truth (~500K valid depth pixels per frame) rests on the fusion pipeline (using provided calibration and LiDAR-inertial odometry) being free of systematic bias or artifacts on irregular surfaces. No quantitative validation is reported—e.g., RMSE, MAE, or outlier rates against an external reference (total station, high-precision IMU, or stereo photogrammetry) on speed bumps, potholes, or off-road segments. Density alone does not establish geometric fidelity where surface normals change rapidly or small pose errors are amplified.
minor comments (1)
  1. [Abstract] The abstract and introduction could explicitly state the total number of frames or sequences to allow readers to assess scale relative to the claimed 4.7 hours of data.
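For scale, one hedged back-of-envelope: at an assumed capture rate of 10 Hz (the abstract states none), 4.7 hours would correspond to roughly 4.7 × 3600 × 10 ≈ 169,000 frames per camera.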

Simulated Authors' Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback on the CARD dataset manuscript. We address the major comment point by point below and outline planned revisions.

point-by-point responses
  1. Referee: Dataset construction / multi-LiDAR fusion section: The headline claim of accurate quasi-dense 3D ground truth (~500K valid depth pixels per frame) rests on the fusion pipeline (using provided calibration and LiDAR-inertial odometry) being free of systematic bias or artifacts on irregular surfaces. No quantitative validation is reported—e.g., RMSE, MAE, or outlier rates against an external reference (total station, high-precision IMU, or stereo photogrammetry) on speed bumps, potholes, or off-road segments. Density alone does not establish geometric fidelity where surface normals change rapidly or small pose errors are amplified.

    Authors: We agree that geometric accuracy must be demonstrated beyond density statistics, especially given the challenges of irregular road topographies. The manuscript presents the multi-LiDAR fusion results using the released calibration parameters and 6-DoF LiDAR-inertial odometry poses, which follow established practices in the field. We did not include direct external validation (e.g., total-station RMSE) because such high-precision ground-truth references were not collected during acquisition. In the revised manuscript we will add a dedicated 'Geometric Validation' subsection that reports: (i) intra-LiDAR consistency metrics (RMSE between front and rear LiDAR projections on overlapping regions), (ii) cross-modal agreement between fused depths and stereo disparity estimates on a curated subset of frames containing speed bumps and potholes, and (iii) a brief error-propagation analysis based on the reported pose uncertainties. We will also expand the limitations paragraph to discuss potential artifacts on high-curvature surfaces. These additions will be supported by new quantitative tables and qualitative visualizations. revision: partial
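The rebuttal's proposed check (i) is easy to pin down; below, a minimal sketch assuming the front- and rear-LiDAR points have each been projected into the same camera view as depth maps, with 0 marking pixels without ground truth (names and conventions hypothetical).

```python
import numpy as np

def intra_lidar_rmse(depth_front, depth_rear):
    """Consistency check (i) from the rebuttal, sketched: RMSE between
    front- and rear-LiDAR depth projections on pixels where both have
    valid ground truth. Hypothetical helper, not released code."""
    overlap = (depth_front > 0) & (depth_rear > 0)  # both sensors saw the pixel
    if not overlap.any():
        return float("nan")
    d = depth_front[overlap] - depth_rear[overlap]
    return float(np.sqrt(np.mean(d ** 2)))
```

Check (ii) would reuse the same comparison with `depth_rear` replaced by a stereo-derived depth map on the curated bump/pothole frames.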

standing simulated objections not resolved
  • Direct RMSE/MAE or outlier statistics against external high-precision references (total station, survey-grade IMU, or independent photogrammetry) on the irregular segments, as these reference measurements were not acquired during data collection.

Circularity Check

0 steps flagged

No circularity: dataset release with no derivations or fitted predictions

full rationale

The paper is a data collection and release effort describing a sensor suite, synchronization, multi-LiDAR fusion pipeline, and benchmarks on the released CARD dataset. No equations, models, or predictions are presented that reduce to fitted parameters or self-citations by construction. The ~500K depth pixels claim is an empirical count from the fusion process, not a derived result. Central claims rest on external validation potential rather than internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper contributes empirical data collection and sensor fusion rather than new theoretical derivations, so the ledger contains only standard domain assumptions about sensor accuracy.

axioms (1)
  • domain assumption: Sensor synchronization, calibration, and LiDAR-inertial odometry produce sufficiently accurate poses and alignments for dense 3D reconstruction.
    Invoked implicitly in the description of the sensor suite and multi-LiDAR fusion process.

pith-pipeline@v0.9.0 · 5572 in / 1221 out tokens · 56729 ms · 2026-05-08T18:07:40.275161+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

65 extracted references · 12 canonical work pages · 3 internal anchors

  1. [1] Sameer Agarwal, Keir Mierle, and The Ceres Solver Team. Ceres Solver. http://ceres-solver.org, 2023.
  2. [2] Mina Alibeigi, William Ljungbergh, Adam Tonderski, Georg Hess, Adam Lilja, Carl Lindström, Daria Motorniuk, Junsheng Fu, Jenny Widahl, and Christoffer Petersson. Zenseact Open Dataset: A large-scale and diverse multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 20178–20188, 2023.
  3. [3] Mohammad Hossein Amini and Shiva Nejati. Bridging the gap between real-world and synthetic images for testing autonomous driving systems. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2024.
  4. [4] Deeksha Arya, Hiroya Maeda, Sanjay Kumar Ghosh, Durga Toshniwal, and Yoshihide Sekimoto. RDD2022: A multi-national image dataset for automatic road damage detection. Geoscience Data Journal, 11(4):846–862, 2024.
  5. [5] Hamza Assemlali, Soukaina Bouhsissin, and Nawal Sael. Computer vision-based detection and classification of road obstacles: Systematic literature review. IEEE Access, 2025.
  6. [6] Alexey Bochkovskiy, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan Richter, and Vladlen Koltun. Depth Pro: Sharp monocular metric depth in less than a second. In The Thirteenth International Conference on Learning Representations, 2025.
  7. [7] Tom Bu, Xinhe Zhang, Christoph Mertz, and John M. Dolan. CARLA simulated data for rare road object detection. In 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2021.
  8. [8] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
  9. [9] Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, et al. Argoverse: 3D tracking and forecasting with rich maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
  10. [10] Creative Commons. Attribution 4.0 International (CC BY 4.0) — legal code. https://creativecommons.org/licenses/by/4.0/legalcode, 2013. Accessed Nov. 9, 2025.
  11. [11] Creative Commons. Attribution–NonCommercial 4.0 International (CC BY-NC 4.0) — legal code. https://creativecommons.org/licenses/by-nc/4.0/legalcode, 2013. Accessed Nov. 9, 2025.
  12. [12] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. CARLA: An open urban driving simulator. In Conference on Robot Learning, pages 1–16. PMLR, 2017.
  13. [13] Yuchuan Du, Jing Chen, Cong Zhao, Chenglong Liu, Feixiong Liao, and Ching-Yao Chan. Comfortable and energy-efficient speed control of autonomous vehicles on rough pavements using deep reinforcement learning. Transportation Research Part C: Emerging Technologies, 2022.
  14. [14] Gasser Elazab, Torben Gräber, Michael Unterreiner, and Olaf Hellwich. MonoPP: Metric-scaled self-supervised monocular depth estimation by planar-parallax geometry in automotive applications. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025.
  15. [15] Gasser Elazab, Maximilian Jansen, Michael Unterreiner, and Olaf Hellwich. Gamma-from-mono: Road-relative, metric, self-supervised monocular geometry for vehicular applications. arXiv preprint arXiv:2512.04303, 2025.
  16. [16] Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research, 32(11):1231–1237, 2013.
  17. [17] Jakob Geyer, Yohannes Kassahun, Mentar Mahmudi, Xavier Ricou, Rupesh Durgesh, Andrew S. Chung, Lorenz Hauswald, Viet Hoang Pham, Maximilian Mühlegg, Sebastian Dorn, et al. A2D2: Audi Autonomous Driving Dataset. arXiv preprint arXiv:2004.06320, 2020.
  18. [18] Clément Godard, Oisin Mac Aodha, Michael Firman, and Gabriel J. Brostow. Digging into self-supervised monocular depth estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.
  19. [19] Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raventos, and Adrien Gaidon. 3D packing for self-supervised monocular depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
  20. [20] Hesai Technology. XT32 product page, 2025.
  21. [21] Juqi Hu, Youmin Zhang, and Subhash Rakheja. Adaptive lane change trajectory planning scheme for autonomous vehicles under various road frictions and vehicle speeds. IEEE Transactions on Intelligent Vehicles, 8(2):1252–1265, 2022.
  22. [22] Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, and Shaojie Shen. Metric3D v2: A versatile monocular geometric foundation model for zero-shot metric depth and surface normal estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
  23. [23] HumanSignal. labelImg. GitHub repository, archived 2024-02-29. Original work by Tzutalin.
  24. [24] U3-3990SE-C-HQ (AB02987) Specification. IDS Imaging Development Systems GmbH, 2025. Rev. 1.2.
  25. [25] Glenn Jocher, Jing Qiu, and Ayush Chaurasia. Ultralytics YOLOv8, 2025. Version 8.3.105.
  26. [26] Sagi Katz, Ayellet Tal, and Ronen Basri. Direct visibility of point sets. In ACM SIGGRAPH 2007 Papers, page 24–es, New York, NY, USA, 2007. Association for Computing Machinery.
  27. [27] Kenji Koide, Masashi Yokozuka, Shuji Oishi, and Atsuhiko Banno. Voxelized GICP for fast and accurate 3D point cloud registration. In 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021.
  28. [28] Karlo Koledic, Luka Petrovic, Ivan Markovic, and Ivan Petrovic. GVDepth: Zero-shot monocular depth estimation for ground vehicles based on probabilistic cue fusion. arXiv preprint arXiv:2412.06080, 2024.
  29. [29] Yingping Liang, Yutao Hu, Wenqi Shao, and Ying Fu. Distilling monocular foundation model for fine-grained depth completion. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025.
  30. [30] Mingyu Liu, Ekim Yurtsever, Jonathan Fossaert, Xingcheng Zhou, Walter Zimmer, Yuning Cui, Bare Luka Zagar, and Alois C. Knoll. A survey on autonomous driving datasets: Statistics, annotation quality, and a future outlook. IEEE Transactions on Intelligent Vehicles, 2024.
  31. [31] Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, et al. One million scenes for autonomous driving: ONCE dataset. arXiv preprint arXiv:2106.11037, 2021.
  32. [32] Rahma Mkwata and Elizabeth Eu Mee Chong. Effect of pavement surface conditions on road traffic accident – a review. In E3S Web of Conferences, 2022.
  33. [33] Khan Muhammad, Amin Ullah, Jaime Lloret, Javier Del Ser, and Victor Hugo C. De Albuquerque. Deep learning for safe autonomous driving: Current challenges and future directions. IEEE Transactions on Intelligent Transportation Systems, 22(7):4316–4336, 2020.
  34. [34] M. A. F. Musa, Sitti Asmah Hassan, and Nordiana Mashros. The impact of roadway conditions towards accident severity on federal roads in Malaysia. PLoS ONE, 15, 2020.
  35. [35] Frank Neuhaus, Tilman Koß, Robert Kohnen, and Dietrich Paulus. MC2SLAM: Real-time inertial lidar odometry using two-scan motion compensation. In German Conference on Pattern Recognition, pages 60–72. Springer, 2018.
  36. [36] Sina Nordhoff and Joost De Winter. Why do drivers and automation disengage the automation? Results from a study among Tesla users. arXiv preprint arXiv:2309.10440, 2023.
  37. [37] Abhishek Patil, Srikanth Malla, Haiming Gang, and Yi-Ting Chen. The H3D dataset for full-surround 3D multi-object detection and tracking in crowded urban scenes. In 2019 International Conference on Robotics and Automation (ICRA), pages 9552–9557. IEEE, 2019.
  38. [38] José-Eleazar Peralta-López, Joel-Artemio Morales-Viscaya, David Lázaro-Mata, Marcos-Jesús Villaseñor-Aguilar, Juan Prado-Olivarez, Francisco-Javier Pérez-Pinal, José-Alfredo Padilla-Medina, Juan-José Martínez-Nolasco, and Alejandro-Israel Barranco-Gutiérrez. Speed bump and pothole detection using deep neural network with images captured through ZED camera. Applied Sciences, 13(14):8349, 2023.
  39. [39] Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, and Fisher Yu. UniDepth: Universal monocular metric depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10106–10116, 2024.
  40. [40] Luigi Piccinelli, Christos Sakaridis, Yung-Hsu Yang, Mattia Segu, Siyuan Li, Wim Abbeloos, and Luc Van Gool. UniDepthV2: Universal monocular metric depth estimation made simpler. arXiv preprint arXiv:2502.20110, 2025.
  41. [41] René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vision transformers for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12179–12188, 2021.
  42. [42] Ashutosh Saxena, Min Sun, and Andrew Y. Ng. Make3D: Learning 3D scene structure from a single still image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(5):824–840, 2008.
  43. [43] Shital Shah, Debadeepta Dey, Chris Lovett, and Ashish Kapoor. AirSim: High-fidelity visual and physical simulation for autonomous vehicles. In Field and Service Robotics: Results of the 11th International Conference, 2017.
  44. [44] Matthew Sivaprakasam, Parv Maheshwari, Mateo Guaman Castro, Samuel Triest, Micah Nye, Steve Willits, Andrew Saba, Wenshan Wang, and Sebastian Scherer. TartanDrive 2.0: More modalities and better infrastructure to further self-supervised learning research in off-road driving tasks. In 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024.
  45. [45] Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, et al. Scalability in perception for autonomous driving: Waymo Open Dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
  46. [46] Jie Tang, Fei-Peng Tian, Boshi An, Jian Li, and Ping Tan. Bilateral propagation network for depth completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
  47. [47] Elia Moscoso Thompson, Andrea Ranieri, Silvia Biasotti, Miguel Chicchon, Ivan Sipiran, Minh-Khoi Pham, Thang-Long Nguyen-Ho, Hai-Dang Nguyen, and Minh-Triet Tran. SHREC 2022: Pothole and crack detection in the road pavement using images and RGB-D data. Computers & Graphics, 107:161–171, 2022.
  48. [48] Jonas Uhrig, Nick Schneider, Lukas Schneider, Uwe Franke, Thomas Brox, and Andreas Geiger. Sparsity invariant CNNs. In 2017 International Conference on 3D Vision (3DV), 2017.
  49. [49] Paul Voigt and Axel Von dem Bussche. The EU General Data Protection Regulation (GDPR): A practical guide. 1st ed., Cham: Springer International Publishing, 10, 2017.
  50. [50] Peng Wang, Xinyu Huang, Xinjing Cheng, Dingfu Zhou, Qichuan Geng, and Ruigang Yang. The ApolloScape open dataset for autonomous driving and its application. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1, 2019.
  51. [51] Ruicheng Wang, Sicheng Xu, Cassie Dai, Jianfeng Xiang, Yu Deng, Xin Tong, and Jiaolong Yang. MoGe: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025.
  52. [52] Ruicheng Wang, Sicheng Xu, Yue Dong, Yu Deng, Jianfeng Xiang, Zelong Lv, Guangzhong Sun, Xin Tong, and Jiaolong Yang. MoGe-2: Accurate monocular geometry with metric scale and sharp details. arXiv preprint arXiv:2507.02546, 2025.
  53. [53] Bowen Wen, Matthew Trepte, Joseph Aribido, Jan Kautz, Orazio Gallo, and Stan Birchfield. FoundationStereo: Zero-shot stereo matching. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025.
  54. [54] Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, et al. Argoverse 2: Next generation datasets for self-driving perception and forecasting. arXiv preprint arXiv:2301.00493, 2023.
  55. [55] World Health Organization. Global status report on road safety 2023. Technical report, World Health Organization, Geneva, 2023. ISBN: 9789240087200.
  56. [56] Magnus Wrenninge and Jonas Unger. Synscapes: A photorealistic synthetic dataset for street scene parsing. arXiv preprint arXiv:1810.08705, 2018.
  57. [57] Pengchuan Xiao, Zhenlei Shao, Steven Hao, Zishuo Zhang, Xiaolin Chai, Judy Jiao, Zesong Li, Jian Wu, Kai Sun, Kun Jiang, et al. PandaSet: Advanced sensor suite dataset for autonomous driving. In 2021 IEEE International Intelligent Transportation Systems Conference (ITSC). IEEE, 2021.
  58. [58] Guorun Yang, Xiao Song, Chaoqin Huang, Zhidong Deng, Jianping Shi, and Bolei Zhou. DrivingStereo: A large-scale dataset for stereo matching in autonomous driving scenarios. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
  59. [59] Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth Anything: Unleashing the power of large-scale unlabeled data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
  60. [60] Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth Anything V2. Advances in Neural Information Processing Systems, 37, 2024.
  61. [61] Zhen Yang, Chen Zhang, Gen Li, and Hongyi Xu. Analysis of the impact of different road conditions on accident severity at highway-rail grade crossings based on explainable machine learning. Symmetry, 17(1):147, 2025.
  62. [62] Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, and Chunhua Shen. Metric3D: Towards zero-shot metric 3D prediction from a single image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
  63. [63] Tong Zhao, Chenfeng Xu, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan, and Yintao Wei. RSRD: A road surface reconstruction dataset and benchmark for safe and comfortable autonomous driving. arXiv preprint arXiv:2310.02262, 2023.
  64. [64] Qian-Yi Zhou, Jaesik Park, and Vladlen Koltun. Open3D: A modern library for 3D data processing. arXiv preprint arXiv:1801.09847, 2018.
  65. [65] Jiacheng Zuo, Haibo Hu, Zikang Zhou, Yufei Cui, Ziquan Liu, Jianping Wang, Nan Guan, Jin Wang, and Chun Jason Xue. RALAD: Bridging the real-to-sim domain gap in autonomous driving with retrieval-augmented learning. arXiv preprint arXiv:2501.12296, 2025.