pith. machine review for the scientific record.

arxiv: 2605.05014 · v2 · submitted 2026-05-06 · 💻 cs.CV

Recognition: 2 theorem links · Lean Theorem

CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:07 UTC · model grok-4.3

classification 💻 cs.CV
keywords automotive dataset · dense depth estimation · LiDAR fusion · road topography · autonomous driving · 3D reconstruction · challenging environments · multi-modal sensors

The pith

The CARD dataset supplies quasi-dense 3D ground truth for irregular road surfaces through multi-LiDAR fusion that produces about 500,000 valid depth pixels per frame.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CARD, a multi-modal automotive dataset recorded on challenging road topographies that include speed bumps, potholes, irregular surfaces, and off-road segments across Germany and Italy. Prior driving datasets concentrate on flat paved roads and supply only sparse LiDAR points as ground truth, which limits assessment of fine geometric details in depth estimation and completion. CARD includes synchronized global-shutter stereo cameras, front and rear LiDARs, 6-DoF poses from LiDAR-inertial odometry, per-wheel motion traces, and full calibration data. Its multi-LiDAR fusion process generates roughly 500,000 valid depth pixels per frame, stated as 6.5 times denser than KITTI Depth Completion and 10 times denser than other public driving datasets on average. The release adds 2D bounding boxes for road irregularities, a standardized evaluation protocol, and baseline results from state-of-the-art depth models.

Core claim

CARD delivers quasi-dense 3D ground truth across continuous sequences rich in speed bumps, potholes, irregular surfaces and off-road segments. The sensor suite comprises synchronized global-shutter stereo cameras, front and rear LiDARs, 6-DoF poses from LiDAR-inertial odometry, per-wheel motion traces, and full calibration. Multi-LiDAR fusion yields approximately 500K valid depth pixels per frame, about 6.5x more than KITTI Depth Completion and 10x more on average than other public driving datasets. The dataset spans about 110 km and 4.7 hours and supplies 2D bounding boxes targeting road-topography irregularities.
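Read literally, those ratios imply that KITTI Depth Completion supplies roughly 500,000 / 6.5 ≈ 77,000 valid depth pixels per frame, and the other public driving datasets roughly 50,000 on average; this back-calculation follows from the stated claims and is not a figure the paper reports.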

What carries the argument

Multi-LiDAR fusion of front and rear LiDAR scans, combined with LiDAR-inertial odometry and the provided calibration, generates the quasi-dense depth ground-truth maps.
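To make that machinery concrete, here is a minimal sketch (not the authors' pipeline) of the generic step that turns an accumulated, world-frame point cloud into a per-frame depth map whose valid pixels can be counted; the names `T_cam_from_world` and `K`, the z-buffer policy, and the near-plane cutoff are all illustrative assumptions.

```python
import numpy as np

def project_points_to_depth(points_world, T_cam_from_world, K, h, w):
    """Project an accumulated world-frame point cloud into one camera,
    keeping the nearest depth per pixel (a crude z-buffer)."""
    # Homogenize and move points into the camera frame
    # (calibration + odometry supply T_cam_from_world, a 4x4 rigid transform).
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (T_cam_from_world @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]       # drop points behind/near the camera

    # Pinhole projection with 3x3 intrinsics K.
    uv = (K @ pts_cam.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    z = pts_cam[:, 2]
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)

    depth = np.full((h, w), np.inf)
    np.minimum.at(depth, (v[ok], u[ok]), z[ok])  # nearest point wins per pixel
    depth[~np.isfinite(depth)] = 0.0             # 0 marks "no ground truth"
    return depth

# In these terms, the ~500K-valid-pixels-per-frame figure is (depth > 0).sum().
```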

Load-bearing premise

The multi-LiDAR fusion process combined with the provided calibration and LiDAR-inertial odometry produces accurate quasi-dense 3D ground truth without systematic errors or artifacts on irregular surfaces.

What would settle it

A side-by-side comparison of the fused depth values against independent high-precision measurements taken on the same speed bumps and potholes using a surveying instrument or calibrated stereo photogrammetry.
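If such reference measurements existed, scoring them would be routine; a minimal sketch, assuming a co-registered reference depth map and a boolean mask over the labeled bump/pothole pixels (both hypothetical):

```python
import numpy as np

def depth_error_stats(fused, reference, mask, outlier_thresh_m=0.05):
    """RMSE / MAE / outlier rate of fused depth against an independent
    reference (e.g., surveyed points or photogrammetry projected into
    the same view). Hypothetical evaluation helper, not released code."""
    d = fused[mask] - reference[mask]
    return {
        "rmse_m": float(np.sqrt(np.mean(d ** 2))),
        "mae_m": float(np.mean(np.abs(d))),
        # Fraction of pixels off by more than a few centimetres,
        # the scale that matters for speed bumps and potholes.
        "outlier_rate": float(np.mean(np.abs(d) > outlier_thresh_m)),
    }
```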

Figures

Figures reproduced from arXiv: 2605.05014 by Aditya Date, Frank Neuhaus, Gasser Elazab, Malte Splietker, Maximilian Jansen, Michael Unterreiner, Olaf Hellwich, Tilman Koß.

Figure 1: CARD example from Carmiano, Italy. Right: map with …
Figure 2: Ground truth points per image. CARD has more depth …
Figure 4: (a) Image-level distribution of road-topography labels …
Figure 5: Ground truth generation: motion-compensated LiDAR bursts are voxel-accumulated, dynamic-object filtered, then projected … (a voxel-accumulation sketch follows this figure list)
Figure 6: Qualitative ablation of voxel cleaning and adaptive vot…
Figure 7: Colored point cloud comparison. We show CARD alongside densified ground truth from KITTI-DC …
Figure 8: Height ground truth and predictions for a pothole: GT …
Figure 9: Overview of the rig and wheel calibration procedure.
Figure 10: Qualitative overview of the ground-truth generation …
Figure 11: Qualitative ablation of the ground-truth aggregation pipeline. (a)–(c) show the progressive cleanup using cropping and adaptive …
Figure 12: Qualitative YOLOv8 detections for potholes (negative …
Figure 13: Projected densified ground truth. We show input …
Figure 14: Diversity of the CARD dataset. We display random image crops sampled across the dataset to illustrate variability in geometry …
Figure 15: Qualitative comparison of a positive example of a speed bump. Top: input context and GT geometry. Bottom: model predictions. Monocular baselines [22, 40, 52] (rows 1–3); FoundationStereo [53] (bottom row) …
Figure 16: Qualitative comparison of a pothole example. Top: input context and GT geometry. Bottom: model predictions. Monocular baselines [22, 40, 52] (rows 1–3); FoundationStereo [53] (bottom row) accurately recovers the geometry …
Figure 17: Qualitative comparison of a positive road irregularity. Top: input context and GT geometry. Bottom: model predictions. Monocular baselines [6, 22, 40] (rows 1–3); FoundationStereo [53] (bottom row) …
Figure 18: Qualitative comparison of a positive road irregularity. Top: input context and GT geometry. Bottom: model predictions.
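Figure 5's caption names the core accumulation step; as a reading aid, here is a minimal sketch of voxel accumulation under stated assumptions: scans are already motion-compensated into a shared world frame, the voxel size and "first point wins" rule are illustrative choices, and the dynamic-object filter and adaptive cleanup of Figures 6 and 11 are omitted.

```python
import numpy as np

def voxel_accumulate(scans, voxel_size=0.05):
    """Accumulate motion-compensated LiDAR scans (already in a common
    world frame) and keep one representative point per occupied voxel.
    A schematic reading of Figure 5, not the released pipeline."""
    pts = np.vstack(scans)                              # (N, 3) world-frame points
    keys = np.floor(pts / voxel_size).astype(np.int64)  # integer voxel coordinates
    # One point per occupied voxel: unique voxel keys, first point wins.
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    return pts[np.sort(first_idx)]
```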
Original abstract

Autonomous driving must operate across diverse surfaces to enable safe mobility. However, most driving datasets are captured on well-paved flat roads. Moreover, recent driving datasets primarily provide sparse LiDAR ground truth for images, which is insufficient for assessing fine-grained geometry in depth estimation and completion. To address these gaps, we introduce CARD, a multi-modal driving dataset that delivers quasi-dense 3D ground truth across continuous sequences rich in speed bumps, potholes, irregular surfaces and off-road segments. Our sensor suite includes synchronized global-shutter stereo cameras, front and rear LiDARs, 6-DoF poses from LiDAR-inertial odometry, per-wheel motion traces, and full calibration. Notably, our multi-LiDAR fusion yields ~500K valid depth pixels per frame, about 6.5x more than KITTI Depth Completion and 10x more on average than other public driving datasets. The dataset spans ~110 km and 4.7 hours across Germany and Italy. In addition, CARD provides 2D bounding boxes targeting road-topography irregularities, enabling accurate benchmarking for both geometry and perception tasks. Furthermore, we establish a standardized evaluation protocol for road surface irregularities on CARD and benchmark state-of-the-art depth estimation models to provide strong baselines. The CARD dataset is hosted on https://huggingface.co/CARD-Data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces the CARD dataset, a multi-modal automotive collection for dense 3D reconstruction on challenging road topographies. It provides synchronized global-shutter stereo cameras, front/rear LiDARs, 6-DoF LiDAR-inertial odometry poses, per-wheel motion traces, full calibration, and 2D bounding boxes for road irregularities. The central claim is that multi-LiDAR fusion produces ~500K valid depth pixels per frame (6.5x denser than KITTI Depth Completion), yielding quasi-dense 3D ground truth across ~110 km / 4.7 hours of sequences rich in speed bumps, potholes, irregular surfaces, and off-road segments in Germany and Italy. The paper also defines a standardized evaluation protocol for road surface irregularities and reports baselines from state-of-the-art depth estimation models.

Significance. If the fused depth data prove geometrically accurate, CARD would fill a clear gap in public driving datasets by supplying dense, calibrated ground truth on non-flat and irregular terrains where existing collections (KITTI, etc.) are sparse and limited to paved roads. The combination of multi-modal sensors, long continuous sequences, and an explicit benchmarking protocol for topography irregularities would support more realistic evaluation of depth completion, surface reconstruction, and perception models for autonomous driving. The public release on Hugging Face and provision of calibration/poses are practical strengths.

major comments (1)
  1. [Dataset construction / multi-LiDAR fusion] The headline claim of accurate quasi-dense 3D ground truth (~500K valid depth pixels per frame) rests on the fusion pipeline (using provided calibration and LiDAR-inertial odometry) being free of systematic bias or artifacts on irregular surfaces. No quantitative validation is reported—e.g., RMSE, MAE, or outlier rates against an external reference (total station, high-precision IMU, or stereo photogrammetry) on speed bumps, potholes, or off-road segments. Density alone does not establish geometric fidelity where surface normals change rapidly or small pose errors are amplified.
minor comments (1)
  1. [Abstract] The abstract and introduction could explicitly state the total number of frames or sequences to allow readers to assess scale relative to the claimed 4.7 hours of data.
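For scale, one hedged back-of-envelope: at an assumed capture rate of 10 Hz (the abstract states none), 4.7 hours would correspond to roughly 4.7 × 3600 × 10 ≈ 169,000 frames per camera.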

Simulated Authors' Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback on the CARD dataset manuscript. We address the major comment point by point below and outline planned revisions.

point-by-point responses
  1. Referee: Dataset construction / multi-LiDAR fusion section: The headline claim of accurate quasi-dense 3D ground truth (~500K valid depth pixels per frame) rests on the fusion pipeline (using provided calibration and LiDAR-inertial odometry) being free of systematic bias or artifacts on irregular surfaces. No quantitative validation is reported—e.g., RMSE, MAE, or outlier rates against an external reference (total station, high-precision IMU, or stereo photogrammetry) on speed bumps, potholes, or off-road segments. Density alone does not establish geometric fidelity where surface normals change rapidly or small pose errors are amplified.

    Authors: We agree that geometric accuracy must be demonstrated beyond density statistics, especially given the challenges of irregular road topographies. The manuscript presents the multi-LiDAR fusion results using the released calibration parameters and 6-DoF LiDAR-inertial odometry poses, which follow established practices in the field. We did not include direct external validation (e.g., total-station RMSE) because such high-precision ground-truth references were not collected during acquisition. In the revised manuscript we will add a dedicated 'Geometric Validation' subsection that reports: (i) intra-LiDAR consistency metrics (RMSE between front and rear LiDAR projections on overlapping regions), (ii) cross-modal agreement between fused depths and stereo disparity estimates on a curated subset of frames containing speed bumps and potholes, and (iii) a brief error-propagation analysis based on the reported pose uncertainties. We will also expand the limitations paragraph to discuss potential artifacts on high-curvature surfaces. These additions will be supported by new quantitative tables and qualitative visualizations. revision: partial
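The rebuttal's proposed check (i) is easy to pin down; below, a minimal sketch assuming the front- and rear-LiDAR points have each been projected into the same camera view as depth maps, with 0 marking pixels without ground truth (names and conventions hypothetical).

```python
import numpy as np

def intra_lidar_rmse(depth_front, depth_rear):
    """Consistency check (i) from the rebuttal, sketched: RMSE between
    front- and rear-LiDAR depth projections on pixels where both have
    valid ground truth. Hypothetical helper, not released code."""
    overlap = (depth_front > 0) & (depth_rear > 0)  # both sensors saw the pixel
    if not overlap.any():
        return float("nan")
    d = depth_front[overlap] - depth_rear[overlap]
    return float(np.sqrt(np.mean(d ** 2)))
```

Check (ii) would reuse the same comparison with `depth_rear` replaced by a stereo-derived depth map on the curated bump/pothole frames.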

standing simulated objections not resolved
  • Direct RMSE/MAE or outlier statistics against external high-precision references (total station, survey-grade IMU, or independent photogrammetry) on the irregular segments, as these reference measurements were not acquired during data collection.

Circularity Check

0 steps flagged

No circularity: dataset release with no derivations or fitted predictions

full rationale

The paper is a data collection and release effort describing a sensor suite, synchronization, multi-LiDAR fusion pipeline, and benchmarks on the released CARD dataset. No equations, models, or predictions are presented that reduce to fitted parameters or self-citations by construction. The ~500K depth pixels claim is an empirical count from the fusion process, not a derived result. Central claims rest on external validation potential rather than internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper contributes empirical data collection and sensor fusion rather than new theoretical derivations, so the ledger contains only standard domain assumptions about sensor accuracy.

axioms (1)
  • domain assumption: Sensor synchronization, calibration, and LiDAR-inertial odometry produce sufficiently accurate poses and alignments for dense 3D reconstruction.
    Invoked implicitly in the description of the sensor suite and multi-LiDAR fusion process.

pith-pipeline@v0.9.0 · 5572 in / 1221 out tokens · 56729 ms · 2026-05-08T18:07:40.275161+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

65 extracted references · 12 canonical work pages · 3 internal anchors

  1. [1] Sameer Agarwal, Keir Mierle, and The Ceres Solver Team. Ceres Solver. http://ceres-solver.org, 2023.
  2. [2] Mina Alibeigi, William Ljungbergh, Adam Tonderski, Georg Hess, Adam Lilja, Carl Lindström, Daria Motorniuk, Junsheng Fu, Jenny Widahl, and Christoffer Petersson. Zenseact Open Dataset: A large-scale and diverse multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 20178–20188, 2023.
  3. [3] Mohammad Hossein Amini and Shiva Nejati. Bridging the gap between real-world and synthetic images for testing autonomous driving systems. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2024.
  4. [4] Deeksha Arya, Hiroya Maeda, Sanjay Kumar Ghosh, Durga Toshniwal, and Yoshihide Sekimoto. RDD2022: A multi-national image dataset for automatic road damage detection. Geoscience Data Journal, 11(4):846–862, 2024.
  5. [5] Hamza Assemlali, Soukaina Bouhsissin, and Nawal Sael. Computer vision-based detection and classification of road obstacles: Systematic literature review. IEEE Access, 2025.
  6. [6] Alexey Bochkovskiy, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan Richter, and Vladlen Koltun. Depth Pro: Sharp monocular metric depth in less than a second. In The Thirteenth International Conference on Learning Representations, 2025.
  7. [7] Tom Bu, Xinhe Zhang, Christoph Mertz, and John M. Dolan. CARLA simulated data for rare road object detection. In 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2021.
  8. [8] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
  9. [9] Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, et al. Argoverse: 3D tracking and forecasting with rich maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
  10. [10] Creative Commons. Attribution 4.0 International (CC BY 4.0) — legal code. https://creativecommons.org/licenses/by/4.0/legalcode, 2013. Accessed Nov. 9, 2025.
  11. [11] Creative Commons. Attribution–NonCommercial 4.0 International (CC BY-NC 4.0) — legal code. https://creativecommons.org/licenses/by-nc/4.0/legalcode, 2013. Accessed Nov. 9, 2025.
  12. [12] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. CARLA: An open urban driving simulator. In Conference on Robot Learning, pages 1–16. PMLR, 2017.
  13. [13] Yuchuan Du, Jing Chen, Cong Zhao, Chenglong Liu, Feixiong Liao, and Ching-Yao Chan. Comfortable and energy-efficient speed control of autonomous vehicles on rough pavements using deep reinforcement learning. Transportation Research Part C: Emerging Technologies, 2022.
  14. [14] Gasser Elazab, Torben Gräber, Michael Unterreiner, and Olaf Hellwich. MonoPP: Metric-scaled self-supervised monocular depth estimation by planar-parallax geometry in automotive applications. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025.
  15. [15] Gasser Elazab, Maximilian Jansen, Michael Unterreiner, and Olaf Hellwich. Gamma-from-mono: Road-relative, metric, self-supervised monocular geometry for vehicular applications. arXiv preprint arXiv:2512.04303, 2025.
  16. [16] Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research, 32(11):1231–1237, 2013.
  17. [17] Jakob Geyer, Yohannes Kassahun, Mentar Mahmudi, Xavier Ricou, Rupesh Durgesh, Andrew S. Chung, Lorenz Hauswald, Viet Hoang Pham, Maximilian Mühlegg, Sebastian Dorn, et al. A2D2: Audi Autonomous Driving Dataset. arXiv preprint arXiv:2004.06320, 2020.
  18. [18] Clément Godard, Oisin Mac Aodha, Michael Firman, and Gabriel J. Brostow. Digging into self-supervised monocular depth estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.
  19. [19] Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raventos, and Adrien Gaidon. 3D packing for self-supervised monocular depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
  20. [20] Hesai Technology. XT32 product page, 2025.
  21. [21] Juqi Hu, Youmin Zhang, and Subhash Rakheja. Adaptive lane change trajectory planning scheme for autonomous vehicles under various road frictions and vehicle speeds. IEEE Transactions on Intelligent Vehicles, 8(2):1252–1265, 2022.
  22. [22] Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, and Shaojie Shen. Metric3D v2: A versatile monocular geometric foundation model for zero-shot metric depth and surface normal estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
  23. [23] HumanSignal. labelImg. GitHub repository, archived 2024-02-29. Original work by Tzutalin.
  24. [24] U3-3990SE-C-HQ (AB02987) Specification. IDS Imaging Development Systems GmbH, 2025. Rev. 1.2.
  25. [25] Glenn Jocher, Jing Qiu, and Ayush Chaurasia. Ultralytics YOLOv8, 2025. Version 8.3.105.
  26. [26] Sagi Katz, Ayellet Tal, and Ronen Basri. Direct visibility of point sets. In ACM SIGGRAPH 2007 Papers, page 24–es, New York, NY, USA, 2007. Association for Computing Machinery.
  27. [27] Kenji Koide, Masashi Yokozuka, Shuji Oishi, and Atsuhiko Banno. Voxelized GICP for fast and accurate 3D point cloud registration. In 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021.
  28. [28] Karlo Koledic, Luka Petrovic, Ivan Markovic, and Ivan Petrovic. GVDepth: Zero-shot monocular depth estimation for ground vehicles based on probabilistic cue fusion. arXiv preprint arXiv:2412.06080, 2024.
  29. [29] Yingping Liang, Yutao Hu, Wenqi Shao, and Ying Fu. Distilling monocular foundation model for fine-grained depth completion. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025.
  30. [30] Mingyu Liu, Ekim Yurtsever, Jonathan Fossaert, Xingcheng Zhou, Walter Zimmer, Yuning Cui, Bare Luka Zagar, and Alois C. Knoll. A survey on autonomous driving datasets: Statistics, annotation quality, and a future outlook. IEEE Transactions on Intelligent Vehicles, 2024.
  31. [31] Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, et al. One million scenes for autonomous driving: ONCE dataset. arXiv preprint arXiv:2106.11037, 2021.
  32. [32] Rahma Mkwata and Elizabeth Eu Mee Chong. Effect of pavement surface conditions on road traffic accident – a review. In E3S Web of Conferences, 2022.
  33. [33] Khan Muhammad, Amin Ullah, Jaime Lloret, Javier Del Ser, and Victor Hugo C. De Albuquerque. Deep learning for safe autonomous driving: Current challenges and future directions. IEEE Transactions on Intelligent Transportation Systems, 22(7):4316–4336, 2020.
  34. [34] M. A. F. Musa, Sitti Asmah Hassan, and Nordiana Mashros. The impact of roadway conditions towards accident severity on federal roads in Malaysia. PLoS ONE, 15, 2020.
  35. [35] Frank Neuhaus, Tilman Koß, Robert Kohnen, and Dietrich Paulus. MC2SLAM: Real-time inertial lidar odometry using two-scan motion compensation. In German Conference on Pattern Recognition, pages 60–72. Springer, 2018.
  36. [36] Sina Nordhoff and Joost De Winter. Why do drivers and automation disengage the automation? Results from a study among Tesla users. arXiv preprint arXiv:2309.10440, 2023.
  37. [37] Abhishek Patil, Srikanth Malla, Haiming Gang, and Yi-Ting Chen. The H3D dataset for full-surround 3D multi-object detection and tracking in crowded urban scenes. In 2019 International Conference on Robotics and Automation (ICRA), pages 9552–9557. IEEE, 2019.
  38. [38] José-Eleazar Peralta-López, Joel-Artemio Morales-Viscaya, David Lázaro-Mata, Marcos-Jesús Villaseñor-Aguilar, Juan Prado-Olivarez, Francisco-Javier Pérez-Pinal, José-Alfredo Padilla-Medina, Juan-José Martínez-Nolasco, and Alejandro-Israel Barranco-Gutiérrez. Speed bump and pothole detection using deep neural network with images captured through ZED camera. Applied Sciences, 13(14):8349, 2023.
  39. [39] Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, and Fisher Yu. UniDepth: Universal monocular metric depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10106–10116, 2024.
  40. [40] Luigi Piccinelli, Christos Sakaridis, Yung-Hsu Yang, Mattia Segu, Siyuan Li, Wim Abbeloos, and Luc Van Gool. UniDepthV2: Universal monocular metric depth estimation made simpler. arXiv preprint arXiv:2502.20110, 2025.
  41. [41] René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vision transformers for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12179–12188, 2021.
  42. [42] Ashutosh Saxena, Min Sun, and Andrew Y. Ng. Make3D: Learning 3D scene structure from a single still image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(5):824–840, 2008.
  43. [43] Shital Shah, Debadeepta Dey, Chris Lovett, and Ashish Kapoor. AirSim: High-fidelity visual and physical simulation for autonomous vehicles. In Field and Service Robotics: Results of the 11th International Conference, 2017.
  44. [44] Matthew Sivaprakasam, Parv Maheshwari, Mateo Guaman Castro, Samuel Triest, Micah Nye, Steve Willits, Andrew Saba, Wenshan Wang, and Sebastian Scherer. TartanDrive 2.0: More modalities and better infrastructure to further self-supervised learning research in off-road driving tasks. In 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024.
  45. [45] Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, et al. Scalability in perception for autonomous driving: Waymo Open Dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
  46. [46] Jie Tang, Fei-Peng Tian, Boshi An, Jian Li, and Ping Tan. Bilateral propagation network for depth completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
  47. [47] Elia Moscoso Thompson, Andrea Ranieri, Silvia Biasotti, Miguel Chicchon, Ivan Sipiran, Minh-Khoi Pham, Thang-Long Nguyen-Ho, Hai-Dang Nguyen, and Minh-Triet Tran. SHREC 2022: Pothole and crack detection in the road pavement using images and RGB-D data. Computers & Graphics, 107:161–171, 2022.
  48. [48] Jonas Uhrig, Nick Schneider, Lukas Schneider, Uwe Franke, Thomas Brox, and Andreas Geiger. Sparsity invariant CNNs. In 2017 International Conference on 3D Vision (3DV), 2017.
  49. [49] Paul Voigt and Axel Von dem Bussche. The EU General Data Protection Regulation (GDPR): A practical guide. 1st ed., Cham: Springer International Publishing, 10, 2017.
  50. [50] Peng Wang, Xinyu Huang, Xinjing Cheng, Dingfu Zhou, Qichuan Geng, and Ruigang Yang. The ApolloScape open dataset for autonomous driving and its application. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1, 2019.
  51. [51] Ruicheng Wang, Sicheng Xu, Cassie Dai, Jianfeng Xiang, Yu Deng, Xin Tong, and Jiaolong Yang. MoGe: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025.
  52. [52] Ruicheng Wang, Sicheng Xu, Yue Dong, Yu Deng, Jianfeng Xiang, Zelong Lv, Guangzhong Sun, Xin Tong, and Jiaolong Yang. MoGe-2: Accurate monocular geometry with metric scale and sharp details. arXiv preprint arXiv:2507.02546, 2025.
  53. [53] Bowen Wen, Matthew Trepte, Joseph Aribido, Jan Kautz, Orazio Gallo, and Stan Birchfield. FoundationStereo: Zero-shot stereo matching. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025.
  54. [54] Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, et al. Argoverse 2: Next generation datasets for self-driving perception and forecasting. arXiv preprint arXiv:2301.00493, 2023.
  55. [55] World Health Organization. Global status report on road safety 2023. Technical report, World Health Organization, Geneva, 2023. ISBN: 9789240087200.
  56. [56] Magnus Wrenninge and Jonas Unger. Synscapes: A photorealistic synthetic dataset for street scene parsing. arXiv preprint arXiv:1810.08705, 2018.
  57. [57] Pengchuan Xiao, Zhenlei Shao, Steven Hao, Zishuo Zhang, Xiaolin Chai, Judy Jiao, Zesong Li, Jian Wu, Kai Sun, Kun Jiang, et al. PandaSet: Advanced sensor suite dataset for autonomous driving. In 2021 IEEE International Intelligent Transportation Systems Conference (ITSC). IEEE, 2021.
  58. [58] Guorun Yang, Xiao Song, Chaoqin Huang, Zhidong Deng, Jianping Shi, and Bolei Zhou. DrivingStereo: A large-scale dataset for stereo matching in autonomous driving scenarios. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
  59. [59] Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth Anything: Unleashing the power of large-scale unlabeled data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
  60. [60] Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth Anything V2. Advances in Neural Information Processing Systems, 37, 2024.
  61. [61] Zhen Yang, Chen Zhang, Gen Li, and Hongyi Xu. Analysis of the impact of different road conditions on accident severity at highway-rail grade crossings based on explainable machine learning. Symmetry, 17(1):147, 2025.
  62. [62] Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, and Chunhua Shen. Metric3D: Towards zero-shot metric 3D prediction from a single image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
  63. [63] Tong Zhao, Chenfeng Xu, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan, and Yintao Wei. RSRD: A road surface reconstruction dataset and benchmark for safe and comfortable autonomous driving. arXiv preprint arXiv:2310.02262, 2023.
  64. [64] Qian-Yi Zhou, Jaesik Park, and Vladlen Koltun. Open3D: A modern library for 3D data processing. arXiv preprint arXiv:1801.09847, 2018.
  65. [65] Jiacheng Zuo, Haibo Hu, Zikang Zhou, Yufei Cui, Ziquan Liu, Jianping Wang, Nan Guan, Jin Wang, and Chun Jason Xue. RALAD: Bridging the real-to-sim domain gap in autonomous driving with retrieval-augmented learning. arXiv preprint arXiv:2501.12296, 2025.