All You Need for Object Detection: From Pixels, Points, and Prompts to Next-Gen Fusion and Multimodal LLMs/VLMs in Autonomous Vehicles

Abolfazl Razi; Hazim Alzorgan; Mahlagha Fazeli; Niloufar Mehrabi; Sayed Pedram Haeri Boroujeni

arxiv: 2510.26641 · v4 · submitted 2025-10-30 · 💻 cs.CV

Sayed Pedram Haeri Boroujeni, Niloufar Mehrabi, Hazim Alzorgan, Mahlagha Fazeli, Abolfazl Razi

Pith reviewed 2026-05-18 02:55 UTC · model grok-4.3

keywords: autonomous vehicles, object detection, sensor fusion, vision-language models, large language models, multimodal perception, cooperative perception, transformer models

The pith

Synthesizing sensor fusion strategies, categorized datasets, and multimodal LLM and VLM approaches delivers a roadmap for object detection in autonomous vehicles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews the fundamental spectrum of AV sensors including camera, ultrasonic, LiDAR, and radar along with their fusion strategies and limitations in dynamic driving environments. It introduces a structured categorization of datasets into ego-vehicle, infrastructure-based, and cooperative types such as V2V, V2I, V2X, and I2I to support cross-analysis of data structures. The survey then examines detection methodologies ranging from 2D and 3D pipelines to hybrid fusion and transformer-driven systems powered by Vision Transformers, SLMs, VLMs, and LLMs, with emphasis on emerging generative AI paradigms. This synthesis bridges fragmented knowledge across multimodal perception and contextual reasoning to map current capabilities, open challenges, and future opportunities. A sympathetic reader would care because reliable object detection is central to safe autonomous transportation and the review highlights practical paths forward amid rapid AI advances.

Core claim

By systematically reviewing the spectrum of AV sensors and their fusion strategies, introducing a structured categorization of ego-vehicle, infrastructure-based, and cooperative datasets, and analyzing cutting-edge detection methodologies from 2D/3D pipelines to hybrid sensor fusion with particular attention to transformer-driven approaches powered by Vision Transformers, Large and Small Language Models, and VLMs, the survey delivers a clear roadmap of current capabilities, open challenges, and future opportunities.

What carries the argument

The structured categorization of AV datasets into ego-vehicle, infrastructure-based, and cooperative types combined with analysis of sensor fusion strategies and their integration into LLM and VLM-driven perception frameworks.
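
The three-way categorization described above can be sketched as a small data structure. This is a minimal illustration, not the survey's own schema: the class names `DatasetCategory`, `CooperativeLink`, and `DatasetEntry`, the helper `group_by_category`, and the placeholder dataset names in the usage note are all assumptions introduced here. Only the category labels (ego-vehicle, infrastructure-based, and cooperative with V2V, V2I, V2X, I2I) come from the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Optional


class DatasetCategory(Enum):
    """Top-level dataset categories named in the survey."""
    EGO_VEHICLE = "ego-vehicle"
    INFRASTRUCTURE = "infrastructure-based"
    COOPERATIVE = "cooperative"


class CooperativeLink(Enum):
    """Cooperative sub-types listed in the abstract."""
    V2V = "V2V"
    V2I = "V2I"
    V2X = "V2X"
    I2I = "I2I"


@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical record for one dataset in the taxonomy."""
    name: str
    category: DatasetCategory
    link: Optional[CooperativeLink] = None

    def __post_init__(self) -> None:
        # Assumption of this sketch: a link type is required for
        # cooperative datasets and forbidden for the other categories.
        is_cooperative = self.category is DatasetCategory.COOPERATIVE
        if is_cooperative != (self.link is not None):
            raise ValueError(
                "link must be set if and only if category is cooperative"
            )


def group_by_category(
    entries: Iterable[DatasetEntry],
) -> Dict[DatasetCategory, List[str]]:
    """Group dataset names under each top-level category."""
    groups: Dict[DatasetCategory, List[str]] = {c: [] for c in DatasetCategory}
    for entry in entries:
        groups[entry.category].append(entry.name)
    return groups
```

For example, grouping `DatasetEntry("demo_ego", DatasetCategory.EGO_VEHICLE)` and `DatasetEntry("demo_v2v", DatasetCategory.COOPERATIVE, CooperativeLink.V2V)` places each placeholder name under its own category and leaves the infrastructure-based group empty. Keeping the cooperative link type as a separate field is one possible design; it lets cross-analysis filter on the top-level category without parsing sub-type strings.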

If this is right

Understanding sensor capabilities and limitations supports development of more effective fusion strategies for complex environments.
Dataset categorization enables better cross-analysis to improve training of robust detection models.
Focus on transformer-driven and VLM-powered methods points toward hybrid pipelines for next-generation perception.
Identification of open challenges in contextual reasoning guides research in cooperative intelligence and multimodal LLMs.
The overall synthesis provides direction for incorporating generative AI into reliable AV object detection systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The roadmap could be extended by adding quantitative performance comparisons across the reviewed fusion and VLM methods to aid practical selection.
Implications for real-time constraints and computational efficiency in vehicle hardware may need explicit mapping beyond the current analysis.
The cooperative dataset categories suggest potential for scaling to city-wide infrastructure networks, which could be tested in simulation.
Connections to broader robotics perception tasks indicate the framework might generalize beyond driving scenarios.

Load-bearing premise

The selected literature and categorization of datasets and methods comprehensively represent the fragmented state of multimodal perception without significant selection bias.

What would settle it

Discovery of a substantial body of recent work on AV object detection that uses an entirely different dataset categorization or centers on methods and challenges not addressed in the review would show the roadmap is incomplete.

Figures

Figures reproduced from arXiv: 2510.26641 by Abolfazl Razi, Hazim Alzorgan, Mahlagha Fazeli, Niloufar Mehrabi, Sayed Pedram Haeri Boroujeni.

**Figure 1.** Visualization of object detection across multiple sensor modalities in autonomous vehicles. The RGB image demonstrates…

**Figure 2.** The organization of this survey paper.

**Figure 3.** Overview of major sensors used in AVs based on their types and perception performance. Sensor performance is evaluated…

**Figure 4.** Overview of major AV datasets based on their specifications and applications.

**Figure 5.** A comprehensive taxonomy of object detection methods in AVs, categorized into four primary types. Each category…

**Figure 6.** Overall framework of the 2D camera-based approaches in the context of autonomous driving systems.

**Figure 7.** Overall framework of the Point-based approaches in 3D Lidar object detection.

**Figure 8.** Overall framework of the Range-based approaches in 3D Lidar object detection.

**Figure 9.** Overall framework of the Voxel-based approaches in 3D Lidar object detection.

**Figure 10.** Overall framework of the Pillar-based approaches in 3D Lidar object detection.

**Figure 11.** Overall framework of the Voxel-Point approaches in 3D Lidar object detection.

**Figure 12.** Overall framework of Early-Fusion 3D object detection in the context of autonomous driving systems.

**Figure 13.** Overall framework of Mid-Fusion 3D object detection in the context of autonomous driving systems.

**Figure 14.** Overall framework of Late-Fusion 3D object detection in the context of autonomous driving systems.

**Figure 15.** Performance comparison of the top three algorithms from each detection category (2D, 3D, and 2D–3D fusion object…

**Figure 16.** Performance comparison of the top three algorithms from each detection category (2D, 3D, and 2D–3D fusion object…

**Figure 17.** Overall framework of the VLMs in the context of autonomous driving systems.

Original abstract

Autonomous Vehicles (AVs) are transforming the future of transportation through advances in intelligent perception, decision-making, and control systems. However, their success is tied to one core capability, reliable object detection in complex and multimodal environments. While recent breakthroughs in Computer Vision (CV) and Artificial Intelligence (AI) have driven remarkable progress, the field still faces a critical challenge as knowledge remains fragmented across multimodal perception, contextual reasoning, and cooperative intelligence. This survey bridges that gap by delivering a forward-looking analysis of object detection in AVs, emphasizing emerging paradigms such as Vision-Language Models (VLMs), Large Language Models (LLMs), and Generative AI rather than re-examining outdated techniques. We begin by systematically reviewing the fundamental spectrum of AV sensors (camera, ultrasonic, LiDAR, and Radar) and their fusion strategies, highlighting not only their capabilities and limitations in dynamic driving environments but also their potential to integrate with recent advances in LLM/VLM-driven perception frameworks. Next, we introduce a structured categorization of AV datasets that moves beyond simple collections, positioning ego-vehicle, infrastructure-based, and cooperative datasets (e.g., V2V, V2I, V2X, I2I), followed by a cross-analysis of data structures and characteristics. Ultimately, we analyze cutting-edge detection methodologies, ranging from 2D and 3D pipelines to hybrid sensor fusion, with particular attention to emerging transformer-driven approaches powered by Vision Transformers (ViTs), Large and Small Language Models (SLMs), and VLMs. By synthesizing these perspectives, our survey delivers a clear roadmap of current capabilities, open challenges, and future opportunities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. This manuscript is a survey on object detection for autonomous vehicles. It reviews sensor modalities (camera, ultrasonic, LiDAR, Radar) and fusion strategies, introduces a three-way categorization of datasets (ego-vehicle, infrastructure-based, cooperative/V2X), and examines detection pipelines from 2D/3D methods through transformer, ViT, SLM, VLM, and LLM approaches. The central claim is that this synthesis yields a clear roadmap of current capabilities, open challenges, and future opportunities in multimodal AV perception.

Significance. If the literature selection proves representative, the survey could usefully organize a fragmented field by linking classical sensor fusion to emerging multimodal LLMs/VLMs and cooperative perception. The structured dataset categorization and forward emphasis on next-gen paradigms are strengths that could guide researchers, though the absence of quantitative cross-comparisons limits immediate utility for capability assessment.

major comments (1)

[Introduction] Introduction and abstract: The paper states it performs a 'systematic review' and delivers a 'clear roadmap' via synthesis of sensors, datasets, and methods. No section describes the literature search protocol (databases, keywords, date range, inclusion/exclusion criteria, or total papers screened). This is load-bearing for the central claim because the three-way dataset categorization and emphasis on VLMs/cooperative perception cannot be evaluated for selection bias or recency bias without such details.

minor comments (2)

[Dataset section] The cross-analysis of dataset characteristics would be strengthened by a summary table comparing sample counts, sensor coverage, and annotation types across the ego/infrastructure/cooperative categories.
[Fusion strategies] Notation for fusion strategies (e.g., early/late/hybrid) is used inconsistently between the sensor review and methodology sections; a single glossary or consistent abbreviations would improve readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our survey manuscript. We address the major comment on the literature search protocol with a planned revision that makes the selection process transparent.

Referee: [Introduction] Introduction and abstract: The paper states it performs a 'systematic review' and delivers a 'clear roadmap' via synthesis of sensors, datasets, and methods. No section describes the literature search protocol (databases, keywords, date range, inclusion/exclusion criteria, or total papers screened). This is load-bearing for the central claim because the three-way dataset categorization and emphasis on VLMs/cooperative perception cannot be evaluated for selection bias or recency bias without such details.

Authors: We acknowledge that the manuscript does not include an explicit description of the literature search protocol, which limits the ability to assess selection or recency bias. Our survey is structured as a narrative synthesis emphasizing recent advances in multimodal fusion and VLM/LLM approaches rather than a formal PRISMA-style systematic review. To address this directly, we will add a dedicated subsection in the revised Introduction titled 'Literature Selection and Review Methodology.' This subsection will specify the primary databases (Google Scholar, arXiv, IEEE Xplore, and proceedings from CVPR/ECCV/ICCV), search keywords (e.g., 'object detection autonomous vehicles', 'multimodal sensor fusion', 'vision language models AV', 'cooperative perception V2X'), date range (2015-2024 with emphasis on 2020 onward for emerging paradigms), inclusion criteria (peer-reviewed papers or impactful preprints on sensor fusion, datasets, or LLM/VLM detection pipelines), and exclusion criteria (purely theoretical works without AV application or non-English sources). We believe this addition will allow readers to better evaluate the representativeness of the three-way dataset categorization and the roadmap presented.

revision: yes
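
The selection criteria listed in the authors' response could be expressed as a simple screening filter. The sketch below is illustrative only: the `Paper` record, its field names, and the `passes_screening` function are hypothetical, and the boolean fields stand in for judgments (topical relevance, impact of a preprint, presence of an AV application) that a human reviewer would make. The date range and the inclusion and exclusion rules are taken from the response itself.

```python
from dataclasses import dataclass

# Date range stated in the authors' response (inclusive).
YEAR_MIN, YEAR_MAX = 2015, 2024


@dataclass(frozen=True)
class Paper:
    """Hypothetical record for one candidate paper."""
    title: str
    year: int
    language: str              # language code, e.g. "en"
    peer_reviewed: bool
    impactful_preprint: bool   # reviewer judgment, not computed here
    on_topic: bool             # sensor fusion, datasets, or LLM/VLM detection
    has_av_application: bool


def passes_screening(paper: Paper) -> bool:
    """Return True if the paper meets the stated inclusion criteria
    and triggers none of the stated exclusion criteria."""
    included = (
        YEAR_MIN <= paper.year <= YEAR_MAX
        and (paper.peer_reviewed or paper.impactful_preprint)
        and paper.on_topic
    )
    # Simplification: "purely theoretical works without AV application"
    # is approximated by the single has_av_application flag.
    excluded = (not paper.has_av_application) or paper.language != "en"
    return included and not excluded
```

Under these assumptions, a peer-reviewed, on-topic, English-language paper from 2022 with an AV application passes, while the same paper dated 2012 does not. Recording each criterion as an explicit field is what would let a reader audit the selection for the bias the referee raises.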

Circularity Check

0 steps flagged

No significant circularity in survey synthesis

This paper is a literature survey synthesizing external work on AV object detection sensors, fusion strategies, ego/infrastructure/cooperative datasets, and transformer/VLM/LLM methods. Its central claim of delivering a 'clear roadmap of current capabilities, open challenges, and future opportunities' rests on cited independent sources rather than any internal equations, fitted parameters, or self-referential definitions that reduce to the paper's own inputs by construction. No mathematical derivations, self-citation load-bearing steps, or predictions equivalent to inputs appear in the abstract or described structure. The work is self-contained against external benchmarks via its references.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The survey rests on the domain assumption that AV perception knowledge is fragmented across modalities and that a structured review of sensors, datasets, and LLM/VLM methods can bridge it. No free parameters or invented entities are introduced.

axioms (1)

Domain assumption: Knowledge in multimodal perception, contextual reasoning, and cooperative intelligence for AV object detection remains fragmented.
Explicitly stated in the abstract as the motivation and gap the survey addresses.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

We begin by systematically reviewing the fundamental spectrum of AV sensors (camera, ultrasonic, LiDAR, and Radar) and their fusion strategies... Next, we introduce a structured categorization of AV datasets... Ultimately, we analyze cutting-edge detection methodologies, ranging from 2D and 3D pipelines to hybrid sensor fusion, with particular attention to emerging transformer-driven approaches powered by Vision Transformers (ViTs), Large and Small Language Models (SLMs), and VLMs.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

By synthesizing these perspectives, our survey delivers a clear roadmap of current capabilities, open challenges, and future opportunities.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

300 extracted references · 300 canonical work pages · 1 internal anchor

[1]

B. B. ELallid, N. Benamar, M. Bagaa, N. Mrani, Secure and efficient vehicle control of autonomous vehicles using federated deep reinforcement learning, Applied Soft Computing (2025) 113924

work page 2025
[2]

Khosravian, M

A. Khosravian, M. Masih-Tehrani, A. Amirkhani, S. Ebrahimi-Nejad, Robust autonomous vehicle control by leveraging multi-stage mpc and quantized cnn in hil framework, Applied Soft Computing 162 (2024) 111802

work page 2024
[3]

Jiang, K

H. Jiang, K. Xia, Y. Zhao, Z. Yao, Y. Jiang, Z. He, Environmental impacts and emission reduction methods of vehicles equipped with driving automation systems: An operational-level review, Transportation Research Part C: Emerging Technologies 173 (2025) 104996

work page 2025
[4]

Grosse, A

K. Grosse, A. Alahi, A qualitative ai security risk assessment of autonomous vehicles, Transportation Research Part C: Emerging Technologies 169 (2024) 104797

work page 2024
[5]

K. Wang, C. Shen, X. Li, J. Lu, Uncertainty quantification for safe and reliable autonomous vehicles: A review of methods and applications, IEEE Transactions on Intelligent Transportation Systems (2025)

work page 2025
[6]

X. Chen, X. Wang, W. Zhao, C. Wang, S. Cheng, Z. Luan, Hierarchical deep reinforcement learning based multi- agent game control for energy consumption and traffic efficiency improving of autonomous vehicles, Energy 323 (2025) 135669

work page 2025
[7]

L. Zha, C. Gong, K. Lv, Real-time localization and navigation method for autonomous vehicles based on multi- modal data fusion by integrating memory transformer and ddqn, Image and Vision Computing 156 (2025) 105484. 55

work page 2025
[8]

W. Sun, H. Shao, J. Li, T. Wu, E. Z. Fainman, Multi-type traffic sensor location problem for origin–destination estimation considering spatiotemporal correlation and sensor failure, Transportation Research Part C: Emerging Technologies 179 (2025) 105288

work page 2025
[9]

X. Chen, S. P. H. Boroujeni, X. Shu, H. Li, A. Razi, Enhancing graph neural networks in large-scale traffic incident analysis with concurrency hypothesis, in: Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, 2024, pp. 196–207

work page 2024
[10]

Chang, W

Y. Chang, W. Xiao, B. Coifman, Using spatiotemporal stacks for precise vehicle tracking from roadside 3d lidar data, Transportation research part C: emerging technologies 154 (2023) 104280

work page 2023
[11]

A. D. Beza, Z. Xie, M. Ramezani, D. Levinson, From lane-less to lane-free: Implications in the era of automated vehicles, Transportation Research Part C: Emerging Technologies 170 (2025) 104898

work page 2025
[12]

K. Wang, J. Guo, K. Chen, J. Lu, An in-depth examination of slam methods: Challenges, advancements, and applications in complex scenes for autonomous driving, IEEE Transactions on Intelligent Transportation Systems (2025)

work page 2025
[13]

Y. Zha, W. Shangguan, J. Chen, L. Chai, W. Qiu, A. M. L´ opez, Heterogeneous multiscale cooperative perception for connected autonomous vehicles via v2x interaction, IEEE Internet of Things Journal (2025)

work page 2025
[14]

Salari, L

M. Salari, L. Kattan, M. Gentili, Optimal roadside units location for path flow reconstruction in a connected vehicle environment, Transportation Research Part C: Emerging Technologies 138 (2022) 103625

work page 2022
[15]

Praveen, S

R. Praveen, S. Hundekari, P. Parida, T. Mittal, A. Sehgal, M. Bhavana, Autonomous vehicle navigation systems: Machine learning for real-time traffic prediction, in: 2025 International Conference on Computational, Communi- cation and Information Technology (ICCCIT), IEEE, 2025, pp. 809–813

work page 2025
[16]

Mohammadi, R

A. Mohammadi, R. Ahmari, V. Hemmati, F. Owusu-Ambrose, M. N. Mahmoud, P. Kebria, A. Homaifar, Detection of multiple small biased gps spoofing attacks on autonomous vehicles using time series analysis, IEEE Open Journal of Vehicular Technology (2025)

work page 2025
[17]

S. D. RS, S. D. Varshni, Embedded large language models for enhanced human-machine interface in autonomous vehicles, in: 2025 International Conference on Multi-Agent Systems for Collaborative Intelligence (ICMSCI), IEEE, 2025, pp. 1143–1150

work page 2025
[18]

Kumar, P

H. Kumar, P. Mamoria, D. K. Dewangan, Improving faster r-cnn for vehicle detection under varying conditions with domain adaptation technique, in: 2025 Fourth International Conference on Power, Control and Computing Technologies (ICPC2T), IEEE, 2025, pp. 1–6

work page 2025
[19]

Shrivastava, V

A. Shrivastava, V. Kansal, A. Nagpal, K. K. Dixit, K. V. Rajkumar, et al., Ai-powered object detection for autonomous vehicles: A comparative study of machine learning models, in: 2025 International Conference on Computational, Communication and Information Technology (ICCCIT), IEEE, 2025, pp. 612–617

work page 2025
[20]

Subhedar, M

J. Subhedar, M. R. Bachute, Insights of semantic segmentation using the deeplab architecture for autonomous driving, MethodsX (2025) 103387

work page 2025
[21]

S. Chen, X. Li, K. Wang, J. Sun, B. Yang, Ranging research on telematics based on mask r-cnn dual eye stereo vision ranging algorithm, in: The International Conference Optoelectronic Information and Optical Engineering (OIOE2024), Vol. 13513, SPIE, 2025, pp. 884–889

work page 2025
[22]

S. P. H. Boroujeni, N. Mehrabi, F. Afghah, C. P. McGrath, D. Bhatkar, M. A. Biradar, A. Razi, Toward ai- driven fire imagery: Attributes, challenges, comparisons, and the promise of vlms and llms, Machine Learning with Applications (2025) 100763

work page 2025
[23]

Y. Tian, F. Lin, Y. Li, T. Zhang, Q. Zhang, X. Fu, J. Huang, X. Dai, Y. Wang, C. Tian, et al., Uavs meet llms: Overviews and perspectives towards agentic low-altitude mobility, Information Fusion 122 (2025) 103158

work page 2025
[24]

Z. Guo, Z. Yagudin, A. Lykov, M. Konenkov, D. Tsetserukou, Vlm-auto: Vlm-based autonomous driving assistant with human-like behavior and understanding for complex road scenes, in: 2024 2nd International Conference on Foundation and Large Language Models (FLLM), IEEE, 2024, pp. 501–507

work page 2024
[25]

Y. Wang, S. Wang, Y. Li, M. Liu, Developments in 3d object detection for autonomous driving: A review, IEEE Sensors Journal (2025)

work page 2025
[26]

H. Wang, J. Liu, H. Dong, Z. Shao, A survey of the multi-sensor fusion object detection task in autonomous driving, Sensors 25 (9) (2025) 2794

work page 2025
[27]

H. Wang, X. Chen, Q. Yuan, P. Liu, A review of 3d object detection based on autonomous driving, The Visual Computer 41 (3) (2025) 1757–1775

work page 2025
[28]

Z. Song, L. Liu, F. Jia, Y. Luo, C. Jia, G. Zhang, L. Yang, L. Wang, Robustness-aware 3d object detection in autonomous driving: A review and outlook, IEEE Transactions on Intelligent Transportation Systems (2024)

work page 2024
[29]

S. Y. Alaba, A. C. Gurbuz, J. E. Ball, Emerging trends in autonomous vehicle perception: Multimodal fusion for 3d object detection, World Electric Vehicle Journal 15 (1) (2024) 20

work page 2024
[30]

Z. Zou, K. Chen, Z. Shi, Y. Guo, J. Ye, Object detection in 20 years: A survey, Proceedings of the IEEE 111 (3) (2023) 257–276

work page 2023
[31]

X. Ma, W. Ouyang, A. Simonelli, E. Ricci, 3d object detection from images for autonomous driving: a survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 46 (5) (2023) 3537–3556

work page 2023
[32]

R. Qian, X. Lai, X. Li, 3d object detection for autonomous driving: A survey, Pattern Recognition 130 (2022) 108796

work page 2022
[33]

Y. Cui, R. Chen, W. Chu, L. Chen, D. Tian, Y. Li, D. Cao, Deep learning for image and point cloud fusion in autonomous driving: A review, IEEE Transactions on Intelligent Transportation Systems 23 (2) (2021) 722–739

work page 2021
[34]

D. Feng, C. Haase-Sch¨ utz, L. Rosenbaum, H. Hertlein, C. Glaeser, F. Timm, W. Wiesbeck, K. Dietmayer, Deep 56 multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and chal- lenges, IEEE Transactions on Intelligent Transportation Systems 22 (3) (2020) 1341–1360

work page 2020
[35]

J. Guo, U. Kurup, M. Shah, Is it safe to drive? an overview of factors, metrics, and datasets for driveability assessment in autonomous driving, IEEE Transactions on Intelligent Transportation Systems 21 (8) (2019) 3135– 3151

work page 2019
[36]

L. Hu, J. Zhang, J. Zhang, S. Cheng, Y. Wang, W. Zhang, N. Yu, Security analysis and adaptive false data injection against multi-sensor fusion localization for autonomous driving, Information Fusion 117 (2025) 102822

work page 2025
[37]

S. P. H. Boroujeni, A. Razi, S. Khoshdel, F. Afghah, J. L. Coen, L. O’Neill, P. Fule, A. Watts, N.-M. T. Kokolakis, K. G. Vamvoudakis, A comprehensive survey of research towards ai-enabled unmanned aerial systems in pre-, active-, and post-wildfire management, Information Fusion 108 (2024) 102369

work page 2024
[38]

H. Du, L. Ren, Y. Wang, X. Cao, C. Sun, Advancements in perception system with multi-sensor fusion for embodied agents, Information Fusion 117 (2025) 102859

work page 2025
[39]

Wu, Fusion-based modeling of an intelligent algorithm for enhanced object detection using a deep learning approach on radar and camera data, Information Fusion 113 (2025) 102647

Y. Wu, Fusion-based modeling of an intelligent algorithm for enhanced object detection using a deep learning approach on radar and camera data, Information Fusion 113 (2025) 102647

work page 2025
[40]

Y. Wu, J. Liu, M. Gong, Q. Miao, W. Ma, C. Xu, Joint semantic segmentation using representations of lidar point clouds and camera images, Information Fusion 108 (2024) 102370

work page 2024
[41]

S. Li, X. Li, H. Wang, Y. Zhou, Z. Shen, Multi-gnss ppp/ins/vision/lidar tightly integrated system for precise navigation in urban environments, Information Fusion 90 (2023) 218–232

work page 2023
[42]

Mehrabi, S

N. Mehrabi, S. P. H. Boroujeni, Age estimation based on facial images using hybrid features and particle swarm optimization, in: 2021 11th International Conference on Computer Engineering and Knowledge (ICCKE), IEEE, 2021, pp. 412–418

work page 2021
[43]

Sarlak, H

A. Sarlak, H. Alzorgan, S. P. H. Boroujeni, A. Razi, R. Amin, Enhanced cooperative perception for autonomous vehicles using imperfect communication, in: 2024 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), IEEE, 2024, pp. 700–707

work page 2024
[44]

D. Kent, M. Alyaqoub, X. Lu, H. Khatounabadi, K. Sung, C. Scheller, A. Dalat, A. bin Thabit, R. Whitley, H. Radha, Msu-4s-the michigan state university four seasons dataset, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 22658–22667

work page 2024
[45]

Zheng, L

L. Zheng, L. Yang, Q. Lin, W. Ai, M. Liu, S. Lu, J. Liu, H. Ren, J. Mo, X. Bai, et al., Omnihd-scenes: A next-generation multimodal dataset for autonomous driving, arXiv preprint arXiv:2412.10734 (2024)

work page arXiv 2024
[46]

Alibeigi, W

M. Alibeigi, W. Ljungbergh, A. Tonderski, G. Hess, A. Lilja, C. Lindstr¨ om, D. Motorniuk, J. Fu, J. Widahl, C. Petersson, Zenseact open dataset: A large-scale and diverse multimodal dataset for autonomous driving, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 20178–20188

work page 2023
[47]

C. A. Diaz-Ruiz, Y. Xia, Y. You, J. Nino, J. Chen, J. Monica, X. Chen, K. Luo, Y. Wang, M. Emond, et al., Ithaca365: Dataset and driving perception under repeated and challenging weather conditions, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 21383–21392

work page 2022
[48]

J. Mao, M. Niu, C. Jiang, H. Liang, J. Chen, X. Liang, Y. Li, C. Ye, W. Zhang, Z. Li, et al., One million scenes for autonomous driving: Once dataset, arXiv preprint arXiv:2106.11037 (2021)

work page arXiv 2021
[49]

D´ eziel, P

J.-L. D´ eziel, P. Merriaux, F. Tremblay, D. Lessard, D. Plourde, J. Stanguennec, P. Goulet, P. Olivier, Pixset: An opportunity for 3d computer vision to go beyond point clouds with a full-waveform lidar dataset, in: 2021 ieee international intelligent transportation systems conference (itsc), IEEE, 2021, pp. 2987–2993

work page 2021
[50]

P. Xiao, Z. Shao, S. Hao, Z. Zhang, X. Chai, J. Jiao, Z. Li, J. Wu, K. Sun, K. Jiang, et al., Pandaset: Ad- vanced sensor suite dataset for autonomous driving, in: 2021 IEEE international intelligent transportation systems conference (ITSC), IEEE, 2021, pp. 3095–3101

work page 2021
[51]

Geyer, Y

J. Geyer, Y. Kassahun, M. Mahmudi, X. Ricou, R. Durgesh, A. S. Chung, L. Hauswald, V. H. Pham, M. M¨ uhlegg, S. Dorn, et al., A2d2: Audi autonomous driving dataset, arXiv preprint arXiv:2004.06320 (2020)

work page arXiv 2004
[52]

URLhttps://public.roboflow.com/object-detection/self-driving-car

Roboflow, Self-driving car dataset, accessed: 2025-02-28 (2025). URLhttps://public.roboflow.com/object-detection/self-driving-car

work page 2025
[53]

P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine, et al., Scalability in perception for autonomous driving: Waymo open dataset, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 2446–2454

work page 2020
[54]

Q.-H. Pham, P. Sevestre, R. S. Pahwa, H. Zhan, C. H. Pang, Y. Chen, A. Mustafa, V. Chandrasekhar, J. Lin, A* 3d dataset: Towards autonomous driving in challenging environments, in: 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2020, pp. 2267–2273

work page 2020
[55]

J. Bock, R. Krajewski, T. Moers, S. Runde, L. Vater, L. Eckstein, The ind dataset: A drone dataset of naturalistic road user trajectories at german intersections, in: 2020 IEEE Intelligent Vehicles Symposium (IV), 2020, pp. 1929–1934.doi:10.1109/IV47402.2020.9304839

work page doi:10.1109/iv47402.2020.9304839 2020
[56]

Moers, L

T. Moers, L. Vater, R. Krajewski, J. Bock, A. Zlocki, L. Eckstein, The exid dataset: A real-world trajectory dataset of highly interactive highway scenarios in germany, in: 2022 IEEE Intelligent Vehicles Symposium (IV), 2022, pp. 958–964.doi:10.1109/IV51971.2022.9827305

work page doi:10.1109/iv51971.2022.9827305 2022
[57]

H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, O. Beijbom, nuscenes: A multimodal dataset for autonomous driving, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11621–11631

[58]

M.-F. Chang, J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang, P. Carr, S. Lucey, D. Ramanan, et al., Argoverse: 3d tracking and forecasting with rich maps, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 8748–8757

[59]

J. Xue, J. Fang, T. Li, B. Zhang, P. Zhang, Z. Ye, J. Dou, Blvd: Building a large-scale 5d semantics benchmark for autonomous driving, in: 2019 International Conference on Robotics and Automation (ICRA), IEEE, 2019, pp. 6685–6691

[60]

H. Schafer, E. Santana, A. Haden, R. Biasini, A commute in data: The comma2k19 dataset, arXiv preprint arXiv:1812.05752 (2018)

[61]

F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan, T. Darrell, et al., Bdd100k: A diverse driving video database with scalable annotation tooling, arXiv preprint arXiv:1805.04687 2 (5) (2018) 6

[62]

X. Huang, X. Cheng, Q. Geng, B. Cao, D. Zhou, P. Wang, Y. Lin, R. Yang, The apolloscape dataset for autonomous driving, in: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2018, pp. 954–960

[63]

M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, B. Schiele, The cityscapes dataset for semantic urban scene understanding, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 3213–3223

[64]

D. Barnes, M. Gadd, P. Murcutt, P. Newman, I. Posner, The oxford radar robotcar dataset: A radar extension to the oxford robotcar dataset, in: 2020 IEEE international conference on robotics and automation (ICRA), IEEE, 2020, pp. 6433–6438

[65]

A. Geiger, P. Lenz, R. Urtasun, Are we ready for autonomous driving? the kitti vision benchmark suite, in: 2012 IEEE conference on computer vision and pattern recognition, IEEE, 2012, pp. 3354–3361

[66]

X. Zhu, H. Sheng, S. Cai, B. Deng, S. Yang, Q. Liang, K. Chen, L. Gao, J. Song, J. Ye, Roscenes: A large-scale multi-view 3d dataset for roadside perception, in: European Conference on Computer Vision, Springer, 2024, pp. 331–347

[67]

W. Zimmer, C. Creß, H. T. Nguyen, A. C. Knoll, Tumtraf intersection dataset: All you need for urban 3d camera-lidar roadside perception, in: 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), IEEE, 2023, pp. 1030–1037

[68]

C. Creß, W. Zimmer, L. Strand, M. Fortkord, S. Dai, V. Lakshminarasimhan, A. Knoll, A9-dataset: Multi-sensor infrastructure-based dataset for mobility research, in: 2022 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2022, pp. 965–970

[69]

H. Wang, X. Zhang, Z. Li, J. Li, K. Wang, Z. Lei, R. Haibing, Ips300+: a challenging multi-modal data sets for intersection perception system, in: 2022 International Conference on Robotics and Automation (ICRA), IEEE, 2022, pp. 2539–2545

[70]

X. Ye, M. Shu, H. Li, Y. Shi, Y. Li, G. Wang, X. Tan, E. Ding, Rope3d: The roadside perception dataset for autonomous driving and monocular 3d object detection task, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 21341–21350

[71]

S. Busch, C. Koetsier, J. Axmann, C. Brenner, Lumpi: The leibniz university multi-perspective intersection dataset, in: 2022 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2022, pp. 1127–1134

[72]

M. Howe, I. Reid, J. Mackenzie, Weakly supervised training of monocular 3d object detectors using wide baseline multi-view traffic camera data, arXiv preprint arXiv:2110.10966 (2021)

[73]

W. Zhan, L. Sun, D. Wang, H. Shi, A. Clausse, M. Naumann, J. Kummerle, H. Konigshof, C. Stiller, A. de La Fortelle, et al., Interaction dataset: An international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps, arXiv preprint arXiv:1910.03088 (2019)

[74]

Z. Tang, M. Naphade, M.-Y. Liu, X. Yang, S. Birchfield, S. Wang, R. Kumar, D. Anastasiu, J.-N. Hwang, Cityflow: A city-scale benchmark for multi-target multi-camera vehicle tracking and re-identification, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 8797–8806

[75]

A. Ishaq, J. Lahoud, K. More, O. Thawakar, R. Thawkar, D. Dissanayake, N. Ahsan, Y. Li, F. S. Khan, H. Cholakkal, et al., Drivelmm-o1: A step-by-step reasoning dataset and large multimodal model for driving scenario understanding, arXiv preprint arXiv:2503.10621 (2025)

[76]

K. Chen, Y. Li, W. Zhang, Y. Liu, P. Li, R. Gao, L. Hong, M. Tian, X. Zhao, Z. Li, et al., Automated evaluation of large vision-language models on self-driving corner cases, in: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), IEEE, 2025, pp. 7817–7826

[77]

H.-k. Chiu, R. Hachiuma, C.-Y. Wang, S. F. Smith, Y.-C. F. Wang, M.-H. Chen, V2v-llm: Vehicle-to-vehicle cooperative autonomous driving with multi-modal large language models, arXiv preprint arXiv:2502.09980 (2025)

[78]

C. Sima, K. Renz, K. Chitta, L. Chen, H. Zhang, C. Xie, J. Beißwenger, P. Luo, A. Geiger, H. Li, Drivelm: Driving with graph visual question answering, in: European Conference on Computer Vision, Springer, 2024, pp. 256–274

[79]

Y. Inoue, Y. Yada, K. Tanahashi, Y. Yamaguchi, Nuscenes-mqa: Integrated evaluation of captions and qa for autonomous driving datasets using markup annotations, in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 930–938

[80]

S. Wang, Z. Yu, X. Jiang, S. Lan, M. Shi, N. Chang, J. Kautz, Y. Li, J. M. Alvarez, Omnidrive: A holistic vision-language dataset for autonomous driving with counterfactual reasoning, arXiv preprint arXiv:2504.04348 (2025)


[1]

B. B. ELallid, N. Benamar, M. Bagaa, N. Mrani, Secure and efficient vehicle control of autonomous vehicles using federated deep reinforcement learning, Applied Soft Computing (2025) 113924

[2]

A. Khosravian, M. Masih-Tehrani, A. Amirkhani, S. Ebrahimi-Nejad, Robust autonomous vehicle control by leveraging multi-stage mpc and quantized cnn in hil framework, Applied Soft Computing 162 (2024) 111802

[3]

H. Jiang, K. Xia, Y. Zhao, Z. Yao, Y. Jiang, Z. He, Environmental impacts and emission reduction methods of vehicles equipped with driving automation systems: An operational-level review, Transportation Research Part C: Emerging Technologies 173 (2025) 104996

[4]

K. Grosse, A. Alahi, A qualitative ai security risk assessment of autonomous vehicles, Transportation Research Part C: Emerging Technologies 169 (2024) 104797

[5]

K. Wang, C. Shen, X. Li, J. Lu, Uncertainty quantification for safe and reliable autonomous vehicles: A review of methods and applications, IEEE Transactions on Intelligent Transportation Systems (2025)

[6]

X. Chen, X. Wang, W. Zhao, C. Wang, S. Cheng, Z. Luan, Hierarchical deep reinforcement learning based multi-agent game control for energy consumption and traffic efficiency improving of autonomous vehicles, Energy 323 (2025) 135669

[7]

L. Zha, C. Gong, K. Lv, Real-time localization and navigation method for autonomous vehicles based on multi-modal data fusion by integrating memory transformer and ddqn, Image and Vision Computing 156 (2025) 105484

[8]

W. Sun, H. Shao, J. Li, T. Wu, E. Z. Fainman, Multi-type traffic sensor location problem for origin–destination estimation considering spatiotemporal correlation and sensor failure, Transportation Research Part C: Emerging Technologies 179 (2025) 105288

[9]

X. Chen, S. P. H. Boroujeni, X. Shu, H. Li, A. Razi, Enhancing graph neural networks in large-scale traffic incident analysis with concurrency hypothesis, in: Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, 2024, pp. 196–207

[10]

Y. Chang, W. Xiao, B. Coifman, Using spatiotemporal stacks for precise vehicle tracking from roadside 3d lidar data, Transportation Research Part C: Emerging Technologies 154 (2023) 104280

[11]

A. D. Beza, Z. Xie, M. Ramezani, D. Levinson, From lane-less to lane-free: Implications in the era of automated vehicles, Transportation Research Part C: Emerging Technologies 170 (2025) 104898

[12]

K. Wang, J. Guo, K. Chen, J. Lu, An in-depth examination of slam methods: Challenges, advancements, and applications in complex scenes for autonomous driving, IEEE Transactions on Intelligent Transportation Systems (2025)

[13]

Y. Zha, W. Shangguan, J. Chen, L. Chai, W. Qiu, A. M. López, Heterogeneous multiscale cooperative perception for connected autonomous vehicles via v2x interaction, IEEE Internet of Things Journal (2025)

[14]

M. Salari, L. Kattan, M. Gentili, Optimal roadside units location for path flow reconstruction in a connected vehicle environment, Transportation Research Part C: Emerging Technologies 138 (2022) 103625

[15]

R. Praveen, S. Hundekari, P. Parida, T. Mittal, A. Sehgal, M. Bhavana, Autonomous vehicle navigation systems: Machine learning for real-time traffic prediction, in: 2025 International Conference on Computational, Communication and Information Technology (ICCCIT), IEEE, 2025, pp. 809–813

[16]

A. Mohammadi, R. Ahmari, V. Hemmati, F. Owusu-Ambrose, M. N. Mahmoud, P. Kebria, A. Homaifar, Detection of multiple small biased gps spoofing attacks on autonomous vehicles using time series analysis, IEEE Open Journal of Vehicular Technology (2025)

[17]

S. D. RS, S. D. Varshni, Embedded large language models for enhanced human-machine interface in autonomous vehicles, in: 2025 International Conference on Multi-Agent Systems for Collaborative Intelligence (ICMSCI), IEEE, 2025, pp. 1143–1150

[18]

H. Kumar, P. Mamoria, D. K. Dewangan, Improving faster r-cnn for vehicle detection under varying conditions with domain adaptation technique, in: 2025 Fourth International Conference on Power, Control and Computing Technologies (ICPC2T), IEEE, 2025, pp. 1–6

[19]

A. Shrivastava, V. Kansal, A. Nagpal, K. K. Dixit, K. V. Rajkumar, et al., Ai-powered object detection for autonomous vehicles: A comparative study of machine learning models, in: 2025 International Conference on Computational, Communication and Information Technology (ICCCIT), IEEE, 2025, pp. 612–617

[20]

J. Subhedar, M. R. Bachute, Insights of semantic segmentation using the deeplab architecture for autonomous driving, MethodsX (2025) 103387

[21]

S. Chen, X. Li, K. Wang, J. Sun, B. Yang, Ranging research on telematics based on mask r-cnn dual eye stereo vision ranging algorithm, in: The International Conference Optoelectronic Information and Optical Engineering (OIOE2024), Vol. 13513, SPIE, 2025, pp. 884–889

[22]

S. P. H. Boroujeni, N. Mehrabi, F. Afghah, C. P. McGrath, D. Bhatkar, M. A. Biradar, A. Razi, Toward ai-driven fire imagery: Attributes, challenges, comparisons, and the promise of vlms and llms, Machine Learning with Applications (2025) 100763

[23]

Y. Tian, F. Lin, Y. Li, T. Zhang, Q. Zhang, X. Fu, J. Huang, X. Dai, Y. Wang, C. Tian, et al., Uavs meet llms: Overviews and perspectives towards agentic low-altitude mobility, Information Fusion 122 (2025) 103158

[24]

Z. Guo, Z. Yagudin, A. Lykov, M. Konenkov, D. Tsetserukou, Vlm-auto: Vlm-based autonomous driving assistant with human-like behavior and understanding for complex road scenes, in: 2024 2nd International Conference on Foundation and Large Language Models (FLLM), IEEE, 2024, pp. 501–507

[25]

Y. Wang, S. Wang, Y. Li, M. Liu, Developments in 3d object detection for autonomous driving: A review, IEEE Sensors Journal (2025)

[26]

H. Wang, J. Liu, H. Dong, Z. Shao, A survey of the multi-sensor fusion object detection task in autonomous driving, Sensors 25 (9) (2025) 2794

[27]

H. Wang, X. Chen, Q. Yuan, P. Liu, A review of 3d object detection based on autonomous driving, The Visual Computer 41 (3) (2025) 1757–1775

[28]

Z. Song, L. Liu, F. Jia, Y. Luo, C. Jia, G. Zhang, L. Yang, L. Wang, Robustness-aware 3d object detection in autonomous driving: A review and outlook, IEEE Transactions on Intelligent Transportation Systems (2024)

[29]

S. Y. Alaba, A. C. Gurbuz, J. E. Ball, Emerging trends in autonomous vehicle perception: Multimodal fusion for 3d object detection, World Electric Vehicle Journal 15 (1) (2024) 20

[30]

Z. Zou, K. Chen, Z. Shi, Y. Guo, J. Ye, Object detection in 20 years: A survey, Proceedings of the IEEE 111 (3) (2023) 257–276

[31]

X. Ma, W. Ouyang, A. Simonelli, E. Ricci, 3d object detection from images for autonomous driving: a survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 46 (5) (2023) 3537–3556

[32]

R. Qian, X. Lai, X. Li, 3d object detection for autonomous driving: A survey, Pattern Recognition 130 (2022) 108796

[33]

Y. Cui, R. Chen, W. Chu, L. Chen, D. Tian, Y. Li, D. Cao, Deep learning for image and point cloud fusion in autonomous driving: A review, IEEE Transactions on Intelligent Transportation Systems 23 (2) (2021) 722–739

[34]

D. Feng, C. Haase-Schütz, L. Rosenbaum, H. Hertlein, C. Glaeser, F. Timm, W. Wiesbeck, K. Dietmayer, Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges, IEEE Transactions on Intelligent Transportation Systems 22 (3) (2020) 1341–1360

[35]

J. Guo, U. Kurup, M. Shah, Is it safe to drive? an overview of factors, metrics, and datasets for driveability assessment in autonomous driving, IEEE Transactions on Intelligent Transportation Systems 21 (8) (2019) 3135–3151

[36]

L. Hu, J. Zhang, J. Zhang, S. Cheng, Y. Wang, W. Zhang, N. Yu, Security analysis and adaptive false data injection against multi-sensor fusion localization for autonomous driving, Information Fusion 117 (2025) 102822

[37]

S. P. H. Boroujeni, A. Razi, S. Khoshdel, F. Afghah, J. L. Coen, L. O’Neill, P. Fule, A. Watts, N.-M. T. Kokolakis, K. G. Vamvoudakis, A comprehensive survey of research towards ai-enabled unmanned aerial systems in pre-, active-, and post-wildfire management, Information Fusion 108 (2024) 102369

[38]

H. Du, L. Ren, Y. Wang, X. Cao, C. Sun, Advancements in perception system with multi-sensor fusion for embodied agents, Information Fusion 117 (2025) 102859

[39]

Y. Wu, Fusion-based modeling of an intelligent algorithm for enhanced object detection using a deep learning approach on radar and camera data, Information Fusion 113 (2025) 102647

[40]

Y. Wu, J. Liu, M. Gong, Q. Miao, W. Ma, C. Xu, Joint semantic segmentation using representations of lidar point clouds and camera images, Information Fusion 108 (2024) 102370

[41]

S. Li, X. Li, H. Wang, Y. Zhou, Z. Shen, Multi-gnss ppp/ins/vision/lidar tightly integrated system for precise navigation in urban environments, Information Fusion 90 (2023) 218–232

[42]

N. Mehrabi, S. P. H. Boroujeni, Age estimation based on facial images using hybrid features and particle swarm optimization, in: 2021 11th International Conference on Computer Engineering and Knowledge (ICCKE), IEEE, 2021, pp. 412–418

[43]

A. Sarlak, H. Alzorgan, S. P. H. Boroujeni, A. Razi, R. Amin, Enhanced cooperative perception for autonomous vehicles using imperfect communication, in: 2024 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), IEEE, 2024, pp. 700–707

[44]

D. Kent, M. Alyaqoub, X. Lu, H. Khatounabadi, K. Sung, C. Scheller, A. Dalat, A. bin Thabit, R. Whitley, H. Radha, Msu-4s-the michigan state university four seasons dataset, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 22658–22667

[45]

L. Zheng, L. Yang, Q. Lin, W. Ai, M. Liu, S. Lu, J. Liu, H. Ren, J. Mo, X. Bai, et al., Omnihd-scenes: A next-generation multimodal dataset for autonomous driving, arXiv preprint arXiv:2412.10734 (2024)

[46]

M. Alibeigi, W. Ljungbergh, A. Tonderski, G. Hess, A. Lilja, C. Lindström, D. Motorniuk, J. Fu, J. Widahl, C. Petersson, Zenseact open dataset: A large-scale and diverse multimodal dataset for autonomous driving, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 20178–20188

[47]

C. A. Diaz-Ruiz, Y. Xia, Y. You, J. Nino, J. Chen, J. Monica, X. Chen, K. Luo, Y. Wang, M. Emond, et al., Ithaca365: Dataset and driving perception under repeated and challenging weather conditions, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 21383–21392

[48]

J. Mao, M. Niu, C. Jiang, H. Liang, J. Chen, X. Liang, Y. Li, C. Ye, W. Zhang, Z. Li, et al., One million scenes for autonomous driving: Once dataset, arXiv preprint arXiv:2106.11037 (2021)

[49]

J.-L. Déziel, P. Merriaux, F. Tremblay, D. Lessard, D. Plourde, J. Stanguennec, P. Goulet, P. Olivier, Pixset: An opportunity for 3d computer vision to go beyond point clouds with a full-waveform lidar dataset, in: 2021 IEEE international intelligent transportation systems conference (ITSC), IEEE, 2021, pp. 2987–2993

[50]

P. Xiao, Z. Shao, S. Hao, Z. Zhang, X. Chai, J. Jiao, Z. Li, J. Wu, K. Sun, K. Jiang, et al., Pandaset: Advanced sensor suite dataset for autonomous driving, in: 2021 IEEE international intelligent transportation systems conference (ITSC), IEEE, 2021, pp. 3095–3101
