All You Need for Object Detection: From Pixels, Points, and Prompts to Next-Gen Fusion and Multimodal LLMs/VLMs in Autonomous Vehicles
Pith reviewed 2026-05-18 02:55 UTC · model grok-4.3
The pith
Synthesizing sensor fusion strategies, categorized datasets, and multimodal LLM and VLM approaches delivers a roadmap for object detection in autonomous vehicles.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The survey systematically reviews the spectrum of AV sensors and their fusion strategies, introduces a structured categorization of ego-vehicle, infrastructure-based, and cooperative datasets, and analyzes detection methodologies ranging from 2D/3D pipelines to hybrid sensor fusion, with particular attention to transformer-driven approaches built on Vision Transformers, Large and Small Language Models, and VLMs. From this synthesis, it delivers a clear roadmap of current capabilities, open challenges, and future opportunities.
What carries the argument
The structured categorization of AV datasets into ego-vehicle, infrastructure-based, and cooperative types, combined with an analysis of sensor fusion strategies and their integration into LLM- and VLM-driven perception frameworks.
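The three-way taxonomy can be made concrete as a small data structure. The sketch below is illustrative only: the survey does not publish code, the class and field names (`DatasetCategory`, `DatasetEntry`, `link`) are hypothetical, and the only cooperative sub-types encoded are the four named in the abstract (V2V, V2I, V2X, I2I). It shows one way to enforce that only cooperative datasets carry a link type, the kind of rule a per-category cross-analysis would depend on.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Optional


class DatasetCategory(Enum):
    """Top-level categories of the survey's three-way dataset taxonomy."""
    EGO_VEHICLE = "ego-vehicle"
    INFRASTRUCTURE = "infrastructure-based"
    COOPERATIVE = "cooperative"


# Cooperative sub-types named in the abstract (V2V, V2I, V2X, I2I).
COOPERATIVE_LINKS = {"V2V", "V2I", "V2X", "I2I"}


@dataclass(frozen=True)
class DatasetEntry:
    name: str
    category: DatasetCategory
    link: Optional[str] = None  # hypothetical field: only set for cooperative datasets

    def __post_init__(self) -> None:
        # Cooperative datasets must name one of the known link types;
        # ego-vehicle and infrastructure-based datasets must not carry one.
        if self.category is DatasetCategory.COOPERATIVE:
            if self.link not in COOPERATIVE_LINKS:
                raise ValueError(
                    f"cooperative dataset needs a link in {sorted(COOPERATIVE_LINKS)}"
                )
        elif self.link is not None:
            raise ValueError("only cooperative datasets have a link type")


def group_by_category(entries: Iterable[DatasetEntry]) -> Dict[DatasetCategory, List[str]]:
    """Bucket dataset names by top-level category (empty lists for unused categories)."""
    groups: Dict[DatasetCategory, List[str]] = {c: [] for c in DatasetCategory}
    for entry in entries:
        groups[entry.category].append(entry.name)
    return groups
```

Grouping entries this way is one assumed route to a per-category summary; real dataset names and their published statistics would replace any placeholder entries.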
If this is right
- Understanding sensor capabilities and limitations supports development of more effective fusion strategies for complex environments.
- Dataset categorization enables better cross-analysis to improve training of robust detection models.
- Focus on transformer-driven and VLM-powered methods points toward hybrid pipelines for next-generation perception.
- Identification of open challenges in contextual reasoning guides research in cooperative intelligence and multimodal LLMs.
- The overall synthesis provides direction for incorporating generative AI into reliable AV object detection systems.
Where Pith is reading between the lines
- The roadmap could be extended by adding quantitative performance comparisons across the reviewed fusion and VLM methods to aid practical selection.
- Implications for real-time constraints and computational efficiency in vehicle hardware may need explicit mapping beyond the current analysis.
- The cooperative dataset categories suggest potential for scaling to city-wide infrastructure networks, which could be tested in simulation.
- Connections to broader robotics perception tasks indicate the framework might generalize beyond driving scenarios.
Load-bearing premise
The selected literature and categorization of datasets and methods comprehensively represent the fragmented state of multimodal perception without significant selection bias.
What would settle it
Discovery of a substantial body of recent work on AV object detection that uses an entirely different dataset categorization or centers on methods and challenges not addressed in the review would show the roadmap is incomplete.
Original abstract
Autonomous Vehicles (AVs) are transforming the future of transportation through advances in intelligent perception, decision-making, and control systems. However, their success is tied to one core capability, reliable object detection in complex and multimodal environments. While recent breakthroughs in Computer Vision (CV) and Artificial Intelligence (AI) have driven remarkable progress, the field still faces a critical challenge as knowledge remains fragmented across multimodal perception, contextual reasoning, and cooperative intelligence. This survey bridges that gap by delivering a forward-looking analysis of object detection in AVs, emphasizing emerging paradigms such as Vision-Language Models (VLMs), Large Language Models (LLMs), and Generative AI rather than re-examining outdated techniques. We begin by systematically reviewing the fundamental spectrum of AV sensors (camera, ultrasonic, LiDAR, and Radar) and their fusion strategies, highlighting not only their capabilities and limitations in dynamic driving environments but also their potential to integrate with recent advances in LLM/VLM-driven perception frameworks. Next, we introduce a structured categorization of AV datasets that moves beyond simple collections, positioning ego-vehicle, infrastructure-based, and cooperative datasets (e.g., V2V, V2I, V2X, I2I), followed by a cross-analysis of data structures and characteristics. Ultimately, we analyze cutting-edge detection methodologies, ranging from 2D and 3D pipelines to hybrid sensor fusion, with particular attention to emerging transformer-driven approaches powered by Vision Transformers (ViTs), Large and Small Language Models (SLMs), and VLMs. By synthesizing these perspectives, our survey delivers a clear roadmap of current capabilities, open challenges, and future opportunities.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This manuscript is a survey on object detection for autonomous vehicles. It reviews sensor modalities (camera, ultrasonic, LiDAR, Radar) and fusion strategies, introduces a three-way categorization of datasets (ego-vehicle, infrastructure-based, cooperative/V2X), and examines detection pipelines from 2D/3D methods through transformer, ViT, SLM, VLM, and LLM approaches. The central claim is that this synthesis yields a clear roadmap of current capabilities, open challenges, and future opportunities in multimodal AV perception.
Significance. If the literature selection proves representative, the survey could usefully organize a fragmented field by linking classical sensor fusion to emerging multimodal LLMs/VLMs and cooperative perception. The structured dataset categorization and forward emphasis on next-gen paradigms are strengths that could guide researchers, though the absence of quantitative cross-comparisons limits immediate utility for capability assessment.
major comments (1)
- [Introduction] Introduction and abstract: The paper states it performs a 'systematic review' and delivers a 'clear roadmap' via synthesis of sensors, datasets, and methods. No section describes the literature search protocol (databases, keywords, date range, inclusion/exclusion criteria, or total papers screened). This is load-bearing for the central claim because the three-way dataset categorization and emphasis on VLMs/cooperative perception cannot be evaluated for selection bias or recency bias without such details.
minor comments (2)
- [Dataset section] The cross-analysis of dataset characteristics would be strengthened by a summary table comparing sample counts, sensor coverage, and annotation types across the ego/infrastructure/cooperative categories.
- [Fusion strategies] Notation for fusion strategies (e.g., early/late/hybrid) is used inconsistently between the sensor review and methodology sections; a single glossary or consistent abbreviations would improve readability.
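A single glossary could pair each fusion term with a minimal operational definition. The Python sketch below is one such illustration, not the survey's notation: early fusion concatenates per-sensor inputs and runs one detector, while late fusion runs a detector per sensor and merges the outputs. The 2D axis-aligned box format, the greedy IoU matching, the 0.5 threshold, and the max-score merge are all assumptions chosen for brevity; hybrid (intermediate, feature-level) fusion is not shown.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned (assumption)
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def early_fusion(camera_feats: Sequence[float], lidar_feats: Sequence[float],
                 detector: Callable[[List[float]], List[Detection]]) -> List[Detection]:
    """Early fusion: concatenate per-sensor inputs, then run ONE detector."""
    return detector(list(camera_feats) + list(lidar_feats))


def late_fusion(camera_dets: List[Detection], lidar_dets: List[Detection],
                iou_threshold: float = 0.5) -> List[Detection]:
    """Late fusion: each sensor has its own detector; merge their outputs."""
    fused: List[Detection] = []
    unmatched_lidar = list(lidar_dets)
    for cam_box, cam_score in camera_dets:
        # Greedily pick the remaining LiDAR detection that overlaps most.
        best = max(unmatched_lidar, key=lambda d: iou(cam_box, d[0]), default=None)
        if best is not None and iou(cam_box, best[0]) >= iou_threshold:
            unmatched_lidar.remove(best)
            avg_box = tuple((c + l) / 2 for c, l in zip(cam_box, best[0]))
            fused.append((avg_box, max(cam_score, best[1])))
        else:
            fused.append((cam_box, cam_score))  # camera-only detection
    fused.extend(unmatched_lidar)  # keep LiDAR-only detections
    return fused
```

Pinning the terms down at this level makes it straightforward to check that the sensor review and the methodology sections use "early" and "late" for the same operation.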
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our survey manuscript. We have addressed the major comment regarding the literature search protocol by planning a clear revision to improve transparency.
Point-by-point responses
- Referee: [Introduction] Introduction and abstract: The paper states it performs a 'systematic review' and delivers a 'clear roadmap' via synthesis of sensors, datasets, and methods. No section describes the literature search protocol (databases, keywords, date range, inclusion/exclusion criteria, or total papers screened). This is load-bearing for the central claim because the three-way dataset categorization and emphasis on VLMs/cooperative perception cannot be evaluated for selection bias or recency bias without such details.
  Authors: We acknowledge that the manuscript does not include an explicit description of the literature search protocol, which limits the ability to assess selection or recency bias. Our survey is structured as a narrative synthesis emphasizing recent advances in multimodal fusion and VLM/LLM approaches rather than a formal PRISMA-style systematic review. To address this directly, we will add a dedicated subsection in the revised Introduction titled 'Literature Selection and Review Methodology.' This subsection will specify the primary databases (Google Scholar, arXiv, IEEE Xplore, and proceedings from CVPR/ECCV/ICCV), search keywords (e.g., 'object detection autonomous vehicles', 'multimodal sensor fusion', 'vision language models AV', 'cooperative perception V2X'), date range (2015-2024 with emphasis on 2020 onward for emerging paradigms), inclusion criteria (peer-reviewed papers or impactful preprints on sensor fusion, datasets, or LLM/VLM detection pipelines), and exclusion criteria (purely theoretical works without AV application or non-English sources). We believe this addition will allow readers to better evaluate the representativeness of the three-way dataset categorization and the roadmap presented. Revision: yes.
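The proposed protocol reduces to screening rules that can be applied and counted mechanically, which also produces the "total papers screened" figure the referee asks for. The sketch below encodes the stated date range, the non-English and no-AV-application exclusions, and the peer-review/preprint inclusion rule. The `Candidate` fields, the topic labels, and the `impactful_preprint` flag are hypothetical; the response does not say how "impactful" is judged, so it is left as a manual input.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

DATE_RANGE = (2015, 2024)   # stated date range
EMPHASIS_FROM = 2020        # stated emphasis on emerging paradigms
# Hypothetical labels for the three in-scope areas named in the response.
IN_SCOPE_TOPICS = frozenset({"sensor fusion", "datasets", "llm/vlm detection"})


@dataclass(frozen=True)
class Candidate:
    title: str
    year: int
    language: str              # ISO 639-1 code, e.g. "en"
    peer_reviewed: bool
    impactful_preprint: bool   # manual judgment; not operationalized in the response
    topics: FrozenSet[str]
    has_av_application: bool


def include(c: Candidate) -> bool:
    """Apply the inclusion and exclusion criteria in order."""
    if not DATE_RANGE[0] <= c.year <= DATE_RANGE[1]:
        return False
    if c.language != "en":          # exclusion: non-English sources
        return False
    if not c.has_av_application:    # exclusion: purely theoretical, no AV application
        return False
    if not (c.peer_reviewed or c.impactful_preprint):
        return False
    return bool(c.topics & IN_SCOPE_TOPICS)


def screen(candidates: List[Candidate]) -> Tuple[Dict[str, int], List[Candidate]]:
    """Return screening counts plus the list of included papers."""
    included = [c for c in candidates if include(c)]
    counts = {
        "screened": len(candidates),
        "included": len(included),
        "included_since_emphasis": sum(c.year >= EMPHASIS_FROM for c in included),
    }
    return counts, included
```

Recording which rule rejected each candidate, rather than a single boolean, would extend this toward per-reason exclusion counts of the kind a PRISMA-style flow diagram reports.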
Circularity Check
No significant circularity in survey synthesis
Full rationale
This paper is a literature survey synthesizing external work on AV object detection sensors, fusion strategies, ego/infrastructure/cooperative datasets, and transformer/VLM/LLM methods. Its central claim of delivering a 'clear roadmap of current capabilities, open challenges, and future opportunities' rests on cited independent sources rather than any internal equations, fitted parameters, or self-referential definitions that reduce to the paper's own inputs by construction. No mathematical derivations, load-bearing self-citations, or predictions equivalent to inputs appear in the abstract or described structure. Its claims can be checked against external benchmarks through its references.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: knowledge in multimodal perception, contextual reasoning, and cooperative intelligence for AV object detection remains fragmented.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "We begin by systematically reviewing the fundamental spectrum of AV sensors (camera, ultrasonic, LiDAR, and Radar) and their fusion strategies... Next, we introduce a structured categorization of AV datasets... Ultimately, we analyze cutting-edge detection methodologies, ranging from 2D and 3D pipelines to hybrid sensor fusion, with particular attention to emerging transformer-driven approaches powered by Vision Transformers (ViTs), Large and Small Language Models (SLMs), and VLMs."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "By synthesizing these perspectives, our survey delivers a clear roadmap of current capabilities, open challenges, and future opportunities."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.