pith. machine review for the scientific record.

arxiv: 2605.10904 · v1 · submitted 2026-05-11 · 💻 cs.RO

Recognition: no theorem link

MDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systems

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 03:52 UTC · model grok-4.3

classification 💻 cs.RO
keywords closed-loop evaluation · cooperative driving · multi-agent systems · V2X communication · autonomous driving benchmark · perception sharing · negotiation · end-to-end planning

The pith

Multi-agent V2X systems outperform single-agent driving in closed-loop tests, but perception sharing and negotiation show clear limits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MDrive, a benchmark of 225 closed-loop scenarios drawn from crash typologies and real V2X data, to test whether connected vehicles that share perception and negotiate can drive better than isolated ones. Results indicate multi-agent setups win on average, yet shared sensor data often fails to improve actual driving decisions, and negotiation boosts performance only until traffic gets dense and complex. This matters because real autonomous driving requires continuous interaction with other road users, and gaps in how information turns into safe actions could limit deployment. The work also supplies open tools for creating more such tests and converting real data into simulation.

Core claim

MDrive establishes that multi-agent cooperative systems are generally better than single-agent counterparts across 225 scenarios. Perception sharing improves what agents see but does not reliably produce better planning outputs. Negotiation helps planning in many cases yet reduces performance when traffic is complex and dense. The benchmark uses NHTSA pre-crash typologies and real-world V2X datasets to create closed-loop evaluations that capture interactive driving behavior.
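The headline comparison in Figure 4 scores planners by the harmonic mean of Driving Score (DS) and Success Rate (SR). A minimal sketch of that combination, assuming both metrics share a scale; the paper's exact metric definitions may differ:

```python
def harmonic_mean(ds, sr):
    """Harmonic mean of Driving Score (DS) and Success Rate (SR).

    The harmonic mean penalizes imbalance: a planner that completes routes
    but drives dangerously (or vice versa) scores low. Assumes the paper
    combines DS and SR this way, as the Figure 4 caption suggests.
    """
    if ds + sr == 0:
        return 0.0
    return 2 * ds * sr / (ds + sr)
```

For example, a planner with DS 80 and SR 40 scores about 53.3, well below the arithmetic mean of 60, which is why the gap numbers in Figure 4 reward balanced safety and completion.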

What carries the argument

The MDrive benchmark, which runs end-to-end multi-agent driving systems in closed-loop simulation with perception sharing and negotiation protocols across 225 interactive scenarios.
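A closed-loop benchmark differs from open-loop replay in that each agent's decision changes the state every other agent sees next. A toy sketch of one such tick on a 1-D road, with hypothetical names (`run_episode`, `perceive`) that are illustrative only, not the MDrive toolbox API:

```python
def perceive(pos, world, radius):
    """Local sensing: return objects within `radius` of an agent."""
    return {o for o in world if abs(o - pos) <= radius}

def run_episode(start_positions, world, share_perception, steps=10,
                radius=3.0, step_len=5.0):
    """One closed-loop rollout: each tick, agents sense, optionally pool
    detections (a stand-in for V2X perception sharing), then plan, and
    the resulting state feeds the next tick."""
    positions = list(start_positions)
    collisions = 0
    for _ in range(steps):
        local = [perceive(p, world, radius) for p in positions]
        if share_perception:
            fused = set().union(*local)      # pooled V2X detections
            local = [fused] * len(positions)
        for i, detections in enumerate(local):
            # Plan: stop if a detected object lies within the next step.
            blocked = any(positions[i] < o <= positions[i] + step_len
                          for o in detections)
            if not blocked:
                positions[i] += step_len
            if positions[i] in world:        # drove into an undetected object
                collisions += 1
    return collisions
```

With one agent starting far from an obstacle at 40.0 and another parked beside it, sharing lets the far agent stop in time: `run_episode([0.0, 38.0], {40.0}, True)` yields 0 collisions, while turning sharing off yields 1. That feedback effect, where a decision made now determines what is observable later, is exactly what open-loop replay cannot expose.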

If this is right

  • Cooperative driving methods can be pursued for measurable gains over isolated agents in interactive conditions.
  • Systems must address the disconnect between improved perception and downstream planning decisions.
  • Negotiation mechanisms require safeguards to prevent performance drops in high-density traffic.
  • The provided scenario generation and Real2Sim tools enable consistent reproduction and extension of the evaluation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Better data-fusion techniques between shared perception and planning modules could close the observed performance gap.
  • Testing the same systems on physical vehicle fleets would show whether simulation findings hold under real sensor noise and latency.
  • The benchmark's structure could guide development of communication standards that prioritize planning-relevant information over raw sensor data.

Load-bearing premise

The chosen 225 scenarios are diverse and representative enough to support broad claims about the benefits and drawbacks of multi-agent cooperation in real driving.

What would settle it

A controlled experiment showing that single-agent systems match or exceed multi-agent performance across a new set of closed-loop scenarios with similar density and interaction would undermine the general superiority result.

Figures

Figures reproduced from arXiv: 2605.10904 by Angela Magtoto, Bolei Zhou, Chen Tang, Henry Wei, Jiaqi Ma, Johnson Liu, Marco Coscoy, Rui Song, Seth Z. Zhao, Walter Zimmer, Zewei Zhou, Zhiyu Huang.

Figure 1. MDrive Overview. MDrive is a closed-loop cooperative driving benchmark. To the best of our knowledge, it is the first benchmark to systematically evaluate the real benefits of multi-agent cooperation, including perception sharing and decision negotiation. MDrive benchmarks include three scenario buckets: MDrive-V2XPnP Real2Sim scenarios reconstructed from real-world V2X driving logs, MDrive-Intersection…

Figure 2. MDrive Representative Scenarios. The colorful lines in each scenario represent the target routes as the local planning reference for CAVs. (1-3) showcase MDrive-V2XPnP Real2Sim scenarios constructed from real-world driving logs, capturing authentic scenario layouts and agent behaviors. (4) illustrates an MDrive-Interdrive scenario, serving as an in-domain reference without background actors. The remaining …

Figure 3. Overview of the MDrive Toolbox. An open-source toolbox designed to support benchmark construction and extension through three core modules: 1) a Human-in-the-Loop Simulation Interface that enables expert takeover via physical controllers to collect realistic demonstrations; 2) an Agentic Scenario Generation Pipeline that leverages an agentic system as a structured proposer to scalably generate, validate, and cur…

Figure 4. The multi-agent vs. single-agent gap rises from 3.54 on Static Avoidance scenarios to 35.12 on Dynamic Avoidance scenarios and 24.58 on Dynamic Coordination scenarios, in terms of the harmonic mean of DS and SR. The gain concentrates exactly where the ego must reason about other agents’ intent under partial observability. Perception sharing prevents planning failure from lack of observability. CoDriving …

Figure 5. Failure Mode: Perception Sharing Does Not Guarantee Better Planning. The ego CAV receives the detections of the collision object at T=0, but still cannot avoid it at T=10. Negotiation difficulty increases substantially in complex and dense traffic scenarios. Negotiation offers an alternative cooperative path, where agents resolve conflicts through explicit communication rather than shared perception. Acros…

Figure 6. Correlation of Open-loop to Closed-loop Evaluation. We compare open-loop metrics, such as AP50 and ADE, against closed-loop metrics, such as DS and SR, for a set of 9 planners. The experiments in this section are designed to highlight the necessity of closed-loop evaluation. Open-loop evaluations directly leverage static real-world driving logs and are s…

Figure 7. Cooperative Behavior Quality Evaluation with Human Demonstration. (a) statistics of metrics comparison between CoLMDriver and Human; (b) visualization of where CoLMDriver fails under complex and ambiguous situations. We conduct a human study comparing human experts against CoLMDriver on representative scenarios to quantify the quality of the specific cooperative beha…
original abstract

Vehicle-to-Everything (V2X) communication has emerged as a promising paradigm for autonomous driving, enabling connected agents to share complementary perception information and negotiate with each other to benefit the final planning. Existing V2X benchmarks, however, fall short in two ways: (i) open-loop evaluations fail to capture the inherently closed-loop nature of driving, leading to evaluation gaps, and (ii) current closed-loop evaluations lack behavioral and interactive diversity to reflect real-world driving. Thus, it is still unclear the extent of benefits of multi-agent systems for closed-loop driving. In this paper, we introduce MDrive, a closed-loop cooperative driving benchmark comprising 225 scenarios grounded in both NHTSA pre-crash typologies and real-world V2X datasets. Our benchmark results demonstrate that multi-agent systems are generally better than single-agent counterparts. However, current multi-agent systems still face two important challenges: (i) perception sharing enhances perceptions, but doesn't always translate to better planning; (ii) negotiation improves planning performance but harms it in complex and dense traffic scenarios. MDrive further provides an open-source toolbox for scenario generation, Real2Sim conversion, and human-in-the-loop simulation. Together, MDrive establishes a reproducible foundation for evaluating and improving the generalization and robustness of cooperative driving systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces MDrive, a closed-loop cooperative driving benchmark consisting of 225 scenarios grounded in NHTSA pre-crash typologies and real-world V2X datasets. It reports that multi-agent systems generally outperform single-agent counterparts in closed-loop settings, while identifying two challenges: (i) perception sharing improves perception accuracy but does not always translate to better planning, and (ii) negotiation improves planning performance but degrades it in complex and dense traffic scenarios. The work also releases an open-source toolbox for scenario generation, Real2Sim conversion, and human-in-the-loop simulation.

Significance. If the empirical results hold under closer scrutiny, MDrive fills a clear gap in V2X evaluation by moving beyond open-loop setups to closed-loop interactive driving with behavioral diversity. The explicit identification of perception-sharing and negotiation failure modes supplies concrete, actionable limitations for the community. The open-source toolbox for scenario generation and Real2Sim conversion is a genuine strength that directly supports reproducibility and future extensions.

major comments (2)
  1. [Abstract and results section] The claims that 'multi-agent systems are generally better than single-agent counterparts' and that the two specific challenges exist rest on comparative results, yet no details are supplied on the precise metrics (e.g., collision rate, progress, comfort), the single-agent and multi-agent baselines, statistical tests, or the protocol for selecting and executing the 225 scenarios. Without these, it is impossible to verify whether the evidence supports the stated general conclusions.
  2. [Benchmark construction section] No quantitative coverage analysis (distribution over agent density, interaction complexity, or edge-case frequency) is provided for the 225 scenarios. This is load-bearing for the central claim that the observed perception-sharing and negotiation harms are intrinsic limits rather than potential artifacts of under-represented dense or highly interactive regimes.
minor comments (2)
  1. Add explicit definitions and measurement procedures for 'complex and dense traffic scenarios' in the results analysis so that the second challenge can be reproduced and tested by others.
  2. The toolbox release is welcome; include a README that documents the exact command sequence to regenerate the 225 scenarios used in the reported experiments.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our MDrive benchmark paper. The comments highlight opportunities to improve clarity and verifiability of our empirical claims, and we will incorporate revisions accordingly while preserving the core contributions of the closed-loop evaluation and open-source toolbox.

point-by-point responses
  1. Referee: [Abstract and results section] The claims that 'multi-agent systems are generally better than single-agent counterparts' and that the two specific challenges exist rest on comparative results, yet no details are supplied on the precise metrics (e.g., collision rate, progress, comfort), the single-agent and multi-agent baselines, statistical tests, or the protocol for selecting and executing the 225 scenarios. Without these, it is impossible to verify whether the evidence supports the stated general conclusions.

    Authors: We agree that additional explicit details in the abstract and results section would strengthen verifiability of the claims. The manuscript describes the evaluation metrics (collision rate, progress, and comfort) and baselines in the experimental setup, with single-agent systems using standard end-to-end planners and multi-agent systems incorporating perception sharing and negotiation; scenario selection follows NHTSA pre-crash typologies matched to real-world V2X data. However, to directly address the concern, we will expand the results section with a summary table of all metrics and baselines, a clear description of the execution protocol, and reporting of statistical significance (paired t-tests with p-values) across the 225 scenarios. We will also update the abstract to reference these elements concisely. revision: yes

  2. Referee: [Benchmark construction section] No quantitative coverage analysis (distribution over agent density, interaction complexity, or edge-case frequency) is provided for the 225 scenarios. This is load-bearing for the central claim that the observed perception-sharing and negotiation harms are intrinsic limits rather than potential artifacts of under-represented dense or highly interactive regimes.

    Authors: We acknowledge that a quantitative coverage analysis is important to substantiate that the identified challenges (perception-sharing not always improving planning, and negotiation harming performance in dense traffic) reflect intrinsic limits rather than sampling artifacts. The current manuscript grounds the 225 scenarios in NHTSA typologies and real-world V2X datasets to promote diversity, but does not include explicit distributions. In the revision, we will add a new subsection with quantitative analysis, including histograms and statistics on agent density, interaction complexity (e.g., number of interacting pairs), and edge-case frequency, to demonstrate coverage and support the generalizability of our findings on the two challenges. revision: yes
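As a sketch of the statistical reporting the rebuttal promises (the names and protocol here are illustrative, not taken from the paper): with one score per scenario per system, pairing by scenario controls for scenario difficulty, and the t statistic is the mean per-scenario difference divided by its standard error.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic over per-scenario score differences.

    scores_a[i] and scores_b[i] are the two systems' scores on the same
    scenario, so differencing removes per-scenario difficulty. A p-value
    would then come from the t distribution with n-1 degrees of freedom
    (e.g. scipy.stats.ttest_rel computes both at once).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample stdev, ddof=1
    return mean(diffs) / standard_error
```

Across 225 scenarios this gives 224 degrees of freedom, where the t distribution is close enough to normal that |t| above roughly 2 indicates significance at the 5% level.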

Circularity Check

0 steps flagged

No significant circularity in empirical benchmark

full rationale

The paper introduces an empirical benchmark (MDrive) consisting of 225 scenarios grounded in external NHTSA pre-crash typologies and real-world V2X datasets, then reports comparative simulation results between multi-agent and single-agent systems. No equations, fitted parameters, derivations, or self-referential definitions appear in the provided text. Claims about multi-agent benefits and specific challenges rest directly on the benchmark outputs rather than reducing to the paper's own inputs by construction. The work is self-contained as a benchmark release with no load-bearing self-citation chains or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the assumption that the chosen scenarios and closed-loop simulation faithfully capture real-world cooperative driving interactions. No free parameters or invented entities are described in the abstract.

axioms (1)
  • Domain assumption: Closed-loop simulation of multi-agent driving accurately reflects real-world interactive behavior for the purpose of benchmarking.
    Invoked when claiming that existing open-loop evaluations create evaluation gaps and that the new benchmark addresses them.

pith-pipeline@v0.9.0 · 5569 in / 1361 out tokens · 50749 ms · 2026-05-12T03:52:21.863825+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

78 extracted references · 78 canonical work pages · 6 internal anchors

  1. [1]

    Cooptrack: Exploring end-to-end learning for efficient cooperative sequential perception

    Jiaru Zhong, Jiahao Wang, Jiahui Xu, Xiaofan Li, Zaiqing Nie, and Haibao Yu. Cooptrack: Exploring end-to-end learning for efficient cooperative sequential perception. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 26954–26965, 2025

  2. [2]

    Opv2v: An open benchmark dataset and fusion pipeline for perception with vehicle-to-vehicle communication

    Runsheng Xu, Hao Xiang, Xin Xia, Xu Han, Jinlong Li, and Jiaqi Ma. Opv2v: An open benchmark dataset and fusion pipeline for perception with vehicle-to-vehicle communication. In 2022 International Conference on Robotics and Automation (ICRA), pages 2583–2589. IEEE, 2022

  3. [3]

    V2x-sim: Multi-agent collaborative perception dataset and benchmark for autonomous driving

    Yiming Li, Dekun Ma, Ziyan An, Zixun Wang, Yiqi Zhong, Siheng Chen, and Chen Feng. V2x-sim: Multi-agent collaborative perception dataset and benchmark for autonomous driving. IEEE Robotics and Automation Letters, 7(4):10914–10921, 2022

  4. [4]

    Collaborative semantic occupancy prediction with hybrid feature fusion in connected automated vehicles

    Rui Song, Chenwei Liang, Hu Cao, Zhiran Yan, Walter Zimmer, Markus Gross, Andreas Festag, and Alois Knoll. Collaborative semantic occupancy prediction with hybrid feature fusion in connected automated vehicles. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17996–18006, 2024

  5. [5]

    V2XPnP: Vehicle-to-everything spatio-temporal fusion for multi-agent perception and prediction

    Zewei Zhou, Hao Xiang, Zhaoliang Zheng, Seth Z Zhao, Mingyue Lei, Yun Zhang, Tianhui Cai, Xinyi Liu, Johnson Liu, Maheswari Bajji, et al. V2XPnP: Vehicle-to-everything spatio-temporal fusion for multi-agent perception and prediction. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

  6. [6]

    Tumtraf v2x cooperative perception dataset

    Walter Zimmer, Gerhard Arya Wardana, Suren Sritharan, Xingcheng Zhou, Rui Song, and Alois C. Knoll. Tumtraf v2x cooperative perception dataset. In 2024 IEEE/CVF International Conference on Computer Vision and Pattern Recognition (CVPR). IEEE/CVF, 2024

  7. [7]

    V2x-realo: An open online framework and dataset for cooperative perception in reality

    Hao Xiang, Zhaoliang Zheng, Xin Xia, Seth Z. Zhao, Letian Gao, Zewei Zhou, Tianhui Cai, Yun Zhang, and Jiaqi Ma. V2x-realo: An open online framework and dataset for cooperative perception in reality, 2025. URL https://arxiv.org/abs/2503.10034

  8. [8]

    Quantv2x: A fully quantized multi-agent system for cooperative perception

    Seth Z Zhao, Huizhi Zhang, Zhaowei Li, Juntong Peng, Anthony Chui, Zewei Zhou, Zonglin Meng, Hao Xiang, Zhiyu Huang, Fujia Wang, et al. Quantv2x: A fully quantized multi-agent system for cooperative perception. arXiv preprint arXiv:2509.03704, 2025

  9. [9]

    Coopre: Cooperative pretraining for v2x cooperative perception

    Seth Z Zhao, Hao Xiang, Chenfeng Xu, Xin Xia, Bolei Zhou, and Jiaqi Ma. Coopre: Cooperative pretraining for v2x cooperative perception. arXiv preprint arXiv:2408.11241, 2024

  10. [10]

    Turbotrain: Towards efficient and balanced multi-task learning for multi-agent perception and prediction

    Zewei Zhou, Seth Z Zhao, Tianhui Cai, Zhiyu Huang, Bolei Zhou, and Jiaqi Ma. Turbotrain: Towards efficient and balanced multi-task learning for multi-agent perception and prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4391–4402, 2025

  11. [11]

    End-to-end autonomous driving through v2x cooperation

    Haibao Yu, Wenxian Yang, Jiaru Zhong, Zhenwei Yang, Siqi Fan, Ping Luo, and Zaiqing Nie. End-to-end autonomous driving through v2x cooperation. In Proceedings of the AAAI conference on artificial intelligence, volume 39, pages 9598–9606, 2025

  12. [12]

    CooperRisk: A driving risk quantification pipeline with multi-agent cooperative perception and prediction

    Mingyue Lei, Zewei Zhou, Hongchen Li, Jia Hu, and Jiaqi Ma. CooperRisk: A driving risk quantification pipeline with multi-agent cooperative perception and prediction. arXiv preprint arXiv:2506.15868, 2025

  13. [13]

    mmcooper: A multi-agent multi-stage communication-efficient and collaboration-robust cooperative perception framework

    Bingyi Liu, Jian Teng, Hongfei Xue, Enshu Wang, Chuanhui Zhu, Pu Wang, and Libing Wu. mmcooper: A multi-agent multi-stage communication-efficient and collaboration-robust cooperative perception framework. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 28396–28406, 2025

  14. [14]

    Mic-bev: Multi-infrastructure camera bird’s-eye-view transformer with relation-aware fusion for 3d object detection

    Yun Zhang, Zhaoliang Zheng, Johnson Liu, Zhiyu Huang, Zewei Zhou, Zonglin Meng, Tianhui Cai, and Jiaqi Ma. Mic-bev: Multi-infrastructure camera bird’s-eye-view transformer with relation-aware fusion for 3d object detection. arXiv preprint arXiv:2510.24688, 2025

  15. [15]

    Risk map as middleware: Toward interpretable cooperative end-to-end autonomous driving for risk-aware planning

    Mingyue Lei, Zewei Zhou, Hongchen Li, Jiaqi Ma, and Jia Hu. Risk map as middleware: Toward interpretable cooperative end-to-end autonomous driving for risk-aware planning. IEEE Robotics and Automation Letters, 11(1):818–825, 2025

  16. [16]

    BridgeSim: Unveiling the OL-CL Gap in End-to-End Autonomous Driving

    Seth Z Zhao, Luobin Wang, Hongwei Ruan, Yuxin Bao, Yilan Chen, Ziyang Leng, Abhijit Ravichandran, Honglin He, Zewei Zhou, Xu Han, et al. Bridgesim: Unveiling the ol-cl gap in end-to-end autonomous driving. arXiv preprint arXiv:2604.10856, 2026

  17. [17]

    Beyond behavior cloning in autonomous driving: a survey of closed-loop training techniques

    Peter Karkus, Maximilian Igl, Yuxiao Chen, Kashyap Chitta, Jef Packer, Bertrand Douillard, Ran Tian, Alexander Naumann, Guillermo Garcia-Cobo, Shuhan Tan, et al. Beyond behavior cloning in autonomous driving: a survey of closed-loop training techniques. Authorea Preprints

  18. [18]

    Is ego status all you need for open-loop end-to-end autonomous driving?

    Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, and Jose M Alvarez. Is ego status all you need for open-loop end-to-end autonomous driving? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14864–14873, 2024

  19. [19]

    Bench2drive: Towards multi-ability evaluation for end-to-end autonomous driving

    Shaocong Jia, Penghan Wu, Li Chen, Jiazhao Jiang, Junchi Yan, and Hongyang Li. Bench2drive: Towards multi-ability evaluation for end-to-end autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16413–16423, 2024

  20. [20]

    CoLMDriver: LLM-based negotiation benefits cooperative autonomous driving

    Changxing Liu, Genjia Liu, Zijun Wang, Jinchang Yang, and Siheng Chen. Colmdriver: LLM-based negotiation benefits cooperative autonomous driving. arXiv preprint arXiv:2503.08683, 2025

  21. [21]

    Coopreflect: Towards natural language communication for cooperative autonomous driving via multiagent learning

    Jiaxun Cui, Chen Tang, Jarrett Holtz, Janice Nguyen, Alessandro G Allievi, Hang Qiu, and Peter Stone. Coopreflect: Towards natural language communication for cooperative autonomous driving via multiagent learning. In International Conference on Autonomous Agents and Multi-Agent Systems, 2026

  22. [22]

    Pre-crash scenario typology for crash avoidance research

    Wassim G Najm, John D Smith, and Mikio Yanagisawa. Pre-crash scenario typology for crash avoidance research. 2007

  23. [23]

    LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving

    Long Nguyen, Micha Fauth, Bernhard Jaeger, Daniel Dauner, Maximilian Igl, Andreas Geiger, and Kashyap Chitta. Lead: Minimizing learner-expert asymmetry in end-to-end driving. arXiv preprint arXiv:2512.20563, 2025

  24. [24]

    Fail2Drive: Benchmarking Closed-Loop Driving Generalization

    Simon Gerstenecker, Andreas Geiger, and Katrin Renz. Fail2drive: Benchmarking closed-loop driving generalization. arXiv preprint arXiv:2604.08535, 2026

  25. [25]

    nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles

    Holger Caesar, Juraj Kabzan, Kok Seang Tan, Whye Kit Fong, Eric Wolff, Alex Lang, Luke Fletcher, Oscar Beijbom, and Sammy Omari. nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles. arXiv preprint arXiv:2106.11810, 2021

  26. [26]

    Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking

    Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, et al. Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking. Advances in Neural Information Processing Systems, 37:28706–28719, 2024

  27. [27]

    Pseudo-simulation for autonomous driving

    Wei Cao, Marcel Hallgarten, Tianyu Li, Daniel Dauner, Xunjiang Gu, Caojun Wang, Yakov Miron, Marco Aiello, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, Andreas Geiger, and Kashyap Chitta. Pseudo-simulation for autonomous driving. In Conference on Robot Learning (CoRL), 2025

  28. [28]

    Hugsim: A real-time, photo-realistic and closed-loop simulator for autonomous driving

    Hongyu Zhou, Longzhong Lin, Jiabao Wang, Yichong Lu, Dongfeng Bai, Bingbing Liu, Yue Wang, Andreas Geiger, and Yiyi Liao. Hugsim: A real-time, photo-realistic and closed-loop simulator for autonomous driving. arXiv preprint arXiv:2412.01718, 2024

  29. [29]

    Drivearena: A closed-loop generative simulation platform for autonomous driving

    Xuemeng Yang, Licheng Wen, Yukai Ma, Jianbiao Mei, Xin Li, Tiantian Wei, Wenjie Lei, Daocheng Fu, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, and Yu Qiao. Drivearena: A closed-loop generative simulation platform for autonomous driving. arXiv preprint arXiv:2408.00415, 2024

  30. [31]

    Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers

    Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, and Jifeng Dai. Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  31. [32]

    Pointpillars: Fast encoders for object detection from point clouds

    Alex H Lang, Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. Pointpillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12697–12705, 2019

  32. [33]

    MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and Guided Intention Querying

    Shaoshuai Shi, Li Jiang, Dengxin Dai, and Bernt Schiele. Mtr++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying. arXiv preprint arXiv:2306.17770, 2023

  33. [34]

    Qcnext: A next-generation framework for joint multi-agent trajectory prediction

    Zikang Zhou, Zihao Wen, Jianping Wang, Yung-Hui Li, and Yu-Kai Huang. Qcnext: A next-generation framework for joint multi-agent trajectory prediction. arXiv preprint arXiv:2306.10508, 2023

  34. [35]

    A comprehensive study of speed prediction in transportation system: From vehicle to traffic

    Zewei Zhou, Ziru Yang, Yuanjian Zhang, Yanjun Huang, Hong Chen, and Zhuoping Yu. A comprehensive study of speed prediction in transportation system: From vehicle to traffic. iScience, 25(3), 2022

  35. [36]

    A reliable path planning method for lane change based on hybrid pso-iaco algorithm

    Zewei Zhou, Zhuoping Yu, Lu Xiong, Dequan Zeng, Zhiqiang Fu, Zhuoren Li, and Bo Leng. A reliable path planning method for lane change based on hybrid pso-iaco algorithm. In 2021 6th International Conference on Transportation Information and Safety (ICTIS), pages 1253–1258. IEEE, 2021

  36. [37]

    Gen-drive: Enhancing diffusion generative driving policies with reward modeling and reinforcement learning fine-tuning

    Zhiyu Huang, Xinshuo Weng, Maximilian Igl, Yuxiao Chen, Yulong Cao, Boris Ivanovic, Marco Pavone, and Chen Lv. Gen-drive: Enhancing diffusion generative driving policies with reward modeling and reinforcement learning fine-tuning. arXiv preprint arXiv:2410.05582, 2024

  37. [38]

    End-to-end autonomous driving: Challenges and frontiers

    Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, and Hongyang Li. End-to-end autonomous driving: Challenges and frontiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  38. [39]

    Driving on registers

    Ellington Kirby, Alexandre Boulch, Yihong Xu, Yuan Yin, Gilles Puy, Eloi Zablocki, Andrei Bursuc, Spyros Gidaris, Renaud Marlet, Florent Bartoccioni, Anh-Quan Cao, Nermin Samet, Tuan-Hung Vu, and Matthieu Cord. Driving on registers. preprint, 2026

  39. [40]

    RAP: 3D rasterization augmented end-to-end planning

    Lan Feng, Yang Gao, Eloi Zablocki, Quanyi Li, Wuyang Li, Sichao Liu, Matthieu Cord, and Alexandre Alahi. Rap: 3d rasterization augmented end-to-end planning, 2025. URL https://arxiv.org/abs/2510.04333

  40. [41]

    AutoVLA: A vision-language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning

    Zewei Zhou, Tianhui Cai, Seth Z. Zhao, Yun Zhang, Zhiyu Huang, Bolei Zhou, and Jiaqi Ma. AutoVLA: A vision-language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning. Advances in Neural Information Processing Systems (NeurIPS), 2025

  41. [42]

    SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model

    Zewei Zhou, Ruining Yang, Yiluan Guo, Sherry X Chen, Tao Feng, Kateryna Pistunova, Yishan Shen, Lili Su, Jiaqi Ma, et al. SpanVLA: Efficient action bridging and learning from negative-recovery samples for vision-language-action model.arXiv preprint arXiv:2604.19710, 2026

  42. [43]

    DiffusionDriveV2: Reinforcement learning-constrained truncated diffusion modeling in end-to-end autonomous driving.arXiv preprint arXiv:2512.07745, 2025

    Jialv Zou, Shaoyu Chen, Bencheng Liao, Zhiyu Zheng, Yuehao Song, Lefei Zhang, Qian Zhang, Wenyu Liu, and Xinggang Wang. Diffusiondrivev2: Reinforcement learning-constrained trun- cated diffusion modeling in end-to-end autonomous driving.arXiv preprint arXiv:2512.07745, 2025

  43. [44]

    2025 waymo open dataset challenge: Vision-based end-to-end driving

    o Research. 2025 waymo open dataset challenge: Vision-based end-to-end driving. https: //waymo.com/open/challenges/2025/e2e-driving/, 2025. Accessed: 2025-04-25. 13

  44. [45]

    Learning distilled collaboration graph for multi-agent perception

    Yiming Li, Shunli Ren, Pengxiang Wu, Siheng Chen, Chen Feng, and Wenjun Zhang. Learning distilled collaboration graph for multi-agent perception. InThirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021), 2021

  45. [46]

    Yiming Li, Shunli Ren, Pengxiang Wu, Siheng Chen, Chen Feng, and Wenjun Zhang. Learning distilled collaboration graph for multi-agent perception. Advances in Neural Information Processing Systems, 34:29541–29552, 2021

  46. [47]

    Yue Hu, Shaoheng Fang, Zixing Lei, Yiqi Zhong, and Siheng Chen. Where2comm: Communication-efficient collaborative perception via spatial confidence maps. In Proceedings of the 36th International Conference on Neural Information Processing Systems, NIPS '22, Red Hook, NY, USA, 2022. Curran Associates Inc. ISBN 9781713871088

  47. [48]

    Yifan Lu, Yue Hu, Yiqi Zhong, Dequan Wang, Siheng Chen, and Yanfeng Wang. An extensible framework for open heterogeneous collaborative perception. In The Twelfth International Conference on Learning Representations, 2024

  48. [49]

    Hao Xiang, Runsheng Xu, Xin Xia, Zhaoliang Zheng, Bolei Zhou, and Jiaqi Ma. V2XP-ASG: Generating adversarial scenes for vehicle-to-everything perception. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 3584–3591, 2023. doi: 10.1109/ICRA48891.2023.10161384

  49. [50]

    Seth Z Zhao, Hao Xiang, Chenfeng Xu, Xin Xia, Bolei Zhou, and Jiaqi Ma. CooPre: Cooperative pretraining for V2X cooperative perception. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 11765–11772. IEEE, 2025

  50. [51]

    Haibao Yu, Yingjuan Tang, Enze Xie, Jilei Mao, Ping Luo, and Zaiqing Nie. Flow-based feature fusion for vehicle-infrastructure cooperative 3d object detection. Advances in Neural Information Processing Systems, 36, 2024

  51. [52]

    Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Yu-Chiang Frank Wang, Min-Hung Chen, and Stephen F Smith. V2V-GoT: Vehicle-to-vehicle cooperative autonomous driving with multimodal large language models and graph-of-thoughts. arXiv preprint arXiv:2509.18053, 2025

  52. [53]

    Junwei You, Pei Li, Zhuoyu Jiang, Weizhe Tang, Zilin Huang, Rui Gan, Jiaxi Liu, Yan Zhao, Sikai Chen, and Bin Ran. V2X-QA: A comprehensive reasoning dataset and benchmark for multimodal large language models in autonomous driving across ego, infrastructure, and cooperative views. arXiv preprint arXiv:2604.02710, 2026

  53. [54]

    Jingda Wu, Chao Huang, Hailong Huang, Chen Lv, Yuntong Wang, and Fei-Yue Wang. Recent advances in reinforcement learning-based autonomous driving behavior planning: A survey. Transportation Research Part C: Emerging Technologies, 164:104654, 2024

  54. [55]

    Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. CARLA: An open urban driving simulator. In Proceedings of the 1st Annual Conference on Robot Learning, pages 1–16, 2017

  55. [56]

    Quanyi Li, Zhenghao Peng, Lan Feng, Qihang Zhang, Zhenghai Xue, and Bolei Zhou. MetaDrive: Composing diverse driving scenarios for generalizable reinforcement learning. TPAMI, 2022

  56. [57]

    Deepdrive Team. Deepdrive: a simulator that allows anyone with a PC to push the state-of-the-art in self-driving. https://github.com/deepdrive/deepdrive

  57. [58]

    Hao Gao, Shaoyu Chen, Bo Jiang, Bencheng Liao, Yiang Shi, Xiaoyang Guo, Yuechuan Pu, Haoran Yin, Xiangyu Li, Xinbang Zhang, Ying Zhang, Wenyu Liu, Qian Zhang, and Xinggang Wang. RAD: Training an end-to-end driving policy via large-scale 3DGS-based reinforcement learning. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  58. [59]

    Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, and Ming-Hsuan Yang. DrivingGaussian: Composite gaussian splatting for surrounding dynamic autonomous driving scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21634–21643, 2024

  59. [60]

    Rui Song, Chenwei Liang, Yan Xia, Walter Zimmer, Hu Cao, Holger Caesar, Andreas Festag, and Alois Knoll. CoDa-4DGS: Dynamic gaussian splatting with context and deformation awareness for autonomous driving. In IEEE/CVF International Conference on Computer Vision (ICCV). IEEE/CVF, 2025

  60. [61]

    Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, and Sida Peng. Street gaussians: Modeling dynamic urban scenes with gaussian splatting. In European Conference on Computer Vision, pages 156–173. Springer, 2024

  61. [62]

    Rui Song, Tianhui Cai, Markus Gross, Yun Zhang, Walter Zimmer, Zhiyu Huang, Olaf Wysocki, and Jiaqi Ma. EnerGS: Energy-based gaussian splatting with partial geometric priors. arXiv preprint arXiv:2604.26238, 2026

  62. [63]

    Yurui Chen, Chun Gu, Junzhe Jiang, Xiatian Zhu, and Li Zhang. Periodic vibration gaussian: Dynamic urban scene reconstruction and real-time rendering. International Journal of Computer Vision, 134(3):83, 2026

  63. [64]

    Runsheng Xu, Yi Guo, Xu Han, Xin Xia, Hao Xiang, and Jiaqi Ma. OpenCDA: an open cooperative driving automation framework integrated with co-simulation. In 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), pages 1155–1162. IEEE, 2021

  64. [65]

    Genjia Liu, Yue Hu, Chenxin Xu, Weibo Mao, Junhao Ge, Zhengxiang Huang, Yifan Lu, Yinda Xu, Junkai Xia, Yafei Wang, et al. Towards collaborative autonomous driving: Simulation platform and end-to-end system. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  65. [66]

    Quanyi Li, Zhenghao Peng, Lan Feng, Zhizheng Liu, Chenda Duan, Wenjie Mo, and Bolei Zhou. ScenarioNet: Open-source platform for large-scale traffic scenario simulation and modeling. Advances in Neural Information Processing Systems, 2023

  66. [67]

    Katharina Winter, Mark Azer, and Fabian B Flohr. BEVDriver: Leveraging BEV maps in LLMs for robust closed-loop driving. arXiv preprint arXiv:2503.03074, 2025

  67. [68]

    CARLA Team. CARLA Autonomous Driving Leaderboard. https://leaderboard.carla.org/leaderboard/, 2026. Accessed: 2026-05-01

  68. [69]

    Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, and Hongyang Li. Planning-oriented autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

  69. [70]

    Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. VAD: Vectorized scene representation for efficient autonomous driving. ICCV, 2023

  70. [71]

    Penghao Wu, Xiaosong Jia, Li Chen, Junchi Yan, Hongyang Li, and Yu Qiao. Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline. Advances in Neural Information Processing Systems, 35:6119–6132, 2022

  71. [72]

    Hao Shao, Yuxuan Hu, Letian Wang, Guanglu Song, Steven L Waslander, Yu Liu, and Hongsheng Li. LMDrive: Closed-loop end-to-end driving with large language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15120–15130, 2024

  72. [73]

    Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11621–11631, 2020

  73. [74]

    Yiheng Li, Seth Z. Zhao, Chenfeng Xu, Chen Tang, Chenran Li, Mingyu Ding, Masayoshi Tomizuka, and Wei Zhan. Pre-training on synthetic driving data for trajectory prediction. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5910–5917, 2024. doi: 10.1109/IROS58592.2024.10802492

  74. [75]

    Qi Chen, Xu Ma, Sihai Tang, Jingda Guo, Qing Yang, and Song Fu. F-Cooper: Feature based cooperative perception for autonomous vehicle edge computing system using 3d point clouds. In Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, pages 88–100, 2019

  75. [76]

    Runsheng Xu, Zhengzhong Tu, Hao Xiang, Wei Shao, Bolei Zhou, and Jiaqi Ma. CoBEVT: Cooperative bird's eye view semantic segmentation with sparse transformers. arXiv preprint arXiv:2207.02202, 2022

  76. [77]

    Mingxuan Liu, Honglin He, Elisa Ricci, Wayne Wu, and Bolei Zhou. UrbanVerse: Scaling urban simulation by watching city-tour videos. In The Fourteenth International Conference on Learning Representations, 2026

MDrive Appendices

A Additional Details of MDrive Scenarios

A.1 Scenario Description


    1. Pre-crash: These scenarios are crafted and grounded in National Highway Traffic Safety Administration (NHTSA) guidance, focusing on challenging scenarios with occlusion, limited perception range, and dangerous interaction behavior. 2. Blocked Lane Obstacle: Diverse static obstacles block the travel lane. 3. Construction Zone: Work-zone lane constriction. ...
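The scenario categories above amount to a small typed catalog pairing each scenario family with the hazard factors it stresses. A minimal sketch of such a catalog follows; the `ScenarioSpec` class, field names, and `by_hazard` helper are illustrative assumptions, not MDrive's actual API.

```python
from dataclasses import dataclass

# Hypothetical encoding of MDrive-style scenario categories.
# Names and fields are illustrative only.
@dataclass(frozen=True)
class ScenarioSpec:
    name: str
    source: str           # e.g. NHTSA pre-crash typology vs. hand-crafted
    hazards: tuple        # challenge factors the scenario stresses

CATALOG = [
    ScenarioSpec("pre_crash", "NHTSA pre-crash typology",
                 ("occlusion", "limited perception range",
                  "dangerous interaction behavior")),
    ScenarioSpec("blocked_lane_obstacle", "crafted",
                 ("static obstacle in travel lane",)),
    ScenarioSpec("construction_zone", "crafted",
                 ("work-zone lane constriction",)),
]

def by_hazard(catalog, hazard):
    """Return names of scenarios whose hazard list mentions the factor."""
    return [s.name for s in catalog if any(hazard in h for h in s.hazards)]

print(by_hazard(CATALOG, "occlusion"))  # -> ['pre_crash']
```

Keeping the catalog declarative like this makes it easy to filter closed-loop evaluation runs by the hazard a scenario was designed to probe.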


    1) The perception misses occur mostly at far range, whereas near-range perception is what planning requires, so a lack of far-range perception might not hurt necessary planning decisions; 2) the other models are more rule-based planning agents connected to front-end perception results, which might not exhibit outstanding performance as learning-based...