Uncertainty-Aware Motion Planning for Autonomous Driving in Mixed Traffic Environment

Hao Chen; Ming Cheng; Senzhang Wang; Ziluowen Luo; Ziyi Yang

arxiv: 2606.09958 · v1 · pith:2SG3HTW7new · submitted 2026-06-08 · 💻 cs.RO · cs.AI

Ming Cheng, Hao Chen, Ziyi Yang, Ziluowen Luo, Senzhang Wang

Pith reviewed 2026-06-27 16:18 UTC · model grok-4.3

classification 💻 cs.RO cs.AI

keywords autonomous driving · motion planning · uncertainty estimation · mixed traffic · intent prediction · reinforcement learning · value function correction

The pith

Autonomous vehicles plan safer trajectories in mixed traffic by modeling uncertainty in human intent predictions instead of treating them as fixed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Uncertainty-Aware Motion Planning (UAMP) for autonomous vehicles sharing roads with human drivers. It first builds a proximity-aware estimator that measures how nearby interactions shape the uncertainty in each driver's future intent. These uncertainties are combined into a joint distribution over possible behaviors of surrounding vehicles. UAMP then applies Uncertainty-Calibrated Value Learning to remove biases that arise when uncertain predictions are fed directly into reinforcement-learning value functions. Experiments across mixed-traffic scenarios indicate that the resulting policies produce fewer unsafe maneuvers and smoother rides while preserving overall traffic flow.
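The available text gives no equations for the estimator or the joint distribution, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes three hypothetical intent labels, assumes that closer vehicles should receive a flatter (more uncertain) intent distribution via a distance-dependent softmax temperature, and assumes independence across vehicles when forming the joint. None of these choices is confirmed as the paper's method; `proximity_temperature`, `d_ref`, `t_min`, and `t_max` are illustrative names and constants.

```python
import math
from itertools import product

# Hypothetical intent labels; the paper's actual intent set is not given.
INTENTS = ("keep_lane", "change_left", "change_right")


def proximity_temperature(distance_m, d_ref=20.0, t_min=1.0, t_max=3.0):
    """Map distance to the ego vehicle into a softmax temperature.

    Assumed form: closer vehicles get a higher temperature, which flattens
    their intent distribution. All constants are illustrative.
    """
    closeness = math.exp(-distance_m / d_ref)  # 1 at 0 m, approaches 0 far away
    return t_min + (t_max - t_min) * closeness


def intent_distribution(logits, distance_m):
    """Temperature-scaled softmax over one vehicle's intent logits."""
    t = proximity_temperature(distance_m)
    scaled = [z / t for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy in nats, used here as a scalar uncertainty score."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def joint_intent_distribution(marginals):
    """Joint distribution over all vehicles, assuming independent intents."""
    joint = {}
    for combo in product(range(len(INTENTS)), repeat=len(marginals)):
        prob = 1.0
        for vehicle, k in enumerate(combo):
            prob *= marginals[vehicle][k]
        joint[tuple(INTENTS[k] for k in combo)] = prob
    return joint
```

With identical logits, a vehicle 5 m away ends up with higher entropy than one 80 m away, and the joint over two vehicles has nine entries. The independence assumption keeps the sketch small; an interaction-conditioned joint, as the paper's wording suggests, would need a richer model.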

Core claim

UAMP introduces a proximity-aware uncertainty estimator to quantify interaction-conditioned intent uncertainty, constructs an uncertainty-guided joint intent distribution over surrounding human-driven vehicles, and uses Uncertainty-Calibrated Value Learning (UCVL) to correct value-function biases that occur when uncertain human-intent predictions are incorporated directly into the observation.

What carries the argument

Uncertainty-Calibrated Value Learning (UCVL), which adjusts the value function to account for the distribution of possible human intents rather than single deterministic predictions.
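To make the contrast between a single deterministic prediction and a distribution of intents concrete, here is a toy sketch. It is not UCVL itself, whose correction term is not specified in the available text; it only compares a plug-in value (most likely intent treated as known) with an expectation over the intent distribution. The action names, intent names, and all numbers in `Q_TABLE` and `INTENT_PROBS` are made up for illustration.

```python
# Hypothetical action values for an ego merge decision, indexed by
# action and by the neighbouring driver's intent. Numbers are invented.
Q_TABLE = {
    "merge": {"yield_to_ego": 1.0, "accelerate": -10.0},
    "yield": {"yield_to_ego": 0.2, "accelerate": 0.2},
}

# Hypothetical predicted intent distribution for the neighbouring driver.
INTENT_PROBS = {"yield_to_ego": 0.7, "accelerate": 0.3}


def plug_in_value(q_table, action, intent_probs):
    """Value when the most likely intent is treated as a known state."""
    most_likely = max(intent_probs, key=intent_probs.get)
    return q_table[action][most_likely]


def expected_value(q_table, action, intent_probs):
    """Value averaged over the whole intent distribution."""
    return sum(p * q_table[action][intent] for intent, p in intent_probs.items())


def best_action(q_table, intent_probs, value_fn):
    """Pick the action that maximises the given value estimate."""
    return max(q_table, key=lambda a: value_fn(q_table, a, intent_probs))


if __name__ == "__main__":
    print("plug-in choice: ", best_action(Q_TABLE, INTENT_PROBS, plug_in_value))
    print("expected choice:", best_action(Q_TABLE, INTENT_PROBS, expected_value))
```

In this toy table the plug-in estimate prefers merging because it ignores the 30% chance that the other driver accelerates, while the expectation prefers yielding. That gap is the kind of bias the review attributes to feeding uncertain predictions directly into the observation.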

Load-bearing premise

The proximity-aware uncertainty estimator accurately quantifies interaction-conditioned intent uncertainty so that the resulting uncertainty-guided joint intent distribution and UCVL correction produce safer decisions than treating predictions as deterministic.

What would settle it

A controlled simulation in which the uncertainty estimator systematically under- or over-estimates intent variance, after which collision rates and comfort metrics show no improvement over a deterministic baseline.
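One way such a test could detect systematic under- or over-estimation is a standard calibration check on the intent predictor. The sketch below computes expected calibration error (ECE) and a signed confidence gap from predicted confidences and correctness flags. The paper does not say which calibration metric it uses, so this is an assumed stand-in, and the function names are illustrative.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    confidences: predicted probability of the chosen intent, each in [0, 1].
    correct:     booleans, True where the chosen intent matched the outcome.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes to the last bin
        bins[idx].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece


def confidence_gap(confidences, correct):
    """Mean confidence minus accuracy.

    Positive values indicate overconfidence (uncertainty underestimated);
    negative values indicate underconfidence (uncertainty overestimated).
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_conf = sum(confidences) / len(confidences)
    accuracy = sum(1 for ok in correct if ok) / len(correct)
    return mean_conf - accuracy
```

A predictor that reports 0.9 confidence and is right nine times out of ten scores an ECE near zero; the same confidence with only half the predictions correct gives an ECE of about 0.4 and a positive gap.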

Figures

Figures reproduced from arXiv: 2606.09958 by Hao Chen, Ming Cheng, Senzhang Wang, Ziluowen Luo, Ziyi Yang.

**Figure 2.** Overview of the UAMP framework. UAMP estimates proximity-aware uncertainty in human intent prediction from an ego-centric ...

**Figure 3.** Ablation study results between UAMP with its two vari...

**Figure 4.** Comparison of performance metrics under varying au...

Original abstract

In mixed-traffic environments where autonomous and human-driven vehicles co-exist, motion planning for autonomous vehicles requires anticipating the future behaviors of surrounding human drivers. Existing reinforcement learning-based methods generally incorporate the predicted human intents directly into the observation to enable proactive planning. However, human intent is inherently uncertain due to behavioral diversity, perception noise, and partial observability. Treating predicted intents as deterministic states can result in unsafe decisions for autonomous vehicles. To address this problem, we propose Uncertainty-Aware Motion Planning (UAMP), which incorporates uncertainty in human intent prediction into AV decision-making. Specifically, UAMP first introduces a proximity-aware uncertainty estimator to quantify the interaction-conditioned intent uncertainty and constructs an uncertainty-guided joint intent distribution over surrounding human-driven vehicles. Within this uncertainty set, UAMP further introduces Uncertainty-Calibrated Value Learning (UCVL) to correct value function learning biases arising from directly incorporating uncertain human intent predictions into the observation. Extensive experiments in various mixed-traffic scenarios show that UAMP significantly improves safety and driving comfort while maintaining traffic efficiency compared with existing approaches. The code is released at https://anonymous.4open.science/r/UAMP-5638.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

UAMP adds uncertainty modeling to RL AV planners but lacks visible evidence for its claims.

The punchline is that this work adds a proximity-aware uncertainty estimator and UCVL to handle uncertain human intent in RL-based AV motion planning, which addresses a real issue in mixed traffic. The approach tries to avoid unsafe decisions by not treating predictions as fixed.

What the paper does well is frame the problem clearly. Human intent uncertainty from diversity, noise, and partial observability is a practical barrier, and building a joint distribution over surrounding vehicles while calibrating the value function is a logical way to incorporate it. Releasing the code is also helpful for others to check the implementation.

The soft spots are in the support for the claims. The abstract states that extensive experiments show significant improvements in safety and comfort while keeping efficiency, but it supplies no quantitative results, no baseline comparisons, no statistical tests, and no ablation studies. This makes it impossible to verify whether the new components drive any gains or whether other factors are at play. The stress-test note is accurate on this point: the accuracy of the estimator and the bias correction from UCVL remain unverified from the available text. Without equations or details on how the estimator is trained or calibrated, the central claim rests on unshown work.

This paper is for researchers in autonomous driving and reinforcement learning for robotics. A reader interested in uncertainty-aware planning might find the module ideas worth exploring, but only after seeing the full experiments and code.

I would recommend sending it for peer review. The idea is relevant enough that referees should have a chance to examine the full method and results.

Referee Report

2 major / 1 minor

Summary. The paper proposes Uncertainty-Aware Motion Planning (UAMP) for AVs in mixed traffic. It introduces a proximity-aware uncertainty estimator that quantifies interaction-conditioned intent uncertainty arising from behavioral diversity, perception noise, and partial observability, then builds an uncertainty-guided joint intent distribution over surrounding human-driven vehicles. UAMP further adds Uncertainty-Calibrated Value Learning (UCVL) to correct value-function biases that arise when uncertain human-intent predictions are fed directly into the observation. The abstract states that extensive experiments demonstrate significant gains in safety and driving comfort while preserving traffic efficiency relative to existing RL-based planners; code is released.

Significance. If the quantitative claims hold, the work would provide a concrete mechanism for propagating intent uncertainty into RL-based planning rather than treating predictions as deterministic, which is a practically relevant direction for mixed-traffic autonomy. The public release of code is a clear positive for reproducibility.

major comments (2)

[Abstract] Abstract: the central claim that 'UAMP significantly improves safety and driving comfort' is asserted without any numerical results, baseline names, statistical tests, or ablation studies, so the support for the claim cannot be evaluated from the available text.
[Method] Method description: no equations, network architecture, training objective, or calibration metric are supplied for the proximity-aware uncertainty estimator or for the UCVL bias-correction term, preventing verification that the uncertainty-guided distribution and UCVL produce safer decisions than deterministic-intent baselines.

minor comments (1)

[Abstract] The abstract refers to 'existing approaches' without naming the specific RL baselines used for comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We will revise the paper to address the concerns about the abstract and method description, providing more concrete support for the claims and additional technical details to facilitate verification.

Referee: [Abstract] Abstract: the central claim that 'UAMP significantly improves safety and driving comfort' is asserted without any numerical results, baseline names, statistical tests, or ablation studies, so the support for the claim cannot be evaluated from the available text.

Authors: We agree that the abstract would benefit from including more specific quantitative support. In the revised manuscript, we will update the abstract to incorporate key numerical results from the experiments section (including safety and comfort metrics), name the baselines used, and reference the ablation studies, while respecting length constraints. revision: yes
Referee: [Method] Method description: no equations, network architecture, training objective, or calibration metric are supplied for the proximity-aware uncertainty estimator or for the UCVL bias-correction term, preventing verification that the uncertainty-guided distribution and UCVL produce safer decisions than deterministic-intent baselines.

Authors: We acknowledge the need for explicit technical details to enable verification. We will expand the method section in the revision to include the equations for the proximity-aware uncertainty estimator and UCVL bias-correction term, along with the network architecture, training objective, and calibration metric, to demonstrate how these elements improve upon deterministic-intent baselines. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces UAMP with a proximity-aware uncertainty estimator and UCVL correction as extensions to existing RL motion planners. No equations, derivations, or self-citations are shown that reduce the claimed safety/comfort gains to a fitted parameter renamed as prediction, a self-definitional loop, or a load-bearing uniqueness theorem from the same authors. The method is presented as building on but distinct from prior RL work, with improvements asserted via experiments rather than any closed-form reduction to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no explicit free parameters, axioms, or invented entities; all technical details remain at the level of high-level component names.

pith-pipeline@v0.9.1-grok · 5741 in / 1053 out tokens · 22569 ms · 2026-06-27T16:18:39.467772+00:00 · methodology

Appendix A.1 excerpt (Different Traffic Scenarios, Mixed-Traffic Scenario Design in SUMO): To evaluate the performance of UAMP under diverse mixed-traffic conditions, we design a set of controlled traffic scenarios in SUMO. The scenarios are constructed by varying three key factors: traffic density, autonomous vehicle penetration level, and human driving behavior distribution ...
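A scenario set built from three factors can be enumerated as a Cartesian product. The sketch below shows that enumeration only; it does not call SUMO, and every level shown (flow rates, penetration rates, driver-style mixes) is a hypothetical placeholder, since the excerpt does not list the values the authors used.

```python
from itertools import product

# All levels below are hypothetical placeholders, not the paper's settings.
TRAFFIC_DENSITIES_VEH_PER_HOUR = (600, 1200, 1800)
AV_PENETRATION_RATES = (0.2, 0.5, 0.8)
BEHAVIOR_MIXES = (
    {"aggressive": 0.2, "normal": 0.6, "cautious": 0.2},
    {"aggressive": 0.5, "normal": 0.3, "cautious": 0.2},
)


def build_scenarios(
    densities=TRAFFIC_DENSITIES_VEH_PER_HOUR,
    penetrations=AV_PENETRATION_RATES,
    behavior_mixes=BEHAVIOR_MIXES,
):
    """Return one configuration dict per combination of the three factors."""
    scenarios = []
    for density, penetration, mix in product(densities, penetrations, behavior_mixes):
        scenarios.append(
            {
                "density_veh_per_hour": density,
                "av_penetration": penetration,
                "behavior_mix": dict(mix),  # copy so scenarios do not share state
            }
        )
    return scenarios


if __name__ == "__main__":
    for scenario in build_scenarios():
        print(scenario)
```

With these placeholder levels the grid has 3 × 3 × 2 = 18 scenarios. Each dict would still need to be translated into simulator route and vehicle-type definitions, which is outside this sketch.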
