Unveiling Uncertainty-Aware Autonomous Cooperative Learning Based Planning Strategy

Hong Zhang; Liwei Deng; Shiyao Zhang; Shuyu Zhang; Weijie Yuan

arxiv: 2510.11041 · v2 · submitted 2025-10-13 · 💻 cs.RO

Unveiling Uncertainty-Aware Autonomous Cooperative Learning Based Planning Strategy

Shiyao Zhang , Liwei Deng , Shuyu Zhang , Weijie Yuan , Hong Zhang This is my paper

Pith reviewed 2026-05-18 08:17 UTC · model grok-4.3

classification 💻 cs.RO

keywords autonomous vehiclescooperative planningdeep reinforcement learninguncertainty handlingmulti-agent systemsmotion planning

0 comments

The pith

A reinforcement learning approach lets autonomous vehicles plan joint motions even when their state data is incomplete or noisy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework that trains autonomous vehicles to make cooperative movement decisions while coping with gaps and errors in what each vehicle can see, decide, or share with others. It uses a specific reinforcement learning algorithm combined with recurrent processing to produce time-changing actions from imperfect information. Tests in a driving simulator show the learned strategy beats several comparison methods across varied traffic situations. A reader would care because safer and smoother multi-vehicle behavior could reduce collisions and delays in future roads filled with self-driving cars. The work implies that learning methods can turn real-world messiness into manageable inputs rather than requiring perfect data first.

Core claim

The proposed deep reinforcement learning-based autonomous cooperative planning framework learns deterministic optimal time-varying actions for autonomous vehicles using soft actor-critic with gate recurrent units, enabling effective cooperative motion planning under imperfect state information from perception, planning, and communication uncertainties, and demonstrates superior performance over baseline methods in simulation.

What carries the argument

Soft actor-critic algorithm augmented with recurrent units that convert noisy, incomplete vehicle states into coordinated real-time actions across multiple vehicles.

Load-bearing premise

The simulation environment fully captures how perception, planning, and communication uncertainties affect real physical vehicles.

What would settle it

Running the same scenarios on actual hardware vehicles and finding that the learned strategy no longer outperforms the baselines under real sensor noise and message loss.

Figures

Figures reproduced from arXiv: 2510.11041 by Hong Zhang, Liwei Deng, Shiyao Zhang, Shuyu Zhang, Weijie Yuan.

**Figure 2.** Figure 2: illustrates the complete processing flow for each agent, including the sequential state and action updates and decision-making steps. Within this framework, the GRUSAC module, which consists of a GRU based actor and two critic networks integrated into the SAC learning loop, introduces temporal modeling into both policy generation and value estimation, enabling the agent to reason over historical observati… view at source ↗

**Figure 3.** Figure 3: Predicted error versus steps via proposed GRU-SAC. [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Evaluation of the proposed DRLACP in CARLA. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

read the original abstract

In future intelligent transportation systems, autonomous cooperative planning (ACP), becomes a promising technique to increase the effectiveness and security of multi-vehicle interactions. However, multiple uncertainties cannot be fully addressed for existing ACP strategies, e.g. perception, planning, and communication uncertainties. To address these, a novel deep reinforcement learning-based autonomous cooperative planning (DRLACP) framework is proposed to tackle various uncertainties on cooperative motion planning schemes. Specifically, the soft actor-critic (SAC) with the implementation of gate recurrent units (GRUs) is adopted to learn the deterministic optimal time-varying actions with imperfect state information occurred by planning, communication, and perception uncertainties. In addition, the real-time actions of autonomous vehicles (AVs) are demonstrated via the Car Learning to Act (CARLA) simulation platform. Evaluation results show that the proposed DRLACP learns and performs cooperative planning effectively, which outperforms other baseline methods under different scenarios with imperfect AV state information.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This applies SAC with GRUs to cooperative AV planning under imperfect states and reports better sim results in CARLA, but the abstract gives almost no numbers or modeling details to back the outperformance claim.

read the letter

The core of this paper is taking the soft actor-critic algorithm, adding GRUs to handle time sequences, and running it on multi-vehicle cooperative motion planning when perception, planning, and communication noise corrupt the state information. They test the setup in CARLA and state that it beats other methods across scenarios. That is the main contribution on offer: a concrete RL pipeline aimed at a practical multi-agent driving problem rather than a new algorithm or theoretical result. The choice of SAC plus recurrence is reasonable for learning deterministic actions from noisy inputs, and applying it to cooperative planning with multiple uncertainty sources is a straightforward domain extension that could interest people working on intelligent transportation systems. The abstract frames the work clearly around those three uncertainty types and the need for real-time actions, which keeps the scope focused. The soft spots sit in the evaluation. No quantitative metrics appear in the abstract—no success rates, collision counts, travel times, or statistical comparisons. Baselines are mentioned but not described, uncertainty injection is not detailed, and there are no ablations or error bars. Everything rests on CARLA runs, yet the paper does not discuss how the simulator's noise models were chosen or whether they were checked against real sensor or communication traces. That leaves the central claim—that the method handles imperfect information effectively—supported only by the statement that it outperforms baselines, without the data needed to judge the size or reliability of the gains. The simulation-to-reality gap is the clearest weak point here. A reader working on RL for autonomous vehicles might still find the framework useful as an example of how to wire SAC and GRUs together for this setting, but anyone expecting strong evidence for real-world transfer would need the full results section first. I would send it to peer review for a targeted robotics or transportation venue, mainly so referees can press for the missing numbers, baseline definitions, and any calibration checks on the uncertainty models.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a deep reinforcement learning-based autonomous cooperative planning (DRLACP) framework that integrates soft actor-critic (SAC) with gated recurrent units (GRUs) to generate time-varying cooperative actions for multiple autonomous vehicles under imperfect state information arising from perception, planning, and communication uncertainties. The framework is evaluated exclusively through simulation in the CARLA platform across different scenarios, with the central claim that DRLACP learns effective policies and outperforms unspecified baseline methods.

Significance. If the empirical results can be strengthened with quantitative detail and validation, the work would address a practically relevant gap in uncertainty-aware multi-agent planning for intelligent transportation systems. The choice of SAC for continuous action spaces combined with GRUs for temporal state history is technically plausible for handling noisy or delayed observations. However, the current presentation supplies no numerical performance deltas, statistical tests, or ablation results, limiting the ability to judge whether the approach advances the state of the art beyond existing DRL planners for AV cooperation.

major comments (3)

[§5 (Evaluation)] §5 (Evaluation) and abstract: the claim that DRLACP 'outperforms other baseline methods under different scenarios with imperfect AV state information' is asserted without any reported numerical metrics (e.g., collision rate, travel time, reward, or success percentage), without error bars, without statistical significance tests, and without an explicit description of the baseline algorithms or their hyperparameters. This absence directly weakens the central empirical claim.
[§4 (Methodology) and §5] §4 (Methodology) and §5: no ablation studies are presented on the contribution of the GRU component versus a memoryless SAC baseline, nor on the specific uncertainty injection parameters (noise levels, delay distributions) used in CARLA. Without these, it is impossible to determine whether performance gains arise from the proposed architecture or from the particular simulator configuration.
[§5 and §6 (Conclusion)] §5 and §6 (Conclusion): the manuscript provides no calibration or sensitivity analysis showing that the modeled perception, planning, and communication uncertainties in CARLA produce effects representative of physical sensor traces or V2X communication. The mapping from simulation results to the headline claim of effective real-world cooperative planning therefore rests on an unverified assumption.

minor comments (2)

[Abstract and §1] The abstract and introduction repeatedly use the phrase 'imperfect AV state information' without a precise mathematical definition of the observation model (e.g., which state variables receive which noise distributions).
[Figures in §5] Figure captions and axis labels in the results section should explicitly state the performance metric being plotted and the number of random seeds used for each curve.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We have revised the manuscript to strengthen the empirical claims with quantitative results, add ablation studies, and include sensitivity analysis on the uncertainty models. Our responses to each major comment are provided below.

read point-by-point responses

Referee: [§5 (Evaluation)] §5 (Evaluation) and abstract: the claim that DRLACP 'outperforms other baseline methods under different scenarios with imperfect AV state information' is asserted without any reported numerical metrics (e.g., collision rate, travel time, reward, or success percentage), without error bars, without statistical significance tests, and without an explicit description of the baseline algorithms or their hyperparameters. This absence directly weakens the central empirical claim.

Authors: We agree that the original manuscript did not provide sufficient quantitative detail to support the performance claims. In the revised version, Section 5 now includes tables reporting numerical metrics such as collision rates, average travel times, cumulative rewards, and success percentages across scenarios. Results are averaged over 10 independent runs with standard deviations shown as error bars. We have added paired t-tests with p-values to establish statistical significance. We also explicitly describe the baseline algorithms (standard SAC, MADDPG, and a rule-based cooperative planner) and list all hyperparameters in a dedicated subsection of Section 4. revision: yes
Referee: [§4 (Methodology) and §5] §4 (Methodology) and §5: no ablation studies are presented on the contribution of the GRU component versus a memoryless SAC baseline, nor on the specific uncertainty injection parameters (noise levels, delay distributions) used in CARLA. Without these, it is impossible to determine whether performance gains arise from the proposed architecture or from the particular simulator configuration.

Authors: We acknowledge the importance of ablations for isolating contributions. The revised manuscript includes new experiments comparing the full SAC+GRU model against a memoryless SAC baseline (no recurrent units). We also report results for different uncertainty injection settings, including perception noise standard deviations of 0.05–0.5, communication delays drawn from uniform distributions (0–200 ms), and planning uncertainty levels. These ablation results appear in new tables and figures in Section 5, showing that the GRU component improves robustness to temporal uncertainty while the chosen noise parameters affect performance in a consistent manner. revision: yes
Referee: [§5 and §6 (Conclusion)] §5 and §6 (Conclusion): the manuscript provides no calibration or sensitivity analysis showing that the modeled perception, planning, and communication uncertainties in CARLA produce effects representative of physical sensor traces or V2X communication. The mapping from simulation results to the headline claim of effective real-world cooperative planning therefore rests on an unverified assumption.

Authors: We agree that simulation-to-reality transfer requires careful justification. While exhaustive calibration against proprietary real-world sensor traces lies outside the scope of this simulation-focused study, the revised Section 5 now contains a sensitivity analysis sweeping the uncertainty parameters over ranges informed by published AV sensor noise statistics and V2X delay measurements. We cite relevant literature on CARLA validation and explicitly discuss the limitations of the modeled uncertainties in the conclusion, outlining future real-world experiments as necessary next steps. revision: partial

Circularity Check

0 steps flagged

No circularity: claims rest on empirical CARLA simulation results rather than self-referential equations or fitted inputs

full rationale

The paper introduces a DRLACP framework that applies standard soft actor-critic with GRUs to learn actions under imperfect state information from perception, planning, and communication uncertainties. All performance claims are demonstrated through direct simulation runs in the CARLA platform across scenarios, with comparisons to baselines. No mathematical derivation chain, equations, or first-principles results are presented that reduce by construction to fitted parameters, self-definitions, or self-citations. The central results are externally falsifiable via the simulator outputs and do not rely on any load-bearing self-referential steps or imported uniqueness theorems.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim depends on standard RL training assumptions plus the fidelity of the CARLA simulator; no new entities are postulated.

free parameters (1)

SAC and GRU network hyperparameters
Learning rates, network sizes, reward weights, and GRU hidden dimensions are tuned to achieve reported performance but are not enumerated.

axioms (1)

domain assumption CARLA simulation faithfully reproduces the statistical effects of perception, planning, and communication uncertainties
All evaluation results rest on this unverified modeling choice.

pith-pipeline@v0.9.0 · 5697 in / 1148 out tokens · 47050 ms · 2026-05-18T08:17:47.670354+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the soft actor-critic (SAC) with the implementation of gate recurrent units (GRUs) is adopted to learn the deterministic optimal time-varying actions with imperfect state information occurred by planning, communication, and perception uncertainties... demonstrated via the CARLA simulation platform
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Evaluation results show that the proposed DRLACP learns and performs cooperative planning effectively, which outperforms other baseline methods under different scenarios with imperfect AV state information.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

[1]

Collab- orative planning for catching and transporting objects in unstructured environments,

L. Pei, J. Lin, Z. Han, L. Quan, Y . Cao, C. Xu, and F. Gao, “Collab- orative planning for catching and transporting objects in unstructured environments,”IEEE Robotics and Automation Letters, 2023. 8 IEEE ROBOTICS AND AUTOMATION LETTERS. PREPRINT VERSION. ACCEPTED OCTOBER, 2025 Success (a) State and control profiles of the proposed DRLACP with 3 A Vs. Su...

work page 2023
[2]

Motion planning and tracking control of unmanned underwater vehicles: technologies, challenges and prospects,

D. Zhu, T. Yan, and S. X. Yang, “Motion planning and tracking control of unmanned underwater vehicles: technologies, challenges and prospects,”Intelligence & Robotics, vol. 2, no. 3, 2022. [Online]. Available: https://www.oaepublish.com/articles/ir.2022.13

work page 2022
[3]

Decentralized planning for car-like robotic swarm in cluttered envi- ronments,

C. Ma, Z. Han, T. Zhang, J. Wang, L. Xu, C. Li, C. Xu, and F. Gao, “Decentralized planning for car-like robotic swarm in cluttered envi- ronments,” in2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2023, pp. 9293–9300

work page 2023
[4]

Edge-assisted v2x motion planning and power control under channel uncertainty,

Z. Li, S. Wang, S. Zhang, M. Wen, K. Ye, Y .-C. Wu, and D. W. K. Ng, “Edge-assisted v2x motion planning and power control under channel uncertainty,”IEEE Transactions on V ehicular Technology, 2023

work page 2023
[5]

Edge accelerated robot navigation with collaborative motion planning,

G. Li, R. Han, S. Wang, F. Gao, Y . C. Eldar, and C. Xu, “Edge accelerated robot navigation with collaborative motion planning,” 2024. [Online]. Available: https://arxiv.org/abs/2311.08983

work page arXiv 2024
[6]

Formation and reconfiguration of tight multi-lane platoons,

R. Firoozi, X. Zhang, and F. Borrelli, “Formation and reconfiguration of tight multi-lane platoons,”Control Engineering Practice, vol. 108, p. 104714, Mar. 2021

work page 2021
[7]

Efficient rate- splitting multiple access for the internet of vehicles: Federated edge learning and latency minimization,

S. Zhang, S. Zhang, W. Yuan, Y . Li, and L. Hanzo, “Efficient rate- splitting multiple access for the internet of vehicles: Federated edge learning and latency minimization,”IEEE Journal on Selected Areas in Communications, vol. 41, no. 5, pp. 1468–1483, May 2023

work page 2023
[8]

A distributed platoon control framework for connected automated vehicles in an urban traffic network,

B. Wang and R. Su, “A distributed platoon control framework for connected automated vehicles in an urban traffic network,”IEEE Trans- actions on Control of Network Systems, vol. 9, no. 4, pp. 1717–1730, Dec. 2022

work page 2022
[9]

The impact of flexible pla- toon formation operations,

S. Maiti, S. Winter, L. Kulik, and S. Sarkar, “The impact of flexible pla- toon formation operations,”IEEE Transactions on Intelligent V ehicles, vol. 5, no. 2, pp. 229–239, Jun. 2020

work page 2020
[10]

Fuel economy- oriented vehicle platoon control using economic model predictive con- trol,

M. Hu, C. Li, Y . Bian, H. Zhang, Z. Qin, and B. Xu, “Fuel economy- oriented vehicle platoon control using economic model predictive con- trol,”IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 11, pp. 20 836–20 849, Nov. 2022

work page 2022
[11]

Cooperative lane- change motion planning for connected and automated vehicle platoons in multi-lane scenarios,

X. Duan, C. Sun, D. Tian, J. Zhou, and D. Cao, “Cooperative lane- change motion planning for connected and automated vehicle platoons in multi-lane scenarios,”IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 7, pp. 7073–7091, Jul. 2023

work page 2023
[12]

Balancing computation speed and quality: A decentralized motion planning method for cooperative lane changes of connected and automated vehicles,

B. Li, Y . Zhang, Y . Feng, Y . Zhang, Y . Ge, and Z. Shao, “Balancing computation speed and quality: A decentralized motion planning method for cooperative lane changes of connected and automated vehicles,” IEEE Transactions on Intelligent V ehicles, vol. 3, no. 3, pp. 340–350, Sep. 2018

work page 2018
[13]

Collision avoidance predictive motion planning based on integrated perception and V2V communication,

S. Zhang, S. Wang, S. Yu, J. Yu, and M. Wen, “Collision avoidance predictive motion planning based on integrated perception and V2V communication,”IEEE Transactions on Intelligent Transportation Sys- tems, vol. 23, no. 7, pp. 9640–9653, July 2022

work page 2022
[14]

A hybrid deep reinforcement learning for autonomous vehicles smart-platooning,

S. B. Prathiba, G. Raja, K. Dev, N. Kumar, and M. Guizani, “A hybrid deep reinforcement learning for autonomous vehicles smart-platooning,” IEEE Transactions on V ehicular Technology, vol. 70, no. 12, pp. 13 340– 13 350, December 2021

work page 2021
[15]

A reinforcement learning-based vehicle platoon control strategy for reducing energy consumption in traffic oscillations,

M. Li, Z. Cao, and Z. Li, “A reinforcement learning-based vehicle platoon control strategy for reducing energy consumption in traffic oscillations,”IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 12, pp. 5309–5322, December 2021

work page 2021
[16]

Autonomous platoon control with integrated deep reinforcement learning and dynamic programming,

T. Liu, L. Lei, K. Zheng, and K. Zhang, “Autonomous platoon control with integrated deep reinforcement learning and dynamic programming,” IEEE Internet of Things Journal, vol. 10, no. 6, pp. 5476–5489, March 2023

work page 2023
[17]

Autonomous vehicle platoons in urban road networks: A joint dis- tributed reinforcement learning and model predictive control approach,

L. D’Alfonso, F. Giannini, G. Franz `e, G. Fedele, F. Pupo, and G. Fortino, “Autonomous vehicle platoons in urban road networks: A joint dis- tributed reinforcement learning and model predictive control approach,” IEEE/CAA Journal of Automatica Sinica, vol. 11, no. 1, pp. 141–156, January 2024

work page 2024
[18]

Online learning-based model predictive trajectory control for connected and autonomous vehi- cles: Modeling and physical tests,

Q. Li, P. Zhang, H. Yao, Z. Chen, and X. Li, “Online learning-based model predictive trajectory control for connected and autonomous vehi- cles: Modeling and physical tests,”Journal of Intelligent and Connected V ehicles, vol. 7, no. 2, pp. 86–96, June 2024

work page 2024
[19]

Distributed model predictive control for heterogeneous platoon with leading human- driven vehicle acceleration prediction,

J. Yang, D. Chu, J. Yin, D. Pi, J. Wang, and L. Lu, “Distributed model predictive control for heterogeneous platoon with leading human- driven vehicle acceleration prediction,”IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 5, pp. 3944–3959, May 2024

work page 2024
[20]

Research advances and challenges of autonomous and connected ground vehicles,

A. Eskandarian, C. Wu, and C. Sun, “Research advances and challenges of autonomous and connected ground vehicles,”IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 2, pp. 683–711, Feb. 2021

work page 2021
[21]

Fast-lio2: Fast direct lidar- inertial odometry,

W. Xu, Y . Cai, D. He, J. Lin, and F. Zhang, “Fast-lio2: Fast direct lidar- inertial odometry,”IEEE Transactions on Robotics, vol. 38, no. 4, pp. 2053–2073, 2022

work page 2053
[22]

CARLA: An open urban driving simulator,

A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V . Koltun, “CARLA: An open urban driving simulator,” inProceedings of the 1st Annual Conference on Robot Learning, ser. Proceedings of Machine Learning Research, S. Levine, V . Vanhoucke, and K. Goldberg, Eds., vol. 78. PMLR, Nov. 2017, pp. 1–16

work page 2017
[23]

Avddpg - federated reinforcement learning applied to autonomous platoon control,

C. Boin, L. Lei, and S. X. Yang, “Avddpg - federated reinforcement learning applied to autonomous platoon control,”Intelligence & Robotics, vol. 2, no. 2, 2022. [Online]. Available: https://www. oaepublish.com/articles/ir.2022.11

work page 2022
[24]

Neupan: Direct point robot navigation with end-to-end model-based learning,

R. Han, S. Wang, S. Wang, Z. Zhang, J. Chen, S. Lin, C. Li, C. Xu, Y . C. Eldar, Q. Hao, and J. Pan, “Neupan: Direct point robot navigation with end-to-end model-based learning,”IEEE Transactions on Robotics, vol. 41, pp. 2804–2824, 2025

work page 2025
[25]

Enhancing large vision model in street scene semantic understanding through leveraging posterior optimization trajectory,

W.-B. Kou, Q. Lin, M. Tang, J. Lei, S. Wang, R. Ye, G. Zhu, and Y .-C. Wu, “Enhancing large vision model in street scene semantic understanding through leveraging posterior optimization trajectory,”

work page
[26]

Available: https://arxiv.org/abs/2501.01710

[Online]. Available: https://arxiv.org/abs/2501.01710

work page arXiv
[27]

Receding horizon maneuver generation for automated highway driving,

J. Nilsson, P. Falcone, M. Ali, and J. Sj ¨oberg, “Receding horizon maneuver generation for automated highway driving,”Control Engi- neering Practice, vol. 41, pp. 124–133, Aug. 2015. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0967066115000726

work page 2015
[28]

Multi-uncertainty aware autonomous cooperative planning,

S. Zhang, H. Li, S. Zhang, S. Wang, D. W. Kwan Ng, and C. Xu, “Multi-uncertainty aware autonomous cooperative planning,” in2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024, pp. 1018–1025

work page 2024
[29]

Rate-splitting multiple access-based satellite–vehicular communication system: A non- cooperative game theoretical approach,

S. Zhang, S. Zhang, W. Yuan, and T. Q. S. Quek, “Rate-splitting multiple access-based satellite–vehicular communication system: A non- cooperative game theoretical approach,”IEEE Open Journal of the Communications Society, vol. 4, pp. 430–441, 2023

work page 2023
[30]

Formation and reconfiguration of tight multi-lane platoons,

R. Firoozi, X. Zhang, and F. Borrelli, “Formation and reconfiguration of tight multi-lane platoons,”CoRR, vol. abs/2003.08595, Dec. 2020. [Online]. Available: https://arxiv.org/abs/2003.08595

work page arXiv 2003

[1] [1]

Collab- orative planning for catching and transporting objects in unstructured environments,

L. Pei, J. Lin, Z. Han, L. Quan, Y . Cao, C. Xu, and F. Gao, “Collab- orative planning for catching and transporting objects in unstructured environments,”IEEE Robotics and Automation Letters, 2023. 8 IEEE ROBOTICS AND AUTOMATION LETTERS. PREPRINT VERSION. ACCEPTED OCTOBER, 2025 Success (a) State and control profiles of the proposed DRLACP with 3 A Vs. Su...

work page 2023

[2] [2]

Motion planning and tracking control of unmanned underwater vehicles: technologies, challenges and prospects,

D. Zhu, T. Yan, and S. X. Yang, “Motion planning and tracking control of unmanned underwater vehicles: technologies, challenges and prospects,”Intelligence & Robotics, vol. 2, no. 3, 2022. [Online]. Available: https://www.oaepublish.com/articles/ir.2022.13

work page 2022

[3] [3]

Decentralized planning for car-like robotic swarm in cluttered envi- ronments,

C. Ma, Z. Han, T. Zhang, J. Wang, L. Xu, C. Li, C. Xu, and F. Gao, “Decentralized planning for car-like robotic swarm in cluttered envi- ronments,” in2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2023, pp. 9293–9300

work page 2023

[4] [4]

Edge-assisted v2x motion planning and power control under channel uncertainty,

Z. Li, S. Wang, S. Zhang, M. Wen, K. Ye, Y .-C. Wu, and D. W. K. Ng, “Edge-assisted v2x motion planning and power control under channel uncertainty,”IEEE Transactions on V ehicular Technology, 2023

work page 2023

[5] [5]

Edge accelerated robot navigation with collaborative motion planning,

G. Li, R. Han, S. Wang, F. Gao, Y . C. Eldar, and C. Xu, “Edge accelerated robot navigation with collaborative motion planning,” 2024. [Online]. Available: https://arxiv.org/abs/2311.08983

work page arXiv 2024

[6] [6]

Formation and reconfiguration of tight multi-lane platoons,

R. Firoozi, X. Zhang, and F. Borrelli, “Formation and reconfiguration of tight multi-lane platoons,”Control Engineering Practice, vol. 108, p. 104714, Mar. 2021

work page 2021

[7] [7]

Efficient rate- splitting multiple access for the internet of vehicles: Federated edge learning and latency minimization,

S. Zhang, S. Zhang, W. Yuan, Y . Li, and L. Hanzo, “Efficient rate- splitting multiple access for the internet of vehicles: Federated edge learning and latency minimization,”IEEE Journal on Selected Areas in Communications, vol. 41, no. 5, pp. 1468–1483, May 2023

work page 2023

[8] [8]

A distributed platoon control framework for connected automated vehicles in an urban traffic network,

B. Wang and R. Su, “A distributed platoon control framework for connected automated vehicles in an urban traffic network,”IEEE Trans- actions on Control of Network Systems, vol. 9, no. 4, pp. 1717–1730, Dec. 2022

work page 2022

[9] [9]

The impact of flexible pla- toon formation operations,

S. Maiti, S. Winter, L. Kulik, and S. Sarkar, “The impact of flexible pla- toon formation operations,”IEEE Transactions on Intelligent V ehicles, vol. 5, no. 2, pp. 229–239, Jun. 2020

work page 2020

[10] [10]

Fuel economy- oriented vehicle platoon control using economic model predictive con- trol,

M. Hu, C. Li, Y . Bian, H. Zhang, Z. Qin, and B. Xu, “Fuel economy- oriented vehicle platoon control using economic model predictive con- trol,”IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 11, pp. 20 836–20 849, Nov. 2022

work page 2022

[11] [11]

Cooperative lane- change motion planning for connected and automated vehicle platoons in multi-lane scenarios,

X. Duan, C. Sun, D. Tian, J. Zhou, and D. Cao, “Cooperative lane- change motion planning for connected and automated vehicle platoons in multi-lane scenarios,”IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 7, pp. 7073–7091, Jul. 2023

work page 2023

[12] [12]

Balancing computation speed and quality: A decentralized motion planning method for cooperative lane changes of connected and automated vehicles,

B. Li, Y . Zhang, Y . Feng, Y . Zhang, Y . Ge, and Z. Shao, “Balancing computation speed and quality: A decentralized motion planning method for cooperative lane changes of connected and automated vehicles,” IEEE Transactions on Intelligent V ehicles, vol. 3, no. 3, pp. 340–350, Sep. 2018

work page 2018

[13] [13]

Collision avoidance predictive motion planning based on integrated perception and V2V communication,

S. Zhang, S. Wang, S. Yu, J. Yu, and M. Wen, “Collision avoidance predictive motion planning based on integrated perception and V2V communication,”IEEE Transactions on Intelligent Transportation Sys- tems, vol. 23, no. 7, pp. 9640–9653, July 2022

work page 2022

[14] [14]

A hybrid deep reinforcement learning for autonomous vehicles smart-platooning,

S. B. Prathiba, G. Raja, K. Dev, N. Kumar, and M. Guizani, “A hybrid deep reinforcement learning for autonomous vehicles smart-platooning,” IEEE Transactions on V ehicular Technology, vol. 70, no. 12, pp. 13 340– 13 350, December 2021

work page 2021

[15] [15]

A reinforcement learning-based vehicle platoon control strategy for reducing energy consumption in traffic oscillations,

M. Li, Z. Cao, and Z. Li, “A reinforcement learning-based vehicle platoon control strategy for reducing energy consumption in traffic oscillations,”IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 12, pp. 5309–5322, December 2021

work page 2021

[16] [16]

Autonomous platoon control with integrated deep reinforcement learning and dynamic programming,

T. Liu, L. Lei, K. Zheng, and K. Zhang, “Autonomous platoon control with integrated deep reinforcement learning and dynamic programming,” IEEE Internet of Things Journal, vol. 10, no. 6, pp. 5476–5489, March 2023

work page 2023

[17] [17]

Autonomous vehicle platoons in urban road networks: A joint dis- tributed reinforcement learning and model predictive control approach,

L. D’Alfonso, F. Giannini, G. Franz `e, G. Fedele, F. Pupo, and G. Fortino, “Autonomous vehicle platoons in urban road networks: A joint dis- tributed reinforcement learning and model predictive control approach,” IEEE/CAA Journal of Automatica Sinica, vol. 11, no. 1, pp. 141–156, January 2024

work page 2024

[18] [18]

Online learning-based model predictive trajectory control for connected and autonomous vehi- cles: Modeling and physical tests,

Q. Li, P. Zhang, H. Yao, Z. Chen, and X. Li, “Online learning-based model predictive trajectory control for connected and autonomous vehi- cles: Modeling and physical tests,”Journal of Intelligent and Connected V ehicles, vol. 7, no. 2, pp. 86–96, June 2024

work page 2024

[19] [19]

Distributed model predictive control for heterogeneous platoon with leading human- driven vehicle acceleration prediction,

J. Yang, D. Chu, J. Yin, D. Pi, J. Wang, and L. Lu, “Distributed model predictive control for heterogeneous platoon with leading human- driven vehicle acceleration prediction,”IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 5, pp. 3944–3959, May 2024

work page 2024

[20] [20]

Research advances and challenges of autonomous and connected ground vehicles,

A. Eskandarian, C. Wu, and C. Sun, “Research advances and challenges of autonomous and connected ground vehicles,”IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 2, pp. 683–711, Feb. 2021

work page 2021

[21] [21]

Fast-lio2: Fast direct lidar- inertial odometry,

W. Xu, Y . Cai, D. He, J. Lin, and F. Zhang, “Fast-lio2: Fast direct lidar- inertial odometry,”IEEE Transactions on Robotics, vol. 38, no. 4, pp. 2053–2073, 2022

work page 2053

[22] [22]

CARLA: An open urban driving simulator,

A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V . Koltun, “CARLA: An open urban driving simulator,” inProceedings of the 1st Annual Conference on Robot Learning, ser. Proceedings of Machine Learning Research, S. Levine, V . Vanhoucke, and K. Goldberg, Eds., vol. 78. PMLR, Nov. 2017, pp. 1–16

work page 2017

[23] [23]

Avddpg - federated reinforcement learning applied to autonomous platoon control,

C. Boin, L. Lei, and S. X. Yang, “Avddpg - federated reinforcement learning applied to autonomous platoon control,”Intelligence & Robotics, vol. 2, no. 2, 2022. [Online]. Available: https://www. oaepublish.com/articles/ir.2022.11

work page 2022

[24] [24]

Neupan: Direct point robot navigation with end-to-end model-based learning,

R. Han, S. Wang, S. Wang, Z. Zhang, J. Chen, S. Lin, C. Li, C. Xu, Y . C. Eldar, Q. Hao, and J. Pan, “Neupan: Direct point robot navigation with end-to-end model-based learning,”IEEE Transactions on Robotics, vol. 41, pp. 2804–2824, 2025

work page 2025

[25] [25]

Enhancing large vision model in street scene semantic understanding through leveraging posterior optimization trajectory,

W.-B. Kou, Q. Lin, M. Tang, J. Lei, S. Wang, R. Ye, G. Zhu, and Y .-C. Wu, “Enhancing large vision model in street scene semantic understanding through leveraging posterior optimization trajectory,”

work page

[26] [26]

Available: https://arxiv.org/abs/2501.01710

[Online]. Available: https://arxiv.org/abs/2501.01710

work page arXiv

[27] [27]

Receding horizon maneuver generation for automated highway driving,

J. Nilsson, P. Falcone, M. Ali, and J. Sj ¨oberg, “Receding horizon maneuver generation for automated highway driving,”Control Engi- neering Practice, vol. 41, pp. 124–133, Aug. 2015. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0967066115000726

work page 2015

[28] [28]

Multi-uncertainty aware autonomous cooperative planning,

S. Zhang, H. Li, S. Zhang, S. Wang, D. W. Kwan Ng, and C. Xu, “Multi-uncertainty aware autonomous cooperative planning,” in2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024, pp. 1018–1025

work page 2024

[29] [29]

Rate-splitting multiple access-based satellite–vehicular communication system: A non- cooperative game theoretical approach,

S. Zhang, S. Zhang, W. Yuan, and T. Q. S. Quek, “Rate-splitting multiple access-based satellite–vehicular communication system: A non- cooperative game theoretical approach,”IEEE Open Journal of the Communications Society, vol. 4, pp. 430–441, 2023

work page 2023

[30] [30]

Formation and reconfiguration of tight multi-lane platoons,

R. Firoozi, X. Zhang, and F. Borrelli, “Formation and reconfiguration of tight multi-lane platoons,”CoRR, vol. abs/2003.08595, Dec. 2020. [Online]. Available: https://arxiv.org/abs/2003.08595

work page arXiv 2003