Robust and Safe Multi-Agent Reinforcement Learning with Communication for Autonomous Vehicles: From Simulation to Hardware

Ehsan Sabouni; Fei Miao; H M Sabbir Ahmad; Keshawn Smith; Mainak Mondal; Song Han; Wenchao Li; Zhili Zhang

REVIEW · 2 major objections · 1 minor · cited by 1

A MARL framework trains driving policies in simulation and transfers them directly to physical vehicles while adding safety shields.

Reviewed by Pith at T0; open to challenge. T0 means a machine referee read the full paper against a public rubric.

T0 review · grok-4.3

2026-05-19 11:15 UTC pith:FO3WDX5Y

Load-bearing objection: The paper integrates robust MARL with V2V communication and CBF safety shields for a hardware demo on small-scale vehicles, but the sim-to-real transfer claims rest on thin evidence.

arXiv 2506.00982 v3 · pith:FO3WDX5Y · submitted 2025-06-01 · cs.RO cs.MA

Keshawn Smith, Zhili Zhang, H M Sabbir Ahmad, Ehsan Sabouni, Mainak Mondal, Song Han, Wenchao Li, Fei Miao

classification cs.RO cs.MA

keywords multi-agent reinforcement learning · autonomous vehicles · sim-to-real transfer · vehicle-to-vehicle communication · control barrier functions · robust learning · safety shields · hardware experiments

verification ladder: T0 review · T1 audit · T2 compute · T3 formal · T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that multi-agent reinforcement learning policies for autonomous vehicles can be formulated with state and action representations that explicitly account for physical system complexities, trained robustly in simulation, and then transferred zero-shot to hardware. It incorporates vehicle-to-vehicle communication for shared information and uses Control Barrier Functions as modular safety shields to enforce guarantees during both training and deployment. A sympathetic reader would care because this addresses the persistent sim-to-real gap and safety concerns that have limited learning-based methods in real multi-robot systems, potentially allowing safer coordinated driving without extensive real-world retraining.

Core claim

RSR-RSMARL is a Robust and Safe MARL framework that supports Real-Sim-Real policy adaptation for multi-agent systems with communication among agents. It leverages state representations that include shared information among agents and action representations that consider real system complexities. The policy is trained with a robust MARL algorithm to enable zero-shot transfer to hardware despite the sim-to-real gap. A safety shield module using Control Barrier Functions provides safety guarantees for each individual agent. Experiments on 1/10th-scale autonomous vehicles with V2V communication show that the framework enhances driving safety and coordination across multiple configurations.

What carries the argument

The RSR-RSMARL framework, which combines robust MARL training, state and action representations that include shared V2V information and real-system details, Real-Sim-Real adaptation, and modular Control Barrier Function safety shields to support zero-shot hardware transfer.

Load-bearing premise

State and action representations that capture real system complexities, together with robust training, are enough to overcome sim-to-real discrepancies and model uncertainties so that the policies work directly on physical hardware.

What would settle it

Deploy the simulator-trained policies on the 1/10th-scale vehicles without any fine-tuning and observe whether safety or coordination breaks down in the presence of communication delays, model uncertainties, or dynamic obstacles.

If this is right

Multi-agent vehicle teams can maintain individual safety guarantees while using shared communication to improve overall coordination.
Zero-shot transfer from simulation becomes feasible for MARL policies when representations are designed around physical complexities rather than idealized models.
Safety shields based on Control Barrier Functions can be added modularly without retraining the core policy for hardware use.
The same framework supports testing across varied team sizes and scenarios once the representations and training are fixed.
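
The third point above, a shield that can be added without retraining, can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: it assumes a single barrier function h, no actuator limits, and the closed-form solution of the quadratic program min_u ½‖u − u_ref‖² subject to ∂h/∂t + L_f h + L_g h u ≥ −γh that the paper's appendix states. The function name `cbf_shield` and the toy stopping scenario are hypothetical.

```python
from typing import List, Sequence, Tuple


def cbf_shield(
    u_ref: Sequence[float],
    h: float,
    lf_h: float,
    lg_h: Sequence[float],
    gamma: float = 0.5,
    dh_dt: float = 0.0,
) -> Tuple[List[float], bool]:
    """Project u_ref onto the half-space {u : dh_dt + lf_h + lg_h . u >= -gamma * h}.

    Returns the filtered action and a flag saying whether the shield intervened.
    Assumes one barrier constraint and no actuator limits (illustrative only).
    """
    a = [float(x) for x in lg_h]
    u = [float(x) for x in u_ref]
    b = -gamma * h - lf_h - dh_dt  # the constraint reads a . u >= b
    slack = sum(ai * ui for ai, ui in zip(a, u)) - b
    if slack >= 0.0:
        return u, False  # the policy action already satisfies the CBF condition
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("L_g h is zero: the control cannot influence h at this state")
    scale = slack / norm_sq  # negative here, so u is pushed along +a
    return [ui - scale * ai for ui, ai in zip(u, a)], True


# Hypothetical toy case: a vehicle commanded by speed u approaches a stopped obstacle.
# h = gap - min_gap = 2.0 and dh/dt = -u, so L_f h = 0 and L_g h = [-1].
safe_u, intervened = cbf_shield(u_ref=[3.0], h=2.0, lf_h=0.0, lg_h=[-1.0], gamma=0.5)
print(safe_u, intervened)  # [1.0] True
```

Because the shield sees only the proposed action and the barrier terms, the learned policy is left untouched. The returned flag is the quantity a CBF intervention-frequency plot would count.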

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach might scale to full-size vehicles if the state representations are adjusted for higher speeds and longer communication ranges.
Similar combinations of robust training and barrier-function shields could apply to other multi-agent domains such as drone coordination or warehouse robots.
If communication is intermittent, the framework's reliance on shared states would need explicit robustness extensions that the current experiments do not test.
The method could be combined with online adaptation modules to handle larger distribution shifts not seen in the 1/10-scale tests.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

The paper integrates robust MARL with V2V communication and CBF safety shields for a hardware demo on small-scale vehicles, but the sim-to-real transfer claims rest on thin evidence.

The core takeaway is that this work shows a complete pipeline (robust multi-agent RL training, shared state representations via V2V, real-sim-real adaptation, and per-agent CBF safety modules) running on 1/10th-scale cars with actual communication hardware. That combination is more than most MARL papers deliver, and the hardware step itself is useful to see even at small scale. The authors get credit for treating communication as part of the state rather than an afterthought and for keeping the safety layer modular so it does not interfere with the learned policy during training. The architecture description is clear enough that someone could replicate the high-level structure. The abstract also notes success across multiple configurations, which suggests the framework is not tuned to one narrow case.

The main weakness is the missing numbers. No success rates, collision counts, baseline comparisons, or direct measurements of the sim-to-real gap appear in the provided material. Without those, it is difficult to judge whether the state and action representations actually closed the transfer gap or whether the CBF shield simply overrode risky actions on hardware. The stress-test concern about unquantified discrepancies in state distributions, delays, and noise holds up here; if those details are in the full paper they are not highlighted, which weakens the zero-shot claim.

This paper is aimed at researchers working on multi-agent autonomy who already know the MARL and CBF literature and want to see an end-to-end hardware attempt. A reader in that group would find the integration and the small-robot demonstration worth reading, even if they would still need to run their own experiments to trust the transfer results.

I would send it to peer review. The hardware component and the modular safety design are concrete enough to justify referee time, provided the authors supply the quantitative metrics and gap analysis that are currently absent.

Referee Report

2 major / 1 minor

Summary. The paper proposes RSR-RSMARL, a novel Robust and Safe Multi-Agent Reinforcement Learning framework with V2V communication for autonomous vehicles. It enables Real-Sim-Real (RSR) policy adaptation by designing state (including shared information) and action representations that account for real-system complexities, training via a robust MARL algorithm for zero-shot hardware transfer, and adding a Control Barrier Function (CBF) safety shield per agent. The central claim is that this yields enhanced driving safety and coordination, supported by both simulation results and hardware experiments on 1/10th-scale vehicles across multiple configurations.

Significance. If the hardware results hold with quantitative support, the work would be significant for multi-agent autonomy: it directly tackles sim-to-real transfer, communication design, and safety in a single modular architecture. The combination of representation choices, robust training, and CBF shielding offers a concrete path toward deployable MARL policies on physical vehicles, which remains rare in the literature.

major comments (2)

[Abstract and Section 4] The manuscript asserts that 'Experimental results on 1/10th-scale autonomous vehicles with V2V communication demonstrate the ability of RSR-RSMARL framework to enhance driving safety and coordination,' yet supplies no quantitative metrics (success rates, collision counts, trajectory error, or statistical significance), no baselines, and no error analysis or training hyperparameters. This absence directly undermines the central empirical claim of effective zero-shot transfer.
[Section 4 and RSR adaptation description] No explicit quantification of the sim-to-real gap is provided (e.g., Wasserstein distance between state distributions, actuator latency mismatch, or sensor noise statistics). Without these measurements it is impossible to determine whether any observed hardware performance arises from the chosen state/action representations or from unstated environment simplifications or CBF intervention. This is load-bearing for the zero-shot guarantee.
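
The second comment asks for a measured gap rather than an asserted one. As a rough sketch of what such a measurement could look like for a single scalar state channel, the 1-Wasserstein distance between two equally sized sample sets reduces to the mean absolute difference of their sorted values. The speed logs below are invented for illustration, and `wasserstein_1d` is a hypothetical helper, not something taken from the paper.

```python
from typing import Sequence


def wasserstein_1d(sim: Sequence[float], real: Sequence[float]) -> float:
    """1-Wasserstein distance between two empirical 1-D distributions.

    Restricted to equal sample counts, where the optimal transport plan
    simply pairs the i-th smallest sim value with the i-th smallest real value.
    """
    if len(sim) == 0 or len(sim) != len(real):
        raise ValueError("need two non-empty samples of equal length")
    sim_sorted, real_sorted = sorted(sim), sorted(real)
    total = sum(abs(s - r) for s, r in zip(sim_sorted, real_sorted))
    return total / len(sim_sorted)


# Invented example logs of longitudinal speed (m/s) from simulator and hardware.
sim_speed = [0.9, 1.0, 1.1, 1.2]
real_speed = [1.4, 1.0, 1.3, 1.1]
gap = wasserstein_1d(sim_speed, real_speed)
print(f"speed-channel gap: {gap:.3f} m/s")
```

For unequal sample counts, a library routine such as `scipy.stats.wasserstein_distance` would be the more practical choice. It is one-dimensional as well, so multi-dimensional states would need a different tool.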

minor comments (1)

[Abstract] The expansion of the acronym RSR-RSMARL is not stated on first use; adding '(Robust and Safe Real-Sim-Real Multi-Agent Reinforcement Learning)' would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's report. We value the constructive criticism regarding the presentation of our hardware experiments and the quantification of the sim-to-real gap. We believe these points can be addressed through targeted revisions and additional analysis, which we outline below.

Referee: [Abstract and Section 4] The manuscript asserts that 'Experimental results on 1/10th-scale autonomous vehicles with V2V communication demonstrate the ability of RSR-RSMARL framework to enhance driving safety and coordination,' yet supplies no quantitative metrics (success rates, collision counts, trajectory error, or statistical significance), no baselines, and no error analysis or training hyperparameters. This absence directly undermines the central empirical claim of effective zero-shot transfer.

Authors: We acknowledge the validity of this observation. The current manuscript emphasizes qualitative demonstrations and figures in Section 4 to illustrate the hardware performance. In the revision, we will incorporate quantitative metrics such as success rates, number of collisions, trajectory errors with standard deviations, and p-values for statistical significance. Baselines including non-communicative MARL and MARL without CBF will be added, along with a table summarizing hyperparameters and error analysis. This will provide the necessary quantitative support for the zero-shot transfer claims. revision: yes
Referee: [Section 4 and RSR adaptation description] No explicit quantification of the sim-to-real gap is provided (e.g., Wasserstein distance between state distributions, actuator latency mismatch, or sensor noise statistics). Without these measurements it is impossible to determine whether any observed hardware performance arises from the chosen state/action representations or from unstated environment simplifications or CBF intervention. This is load-bearing for the zero-shot guarantee.

Authors: We agree that providing explicit measures of the sim-to-real gap would enhance the rigor of our claims. We will revise Section 4 to include an analysis of the sim-to-real discrepancies, such as statistical comparisons of state distributions (including Wasserstein distance where applicable), measured actuator latencies, and sensor noise levels from the hardware setup. We will also clarify how the designed state and action representations mitigate these gaps and evaluate the contribution of the CBF safety shield through ablation studies. These additions will better justify the zero-shot transfer performance. revision: yes
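
The success rates promised in the first response are only informative with an uncertainty estimate, since hardware trials tend to be few. One common choice is the Wilson score interval for a binomial proportion. The sketch below assumes independent trials, and the counts are made up for illustration rather than drawn from the paper.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a success probability (z = 1.96 gives about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    centre = (p_hat + z2 / (2.0 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials * trials)) / denom
    return centre - half_width, centre + half_width


# Invented counts: 18 collision-free runs out of 20 hardware trials.
low, high = wilson_interval(successes=18, trials=20)
print(f"success rate 0.90, 95% interval [{low:.2f}, {high:.2f}]")
```

With only 20 trials the interval is wide, which is why a comparison against baselines such as MARL without CBF needs either more runs or an explicit significance test.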

Circularity Check

0 steps flagged

No significant circularity; framework and transfer claims rest on external hardware validation

The paper introduces RSR-RSMARL as a design combining state/action representations that incorporate real-system complexities, robust MARL training, and a modular CBF safety shield. Central claims are validated by direct hardware experiments on 1/10-scale vehicles with V2V communication across multiple configurations. No equations, fitted parameters, or results are shown to reduce by construction to quantities defined within the same experiment. No load-bearing self-citation chains or uniqueness theorems imported from prior author work appear in the derivation. The sim-to-real transfer is presented as an empirical outcome of the chosen representations and robust training rather than a tautological re-statement of inputs. This qualifies as self-contained against external benchmarks (hardware runs), warranting score 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters or invented entities; relies on standard domain assumptions about CBF safety and sim-to-real transfer feasibility.

axioms (1)

domain assumption: Control barrier functions provide per-agent safety guarantees in both simulation and hardware.
Invoked for the safety shield module in the abstract.

pith-pipeline@v0.9.0 · 5831 in / 1155 out tokens · 49742 ms · 2026-05-19T11:15:46.444624+00:00 · methodology

Original abstract

Deep multi-agent reinforcement learning (MARL) has been demonstrated effectively in simulations for multi-robot problems. For autonomous vehicles, the development of vehicle-to-vehicle (V2V) communication technologies provide opportunities to further enhance system safety. However, zero-shot transfer of simulator-trained MARL policies to dynamic hardware systems remains challenging, and how to leverage communication and shared information for MARL has limited demonstrations on hardware. This problem is challenged by discrepancies between simulated and physical states, system state and model uncertainties, practical shared information design, and the need for safety guarantees in both simulation and hardware. This paper designs RSR-RSMARL, a novel Robust and Safe MARL framework that supports Real-Sim-Real (RSR) policy adaptation for multi-agent systems with communication among agents, with both simulation and hardware demonstrations. RSR-RSMARL leverages state (includes shared state information among agents) and action representations considering real system complexities for MARL formulation. The MARL policy is trained with robust MARL algorithm to enable zero-shot transfer to hardware considering the sim-to-real gap. A safety shield module using Control Barrier Functions (CBFs) provides safety guarantee for each individual agent. Experimental results on 1/10th-scale autonomous vehicles with V2V communication demonstrate the ability of RSR-RSMARL framework to enhance driving safety and coordination across multiple configurations. These findings emphasize the importance of jointly designing robust policy representations and modular safety architectures to enable scalable, generalizable RSR transfer in multi-agent autonomy.

Figures

Figures reproduced from arXiv: 2506.00982 by Ehsan Sabouni, Fei Miao, H M Sabbir Ahmad, Keshawn Smith, Mainak Mondal, Song Han, Wenchao Li, Zhili Zhang.

**Figure 2.** The figure illustrates the hardware policy execution stage pipeline of an agent, with all …

**Figure 3.** Discounted Efficiency Returns during Training

**Figure 4.** CARLA Simulation: Successful Merging and Lane Change

**Figure 5.** CARLA Simulation: Failure Case with Rear-End Collision

**Figure 6.** Real-World Environment Setting

A.7.1 Communication Framework: Hardware and Software. Each vehicle in the fleet was equipped with an onboard Jetson Orin Nano (8GB) running ROS1 Noetic on Ubuntu 20.04. For real-world deployment, we rely on a communication infrastructure built upon the Robot Operating System (ROS 1). ROS provides a publish-subscribe messaging architecture that enables real-time data exchange …

**Figure 7.** Real-World Test: Lane Following and Obstacle Avoidance

**Figure 8.** Real-World Test: Drift Caused by Abrupt Control Perturbation

**Figure 9.** CBF Intervention Frequency: With vs. Without V2V Communication

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

RSR-RSMARL leverages state (includes shared state information among agents) and action representations considering real system complexities for MARL formulation. The MARL policy is trained with robust MARL algorithm to enable zero-shot transfer to hardware considering the sim-to-real gap. A safety shield module using Control Barrier Functions (CBFs) provides safety guarantee for each individual agent.
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

We adopt the kinematic bicycle model ... The CBF is the additional safety constraint ... $\min_{u} \tfrac{1}{2}\lVert u - u_{\mathrm{ref}} \rVert^{2}$ s.t. $\frac{\partial h}{\partial t} + L_f h + L_g h\,u \ge -\gamma h$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SCALE-COMM: Shared, Contrastively-Aligned Latent Embeddings for MARL Communication
cs.RO · 2026-05 · unverdicted · novelty 5.0

SCALE-COMM uses contrastive alignment on latent embeddings to decouple and stabilize communication learning from policy optimization in decentralized MARL, showing gains on benchmarks and a warehouse task.

[1] [1]

U. I. J. P. Office. Saving lives with connectivity: A plan to accelerate v2x deployment non- binding contents, 2024

work page 2024

[2] [2]

Zhang, S

Z. Zhang, S. Han, J. Wang, and F. Miao. Spatial-temporal-aware safe multi-agent reinforce- ment learning of connected autonomous vehicles in challenging scenarios. pages 5574–5580, 2023

work page 2023

[3] [3]

Hyldmar, Y

N. Hyldmar, Y . He, and A. Prorok. A fleet of miniature cars for experiments in cooperative driving. Proceedings - IEEE International Conference on Robotics and Automation , 2019- May:3238–3244, 5 2019. ISSN 10504729. doi:10.1109/ICRA.2019.8794445

work page doi:10.1109/icra.2019.8794445 2019

[4] [4]

In: IEEE Int

A. Miller, K. Rim, P. Chopra, P. Kelkar, and M. Likhachev. Cooperative perception and lo- calization for cooperative driving. Proceedings - IEEE International Conference on Robotics and Automation, pages 1256–1262, 5 2020. ISSN 10504729. doi:10.1109/ICRA40945.2020. 9197463

work page doi:10.1109/icra40945.2020 2020

[5] [5]

Zhang, H

Z. Zhang, H. M. S. Ahmad, E. Sabouni, Y . Sun, F. Huang, W. Li, and F. Miao. Safety guar- anteed robust multi-agent reinforcement learning with hierarchical control for connected and automated vehicles. 9 2023. URL https://arxiv.org/abs/2309.11057v2

work page arXiv 2023

[6] [6]

S. Han, H. Wang, S. Su, Y . Shi, and F. Miao. Stable and efficient shapley value-based reward reallocation for multi-agent reinforcement learning of autonomous vehicles. Proceedings - IEEE International Conference on Robotics and Automation, pages 8765–8771, 3 2022. ISSN 10504729. doi:10.1109/ICRA46639.2022.9811626. URL https://arxiv.org/abs/ 2203.06333v2

work page doi:10.1109/icra46639.2022.9811626 2022

[7] [7]

Rios-Torres and A

J. Rios-Torres and A. A. Malikopoulos. A survey on the coordination of connected and automated vehicles at intersections and merging at highway on-ramps. IEEE Transac- tions on Intelligent Transportation Systems , 18:1066–1077, 5 2017. ISSN 15249050. doi: 10.1109/TITS.2016.2600504

work page doi:10.1109/tits.2016.2600504 2017

[8] [8]

S. Han, S. Zhou, J. Wang, L. Pepin, C. Ding, J. Fu, and F. Miao. A multi-agent reinforcement learning approach for safe and efficient behavior planning of connected autonomous vehicles. IEEE Transactions on Intelligent Transportation Systems , 25(5):3654–3670, 2024. doi:10. 1109/TITS.2023.3336670

work page arXiv 2024

[9] [9]

In: CEREBROVASCULAR DISEASES (2015)

A. Mokhtarian, P. Scheffe, M. Kloock, S. Sch ¨afer, Heeseung Bang, Viet-Anh Le, Sangeet Ulhas, J. Betz, S. Wilson, S. Berman, A. Prorok, and B. Alrifaee. A survey on small-scale testbeds for connected and automated vehicles and robot swarms. 2024. doi:10.13140/RG.2. 2.16176.74248/1. URL https://arxiv.org/abs/2408.03539

work page doi:10.13140/rg.2 2024

[10] [10]

Y . Shao, M. A. M. Zulkefli, Z. Sun, and P. Huang. Evaluating connected and autonomous vehicles using a hardware-in-the-loop testbed and a living lab. Transportation Research Part C: Emerging Technologies, 102:121–135, 5 2019. ISSN 0968-090X. doi:10.1016/J.TRC.2019. 03.010

[11]

C. Tang, B. Abbatematteo, J. Hu, R. Chandra, R. Martín-Martín, and P. Stone. Deep reinforcement learning for robotics: A survey of real-world successes. 8 2024. doi:10.1146/((please). URL https://arxiv.org/abs/2408.03539v2

[12]

Y. Feng, C. Hong, Y. Niu, S. Liu, Y. Yang, W. Yu, T. Zhang, J. Tan, and D. Zhao. Learning multi-agent loco-manipulation for long-horizon quadrupedal pushing. Accepted, ICRA 2025. URL https://arxiv.org/abs/2411.07104

[13]

P. Werner, T. Seyde, P. Drews, T. M. Balch, I. Gilitschenski, W. Schwarting, G. Rosman, S. Karaman, and D. Rus. Dynamic multi-team racing: Competitive driving on 1/10-th scale vehicles via learning in simulation. In 7th Annual Conference on Robot Learning, 2023. URL https://openreview.net/forum?id=fvXFBCHVGn

[14]

S. Han, S. Su, S. He, S. Han, H. Yang, and F. Miao. What is the solution for state adversarial multi-agent reinforcement learning? arXiv preprint arXiv:2212.02705, 2022

[15]

Y. Liang, Y. Sun, R. Zheng, and F. Huang. Efficient adversarial training without attacking: Worst-case-aware robust reinforcement learning. Advances in Neural Information Processing Systems, 35:22547–22561, 2022

[16]

M. Torne, A. Simeonov, Z. Li, A. Chan, T. Chen, A. Gupta, and P. Agrawal. Reconciling reality through simulation: A real-to-sim-to-real approach for robust manipulation. 3 2024. URL https://arxiv.org/abs/2403.03949v1

[17]

M. T. Villasevil, A. Jain, V. Macha, J. Yuan, L. L. Ankile, A. Simeonov, P. Agrawal, and A. Gupta. Scaling robot-learning by crowdsourcing simulation environments

[18]

W. Zhao, J. P. Queralta, and T. Westerlund. Sim-to-real transfer in deep reinforcement learning for robotics: A survey. 2020 IEEE Symposium Series on Computational Intelligence, SSCI 2020, pages 737–744, 12 2020. doi:10.1109/SSCI47803.2020.9308468

[19]

Y. Jiang, C. Wang, R. Zhang, J. Wu, and L. Fei-Fei. Transic: Sim-to-real policy transfer by learning from online correction. In Conference on Robot Learning, 2024

[20]

S. S. Sandha, L. Garcia, B. Balaji, F. Anwar, and M. Srivastava. Sim2real transfer for deep reinforcement learning with stochastic state transition delays. In J. Kober, F. Ramos, and C. Tomlin, editors, Proceedings of the 2020 Conference on Robot Learning, volume 155 of Proceedings of Machine Learning Research, pages 1066–1083. PMLR, 16–18 Nov 2021. URL ...

[21]

L. Brunke, M. Greeff, A. W. Hall, Z. Yuan, S. Zhou, J. Panerati, and A. P. Schoellig. Safe learning in robotics: From learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems, 5:411–444, 2022

[22]

I. ElSayed-Aly, S. Bharadwaj, C. Amato, R. Ehlers, U. Topcu, and L. Feng. Safe multi-agent reinforcement learning via shielding. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’21, pages 483–491, Richland, SC, 2021. International Foundation for Autonomous Agents and Multiagent Systems. ISBN 9781450383073

[23]

Z. Cai, H. Cao, W. Lu, L. Zhang, and H. Xiong. Safe multi-agent reinforcement learning through decentralized multiple control barrier functions, 2021

[24]

J. Wang, S. Yang, Z. An, S. Han, Z. Zhang, R. Mangharam, M. Ma, and F. Miao. Multi-agent reinforcement learning guided by signal temporal logic specifications. arXiv preprint arXiv:2306.06808, 2023

[25]

S. He, S. Han, S. Su, S. Han, S. Zou, and F. Miao. Robust multi-agent reinforcement learning with state uncertainty. Transactions on Machine Learning Research, 2023

[26]

A. Mokhtarian, P. Scheffe, M. Kloock, S. Schäfer, Heeseung Bang, Viet-Anh Le, Sangeet Ulhas, J. Betz, S. Wilson, S. Berman, A. Prorok, and B. Alrifaee. A survey on small-scale testbeds for connected and automated vehicles and robot swarms. 2024. doi:10.13140/RG.2.2.16176.74248/1. URL https://rgdoi.net/10.13140/RG.2.2.16176.74248/1

[27]

Z. Qin, H. Wang, and X. Li. Ultra fast structure-aware deep lane detection. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIV 16, pages 276–291. Springer, 2020

[28]

Y. Li, D. Ma, Z. An, Z. Wang, Y. Zhong, S. Chen, and C. Feng. V2x-sim: Multi-agent collaborative perception dataset and benchmark for autonomous driving. IEEE Robotics and Automation Letters, 7:10914–10921, 2 2022. ISSN 23773766. doi:10.1109/LRA.2022.3192802. URL https://arxiv.org/abs/2202.08449v2

[29]

H. M. S. Ahmad, E. Sabouni, A. Dickson, W. Xiao, C. G. Cassandras, and W. Li. Secure control of connected and automated vehicles using trust-aware robust event-triggered control barrier functions, 2024. URL https://arxiv.org/abs/2401.02306

[30]

J. Kong, M. Pfeiffer, G. Schildbach, and F. Borrelli. Autonomous driving using model predictive control and a kinematic bicycle vehicle model. In Intelligent Vehicles Symposium, Seoul, Korea, 2015

[31]

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

A Appendix

A.1 Modeling and Algorithmic Details

A.1.1 Vehicle Dynamic Model

We adopt a kinematic bicycle model to describe the motion of each F1/10th vehicle. The state of each vehicle is represented as x = [X, ...
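For orientation only, a kinematic bicycle model of the kind described by Kong et al. [30] can be sketched in a few lines. The state vector in the text above is truncated, so everything in this sketch is an assumption rather than the paper's own formulation: the state layout (position, heading, speed), the forward-Euler discretization, the name `bicycle_step`, and the axle distances `l_f` and `l_r`, which are placeholder values and not measured F1/10th parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class BicycleState:
    # Hypothetical state layout; the paper's own state vector is truncated.
    x: float    # position along the world X axis [m]
    y: float    # position along the world Y axis [m]
    psi: float  # heading angle [rad]
    v: float    # speed at the center of mass [m/s]


def bicycle_step(state, accel, steer, dt=0.05, l_f=0.16, l_r=0.16):
    """Advance the kinematic bicycle model by one forward-Euler step.

    accel is the longitudinal acceleration [m/s^2] and steer is the front
    steering angle [rad]. l_f and l_r are the distances from the center of
    mass to the front and rear axles (placeholder values, assumed).
    """
    # Slip angle at the center of mass induced by the front steering angle.
    beta = math.atan(l_r / (l_f + l_r) * math.tan(steer))
    return BicycleState(
        x=state.x + state.v * math.cos(state.psi + beta) * dt,
        y=state.y + state.v * math.sin(state.psi + beta) * dt,
        psi=state.psi + state.v / l_r * math.sin(beta) * dt,
        v=state.v + accel * dt,
    )


if __name__ == "__main__":
    # Roll the model forward for one second with a constant left turn.
    s = BicycleState(x=0.0, y=0.0, psi=0.0, v=1.0)
    for _ in range(20):
        s = bicycle_step(s, accel=0.0, steer=0.2)
    print(s)
```

With zero steering the slip angle is zero and the vehicle moves in a straight line along its heading. A positive steering angle rotates the heading counter-clockwise, which is the sign convention assumed here.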
