arxiv: 2601.16578 · v1 · submitted 2026-01-23 · 💻 cs.RO · cs.SY · eess.SY

Zero-Shot MARL Benchmark in the Cyber-Physical Mobility Lab

Pith reviewed 2026-05-16 12:20 UTC · model grok-4.3

classification 💻 cs.RO cs.SY eess.SY
keywords sim-to-real transfer, multi-agent reinforcement learning, connected and automated vehicles, benchmark, zero-shot evaluation, motion planning, digital twin

The pith

A benchmark platform tests MARL policies for vehicles across simulation, digital twin, and physical hardware in zero-shot fashion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a reproducible benchmark built on the Cyber-Physical Mobility Lab to measure how multi-agent reinforcement learning policies for connected automated vehicles transfer from simulation to real hardware without retraining. It integrates three domains—pure simulation, a high-fidelity digital twin, and the physical testbed—to run the same policy and record performance. Deployment of one SigmaRL-trained policy shows two distinct drops: one from differences in the control software stacks and one from added environmental realism. A reader would care because MARL controllers for vehicles must cross this gap reliably before they can be used in traffic, and the open-source platform makes such measurements repeatable and comparable.

Core claim

The paper claims that the CPM Lab setup supplies a structured three-domain platform for zero-shot evaluation of MARL motion-planning policies, and that running a SigmaRL-trained policy through simulation, digital twin, and hardware isolates performance degradation into architectural mismatches between control stacks and the additional gap caused by increasing environmental realism.
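
One way to read this claim is as a telescoping split of a single performance metric across the three domains. The sketch below is not taken from the paper: the names `GapDecomposition` and `decompose_gap` are hypothetical, the numbers in the usage lines are placeholders rather than reported results, and mapping the simulation-to-twin step to the control-stack mismatch and the twin-to-hardware step to added realism is an assumption about how the domains line up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GapDecomposition:
    # Change in the metric from simulation to digital twin
    # (assumed here to reflect the control-stack mismatch).
    stack_gap: float
    # Change in the metric from digital twin to physical lab
    # (assumed here to reflect added environmental realism).
    realism_gap: float

    @property
    def total_gap(self) -> float:
        """Overall change from simulation to hardware."""
        return self.stack_gap + self.realism_gap


def decompose_gap(sim: float, twin: float, lab: float) -> GapDecomposition:
    """Split the sim-to-hardware change of one metric into two steps."""
    return GapDecomposition(stack_gap=twin - sim, realism_gap=lab - twin)


# Placeholder values only, e.g. a mean centerline deviation in metres.
gaps = decompose_gap(sim=0.02, twin=0.05, lab=0.09)
print(gaps.stack_gap, gaps.realism_gap, gaps.total_gap)
```

The split is only meaningful if the same policy and the same initial conditions are used in all three domains, which is what the zero-shot setup is meant to guarantee.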

What carries the argument

The integrated benchmark platform that combines simulation, high-fidelity digital twin, and physical testbed to enable zero-shot MARL policy evaluation for connected automated vehicles.

If this is right

  • MARL motion planners for connected vehicles can be evaluated for transfer without retraining on each new domain.
  • Performance losses can be attributed separately to control-stack differences versus added realism.
  • An open-source multi-domain testbed supports systematic, reproducible study of sim-to-real issues in vehicle MARL.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same platform could be used to compare transfer performance across different MARL algorithms beyond the one tested.
  • Aligning simulation control interfaces more closely with hardware might reduce one of the two observed degradation sources.
  • Policies trained with explicit awareness of these gaps could show smaller drops when moved to hardware.

Load-bearing premise

The chosen CPM Lab configuration and the particular SigmaRL policy sufficiently represent the general sim-to-real transfer problems that arise for MARL methods in connected automated vehicles.

What would settle it

A deployment of the same policy that produces no measurable performance loss across the three domains, or that attributes losses to factors other than control-stack architecture and environmental realism, would falsify the identified degradation sources.
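
Whether a loss counts as "measurable" depends on run-to-run variability. A minimal sketch of one possible check follows: a percentile bootstrap confidence interval on the difference in mean metric between two domains. The page does not say which statistical test, if any, the paper uses; the function names, the resample count, and the sample values are illustrative assumptions.

```python
import random
from statistics import mean


def bootstrap_mean_diff_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(b) - mean(a)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each domain's runs with replacement.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(mean(resampled_b) - mean(resampled_a))
    diffs.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return diffs[lo_idx], diffs[hi_idx]


def is_measurable(ci):
    """Treat a gap as measurable when the interval excludes zero."""
    lo, hi = ci
    return lo > 0 or hi < 0


# Hypothetical per-run metric samples (not data from the paper).
twin_runs = [0.041, 0.048, 0.052, 0.046, 0.050]
lab_runs = [0.081, 0.095, 0.088, 0.102, 0.090]
ci = bootstrap_mean_diff_ci(twin_runs, lab_runs)
print(ci, is_measurable(ci))
```

An interval that straddles zero for both domain transitions would be the "no measurable loss" outcome described above.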

Figures

Figures reproduced from arXiv: 2601.16578 by Bassam Alrifaee, Fynn Belderink, Jianye Xu, Julius Beerwerth, Simon Schäfer.

Figure 1: Illustration of how SigmaRL’s policy output is applied within the control horizon and then transitioned to a rules-based approach for the remaining prediction horizon, yielding a complete trajectory for each agent 𝑖 in the scene.
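
The horizon split described in the Figure 1 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the kinematic bicycle model, the fallback rule `hold_speed_go_straight`, and the default `dt` and `wheelbase` values are hypothetical stand-ins, since the caption does not specify the vehicle model or the rule used after the control horizon.

```python
import math
from typing import Callable, List, Tuple

State = Tuple[float, float, float]  # x [m], y [m], heading [rad]
Action = Tuple[float, float]        # speed [m/s], steering angle [rad]


def bicycle_step(state: State, action: Action, dt: float, wheelbase: float) -> State:
    """One explicit-Euler step of a kinematic bicycle model."""
    x, y, yaw = state
    speed, steer = action
    x += speed * math.cos(yaw) * dt
    y += speed * math.sin(yaw) * dt
    yaw += speed / wheelbase * math.tan(steer) * dt
    return (x, y, yaw)


def hold_speed_go_straight(state: State, last_action: Action) -> Action:
    """Hypothetical fallback rule: keep the speed, zero the steering."""
    return (last_action[0], 0.0)


def build_trajectory(
    state: State,
    policy_action: Action,
    fallback_rule: Callable[[State, Action], Action],
    control_steps: int,
    prediction_steps: int,
    dt: float = 0.05,         # assumed step size
    wheelbase: float = 0.15,  # assumed wheelbase of a small-scale vehicle
) -> List[State]:
    """Apply the policy action for control_steps, then the rule for the rest."""
    trajectory = [state]
    for k in range(prediction_steps):
        if k < control_steps:
            action = policy_action
        else:
            action = fallback_rule(state, policy_action)
        state = bicycle_step(state, action, dt, wheelbase)
        trajectory.append(state)
    return trajectory


traj = build_trajectory((0.0, 0.0, 0.0), (1.0, 0.2), hold_speed_go_straight,
                        control_steps=2, prediction_steps=5, dt=0.1)
print(traj[-1])
```

In a multi-agent setting the same function would be called once per agent 𝑖, each with its own state and policy action.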
Figure 2: Example initial setup and representative trajectories for one configuration. One representative run per environment is shown, using the same initial positions across all environments; the run was selected from the digital twin as the one with centerline deviation closest to that environment’s mean. Trajectories are shown over the 18 s evaluation horizon; in the physical lab, a collision during this interva…
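
The selection rule in the Figure 2 caption (pick the run whose centerline deviation is closest to the environment's mean) can be sketched in a few lines. The function name, the dictionary layout, and the values are hypothetical; the caption does not say how ties are broken, so this sketch simply keeps the first minimum.

```python
from statistics import mean


def select_representative_run(deviations):
    """Return the run id whose deviation is closest to the mean deviation.

    deviations maps a run id to that run's centerline deviation.
    """
    target = mean(list(deviations.values()))
    return min(deviations, key=lambda run_id: abs(deviations[run_id] - target))


# Placeholder values, not measurements from the paper.
digital_twin_runs = {"run_a": 1.0, "run_b": 2.0, "run_c": 6.0}
print(select_representative_run(digital_twin_runs))
```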
Abstract

We present a reproducible benchmark for evaluating sim-to-real transfer of Multi-Agent Reinforcement Learning (MARL) policies for Connected and Automated Vehicles (CAVs). The platform, based on the Cyber-Physical Mobility Lab (CPM Lab) [1], integrates simulation, a high-fidelity digital twin, and a physical testbed, enabling structured zero-shot evaluation of MARL motion-planning policies. We demonstrate its use by deploying a SigmaRL-trained policy [2] across all three domains, revealing two complementary sources of performance degradation: architectural differences between simulation and hardware control stacks, and the sim-to-real gap induced by increasing environmental realism. The open-source setup enables systematic analysis of sim-to-real challenges in MARL under realistic, reproducible conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper presents a reproducible benchmark platform based on the Cyber-Physical Mobility Lab (CPM Lab) for zero-shot evaluation of sim-to-real transfer in multi-agent reinforcement learning (MARL) policies for connected and automated vehicles (CAVs). The platform integrates simulation, a high-fidelity digital twin, and physical hardware testbed. Its use is demonstrated by deploying a single SigmaRL-trained policy across all three domains, which reveals two sources of performance degradation: architectural mismatches between simulation and hardware control stacks, and the sim-to-real gap from increasing environmental realism. The setup is open-source to support systematic analysis.

Significance. If the attribution of degradation sources holds under broader testing, the benchmark would provide a valuable, structured tool for reproducible study of sim-to-real challenges in MARL for CAVs, addressing a practical gap in the field. The integration of simulation through hardware and the open-source release are clear strengths that could enable community follow-up work.

major comments (1)
  1. [Demonstration / experimental evaluation] The central claim that the benchmark isolates two complementary sources of performance degradation (control-stack architecture differences and environmental realism) rests on deployment of only a single SigmaRL-trained policy. This leaves open the possibility that the observed drops arise from policy-specific features (e.g., observation space, reward design, or centralized training) rather than the stated general factors. Evaluation with at least one additional MARL algorithm possessing different coordination mechanisms is required to support the generality of the identified sources.
minor comments (1)
  1. [Abstract] The abstract states that the demonstration 'reveals' the two degradation sources but provides no quantitative metrics, tables of performance values, or error analysis; the full manuscript should include these to allow readers to assess the magnitude and statistical reliability of the reported drops.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment below and describe the planned revisions.

Point-by-point responses
  1. Referee: [Demonstration / experimental evaluation] The central claim that the benchmark isolates two complementary sources of performance degradation (control-stack architecture differences and environmental realism) rests on deployment of only a single SigmaRL-trained policy. This leaves open the possibility that the observed drops arise from policy-specific features (e.g., observation space, reward design, or centralized training) rather than the stated general factors. Evaluation with at least one additional MARL algorithm possessing different coordination mechanisms is required to support the generality of the identified sources.

    Authors: We agree that reliance on a single SigmaRL policy limits the strength of the generality claim for the two identified degradation sources. While the benchmark platform itself is the primary contribution and the observed drops are attributable to control-stack mismatches and increasing realism (as the policy is held fixed across domains), we acknowledge that policy-specific factors cannot be fully ruled out without broader testing. In the revised manuscript we will add results from at least one additional MARL algorithm with different coordination mechanisms (e.g., a fully decentralized variant) to confirm that the same degradation patterns appear across policy classes. revision: yes

Circularity Check

0 steps flagged

Benchmark is self-contained empirical evaluation with no derivation chain

Full rationale

The paper introduces a platform integrating simulation, digital twin, and physical testbed for zero-shot MARL evaluation in CAVs, then reports empirical results from deploying one external SigmaRL policy. No equations, fitted parameters, or predictions appear that reduce to inputs by construction. Citations to prior CPM Lab and SigmaRL work are external references rather than load-bearing self-citations that justify a uniqueness theorem or ansatz. The central claims rest on observed performance differences across domains, which are falsifiable measurements rather than self-defined or renamed known results. This is a standard benchmark paper with independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no details on parameters or assumptions beyond the described platform.

pith-pipeline@v0.9.0 · 5440 in / 1009 out tokens · 38026 ms · 2026-05-16T12:20:33.824821+00:00 · methodology


Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

  1. [1]

    Cyber-physical mobility lab: An open-source platform for networked and autonomous vehicles,

    M. Kloock, P. Scheffe, J. Maczijewski, A. Kampmann, A. Mokhtarian, S. Kowalewski, and B. Alrifaee, “Cyber-physical mobility lab: An open-source platform for networked and autonomous vehicles,” in European Control Conference (ECC), 2021, pp. 1937–1944

  2. [2]

    SigmaRL: A sample-efficient and generalizable multi-agent reinforcement learning framework for motion planning,

    J. Xu, P. Hu, and B. Alrifaee, “SigmaRL: A sample-efficient and generalizable multi-agent reinforcement learning framework for motion planning,” in IEEE International Conference on Intelligent Transportation Systems (ITSC), 2024, pp. 768–775

  3. [3]

    Multi-agent reinforcement learning for connected and automated vehicles control: Recent advancements and future prospects,

    M. Hua, D. Chen, X. Qi, K. Jiang, Z. E. Liu, Q. Zhou, and H. Xu, “Multi-agent reinforcement learning for connected and automated vehicles control: Recent advancements and future prospects,” arXiv preprint arXiv:2312.11084, 2023

  4. [4]

    A review of cooperative multi-agent deep reinforcement learning,

    A. Oroojlooy and D. Hajinezhad, “A review of cooperative multi-agent deep reinforcement learning,” Applied Intelligence, vol. 53, no. 11, pp. 13677–13722, Jun. 2023. [Online]. Available: https://link.springer.com/10.1007/s10489-022-04105-y

  5. [5]

    Sim-to-real transfer in deep reinforcement learning for robotics: A survey,

    W. Zhao, J. P. Queralta, and T. Westerlund, “Sim-to-real transfer in deep reinforcement learning for robotics: A survey,” in 2020 IEEE Symposium Series on Computational Intelligence (SSCI), 2020, pp. 737–744

  6. [6]

    Domain randomization for transferring deep neural networks from simulation to the real world,

    J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 23–30

  7. [7]

    Sim2real transfer for reinforcement learning without dynamics randomization,

    M. Kaspar, J. D. Muñoz Osorio, and J. Bock, “Sim2real transfer for reinforcement learning without dynamics randomization,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 4383–4388

  8. [8]

    Learning dexterous in-hand manipulation,

    O. M. Andrychowicz, B. Baker, M. Chociej, R. Józefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, J. Schneider, S. Sidor, J. Tobin, P. Welinder, L. Weng, and W. Zaremba, “Learning dexterous in-hand manipulation,” The International Journal of Robotics Research, vol. 39, no. 1, pp. 3–20, 2020

  9. [9]

    Sim-to-real transfer of robotic control with dynamics randomization,

    X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to-real transfer of robotic control with dynamics randomization,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 3803–3810

  10. [10]

    F1TENTH: An open-source evaluation environment for continuous control and reinforcement learning,

    M. O’Kelly, H. Zheng, D. Karthik, and R. Mangharam, “F1TENTH: An open-source evaluation environment for continuous control and reinforcement learning,” Proceedings of Machine Learning Research, vol. 123, 2020

  11. [11]

    Adversarial differentiable data augmentation for autonomous systems,

    M. Shu, Y. Shen, M. C. Lin, and T. Goldstein, “Adversarial differentiable data augmentation for autonomous systems,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 14069–14075

  12. [12]

    Duckietown: An open, inexpensive and flexible platform for autonomy education and research,

    L. Paull, J. Tani, H. Ahn, J. Alonso-Mora, L. Carlone, M. Cap, Y. F. Chen, C. Choi, J. Dusek, Y. Fang, D. Hoehener, S.-Y. Liu, M. Novitzky, I. F. Okuyama, J. Pazis, G. Rosman, V. Varricchio, H.-C. Wang, D. Yershov, H. Zhao, M. Benjamin, C. Carr, M. Zuber, S. Karaman, E. Frazzoli, D. Del Vecchio, D. Rus, J. How, J. Leonard, and A. Censi, “Duckietown: An ...

  13. [13]

    The robotarium: Automation of a remotely accessible, multi-robot testbed,

    S. Wilson, P. Glotfelter, S. Mayya, G. Notomista, Y. Emam, X. Cai, and M. Egerstedt, “The robotarium: Automation of a remotely accessible, multi-robot testbed,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 2922–2929, 2021

  14. [14]

    A survey on small-scale testbeds for connected and automated vehicles and robot swarms: A guide for creating a new testbed,

    A. Mokhtarian, J. Xu, P. Scheffe, M. Kloock, S. Schäfer, H. Bang, V.-A. Le, S. Ulhas, J. Betz, S. Wilson, S. Berman, L. Paull, A. Prorok, and B. Alrifaee, “A survey on small-scale testbeds for connected and automated vehicles and robot swarms: A guide for creating a new testbed,” IEEE Robotics & Automation Magazine, pp. 2–19, 2024

  15. [15]

    Transferring multi-agent reinforcement learning policies for autonomous driving using sim-to-real,

    E. Candela, L. Parada, L. Marques, T.-A. Georgescu, Y. Demiris, and P. Angeloudis, “Transferring multi-agent reinforcement learning policies for autonomous driving using sim-to-real,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022, pp. 8814–8820

  16. [16]

    From small-scale to full-scale: Assessing the potential for transferability of experimental results in small-scale cav testbeds,

    S. Schäfer and B. Alrifaee, “From small-scale to full-scale: Assessing the potential for transferability of experimental results in small-scale cav testbeds,” in IEEE International Conference on Vehicular Electronics and Safety (ICVES), 2024, pp. 1–6

  17. [17]

    The surprising effectiveness of ppo in cooperative multi-agent games,

    C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, and Y. Wu, “The surprising effectiveness of ppo in cooperative multi-agent games,” Advances in neural information processing systems, vol. 35, pp. 24611–24624, 2022.

  18. [18]

    Vehicle Dynamics and Control,

    R. Rajamani, Vehicle Dynamics and Control, ser. Mechanical Engineering Series. New York: Springer Science, 2006

  19. [19]

    Networked and autonomous model-scale vehicles for experiments in research and education,

    P. Scheffe, J. Maczijewski, M. Kloock, A. Kampmann, A. Derks, S. Kowalewski, and B. Alrifaee, “Networked and autonomous model-scale vehicles for experiments in research and education,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 17332–17337, 2020

  20. [20]

    Vision-based real-time indoor positioning system for multiple vehicles,

    M. Kloock, P. Scheffe, I. Tülleners, J. Maczijewski, S. Kowalewski, and B. Alrifaee, “Vision-based real-time indoor positioning system for multiple vehicles,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 15446–15453, 2020.

Author Information

Julius Beerwerth, Dept. of Aerospace Engineering, University of the Bundeswehr Munich, 85579 Neubiberg, Germany, julius.beer...