Asynchronous Cooperative Multi-Agent Reinforcement Learning with Limited Communication

Hamsa Balakrishnan; Jasmine Jerry Aloor; Siddharth Nayak; Sydney Dolan

arXiv: 2502.00558 · v2 · pith:FNBD54Y3new · submitted 2025-02-01 · cs.MA

Pith reviewed 2026-05-23 03:33 UTC · model grok-4.3

classification: cs.MA

keywords: multi-agent reinforcement learning, asynchronous communication, graph transformers, cooperative navigation, dynamic graphs, limited communication, MARL

The pith

AsynCoMARL learns communication protocols from dynamic graphs to match baseline performance with 26% fewer messages in asynchronous multi-agent settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how multiple agents can cooperate on navigation and task completion when communication is infrequent, unsynchronized, and limited by the environment. It introduces AsynCoMARL, which represents communications as a dynamic graph whose edges appear only at actual message exchanges and processes that graph with transformers to discover protocols. Experiments show this yields success and collision rates comparable to established synchronous methods while cutting message volume by 26 percent. A reader would care because real-world robot teams and vehicle fleets often face bandwidth limits or unreliable links that break traditional assumptions of constant synchronized exchange.

Core claim

AsynCoMARL is an asynchronous cooperative multi-agent reinforcement learning algorithm that models inter-agent communications as a dynamic graph with edges present only during communication events and applies graph transformers to learn protocols from those sparse structures; the resulting policies achieve success and collision rates similar to leading baselines while transmitting 26% fewer messages.

What carries the argument

Graph transformers operating on dynamic graphs whose edges form exclusively at communication events, allowing protocol learning from infrequent and asynchronous interactions.
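A minimal sketch of the dynamic-graph idea, not the authors' implementation. It assumes communication events arrive as hypothetical `(timestep, sender, receiver)` tuples and builds one directed adjacency matrix per timestep at which at least one message was sent. The function name `build_dynamic_graph` and the event format are illustrative assumptions.

```python
def build_dynamic_graph(events, num_agents):
    """Map each timestep with traffic to a directed adjacency matrix.

    adjacency[i][j] == 1 means agent i sent a message to agent j at that step.
    The (timestep, sender, receiver) event format is an assumption.
    """
    graphs = {}
    for timestep, sender, receiver in events:
        if timestep not in graphs:
            # Create the matrix lazily, so silent timesteps get no graph at all.
            graphs[timestep] = [[0] * num_agents for _ in range(num_agents)]
        graphs[timestep][sender][receiver] = 1
    return graphs


# Hypothetical event log: agents 0 and 2 exchange messages at t=0,
# agent 1 messages agent 4 at t=3, and nothing happens in between.
events = [(0, 0, 2), (0, 2, 0), (3, 1, 4)]
graphs = build_dynamic_graph(events, num_agents=5)
```

Timesteps with no messages have no entry, which mirrors the description of edges that exist only at communication events.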

If this is right

Agents can complete cooperative navigation tasks in unknown environments despite constrained and asynchronous communication.
Effective protocols can be learned from the sparse data of actual message exchanges rather than continuous synchronized channels.
Overall message traffic can be reduced without degrading task outcomes relative to synchronous baselines.
The approach supports operation in settings where communication links form and break dynamically.
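
The message-volume claim is a relative reduction against a baseline. As a worked illustration with hypothetical message counts (not figures from the paper), the fraction of messages saved can be computed as:

```python
def message_reduction(baseline_messages, method_messages):
    """Fraction of messages saved relative to the baseline (0.26 means 26% fewer)."""
    if baseline_messages <= 0:
        raise ValueError("baseline_messages must be positive")
    return 1.0 - method_messages / baseline_messages


# Hypothetical counts chosen only to illustrate the arithmetic.
reduction = message_reduction(baseline_messages=1000, method_messages=740)
```

With these placeholder counts the reduction comes out at roughly 0.26, the size of saving the abstract reports.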

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same graph-transformer structure might transfer to other multi-agent domains such as distributed sensing or logistics fleets.
Reducing message volume could lower energy use or spectrum congestion in large-scale deployments.
Further tests with physical robots would reveal whether simulation results hold when message delays and losses are real rather than modeled.
Combining the method with explicit bandwidth budgets could produce protocols that automatically respect hard communication limits.

Load-bearing premise

That a dynamic graph whose edges appear only during actual messages, processed by graph transformers, suffices to learn protocols that preserve performance when communications are infrequent and unsynchronized.

What would settle it

A controlled test in which AsynCoMARL produces noticeably lower success rates or higher collision rates than the baselines once message counts are restricted to the reported level.
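
One way such a comparison could be made concrete is a two-proportion z-test on success counts. This is an assumption of this note, not a procedure described in the source, and the counts below are hypothetical placeholders rather than reported results.

```python
import math


def two_proportion_z(successes_a, trials_a, successes_b, trials_b):
    """z statistic for the difference between two success rates (pooled variance)."""
    rate_a = successes_a / trials_a
    rate_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    if std_err == 0:
        return 0.0  # identical, degenerate rates: no measurable difference
    return (rate_a - rate_b) / std_err


# Hypothetical episode counts: baseline succeeds 90/100, the method 80/100.
z = two_proportion_z(90, 100, 80, 100)
```

A large positive `z` for baseline minus method, at a matched message budget, would be evidence of the "noticeably lower success rate" outcome described above. The threshold for "large" is left to the analyst.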

Figures

Figures reproduced from arXiv: 2502.00558 by Hamsa Balakrishnan, Jasmine Jerry Aloor, Siddharth Nayak, Sydney Dolan.

**Figure 2.** Attention weights for agent 0 in the n = 5 agent Cooperative Navigation task. We compare the changes in graph transformer attention at three discrete periods during the episode: at the beginning, middle, and end.

… communication frequencies of agents 0 and 2; both of these two agents communicated more frequently throughout the episode than agent 0 did with agents 3 and 4. We find that the graph transformer co…

Original abstract

We consider the problem setting in which multiple autonomous agents must cooperatively navigate and perform tasks in an unknown, communication-constrained environment. Traditional multi-agent reinforcement learning (MARL) approaches assume synchronous communications and perform poorly in such environments. We propose AsynCoMARL, an asynchronous MARL approach that uses graph transformers to learn communication protocols from dynamic graphs. AsynCoMARL can accommodate infrequent and asynchronous communications between agents, with edges of the graph only forming when agents communicate with each other. We show that AsynCoMARL achieves similar success and collision rates as leading baselines, despite 26% fewer messages being passed between agents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

AsynCoMARL applies dynamic graphs and graph transformers to asynchronous MARL, claiming similar task performance with 26% fewer messages than baselines.

The core contribution is a method that builds communication graphs only at actual message events and processes them with graph transformers to learn protocols under asynchrony and limited bandwidth. This directly targets the mismatch between standard synchronous MARL assumptions and real settings like robotics where agents cannot coordinate on a clock. The reported outcome—matching success and collision rates while cutting messages—is the concrete result worth checking. The construction itself looks internally consistent: edges appear only when communication happens, so the model does not rely on hidden synchronous assumptions. That matches the stress-test note. The main limitation is that the abstract supplies no information on environments, baselines, run counts, variance, or statistical tests, so the performance claim cannot be evaluated from the given text. If the full paper includes those details and they hold, the work is a modest but useful step for communication-constrained MARL. Readers already working on graph-based agent coordination or deployment constraints will find the most value; others can skip it. The paper is coherent on its own terms and addresses a practical gap, so it should go to peer review rather than desk rejection.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes AsynCoMARL, an asynchronous cooperative multi-agent reinforcement learning method that models communication via dynamic graphs (with edges forming only at actual communication events) processed by graph transformers. It claims to achieve similar success and collision rates to leading baselines while using 26% fewer messages in unknown, communication-constrained environments.

Significance. If the empirical performance claims hold under rigorous validation, the work addresses a relevant gap in MARL by relaxing the synchronous communication assumption common in prior methods, potentially enabling more practical deployments in bandwidth-limited settings. The dynamic-graph construction is internally consistent with the asynchronous problem formulation.

major comments (1)

[Abstract] The central claim of similar success and collision rates with 26% fewer messages is presented without any details on experimental environments, baselines, trial counts, variance, or statistical tests. This omission prevents verification of the link between data and claim, and the claim is load-bearing for the paper's primary contribution.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the major comment point by point below and will incorporate revisions where appropriate to strengthen the presentation of our results.

Referee: [Abstract] The central claim of similar success and collision rates with 26% fewer messages is presented without any details on experimental environments, baselines, trial counts, variance, or statistical tests. This omission prevents verification of the link between data and claim, and the claim is load-bearing for the paper's primary contribution.

Authors: We agree that the abstract's brevity limits the inclusion of full experimental details, which are instead provided in the Experiments section (including environments such as multi-agent navigation tasks in unknown settings, specific baselines, trial counts, reported variance, and statistical comparisons). To directly address the concern, we will revise the abstract to incorporate a concise reference to the experimental setup and key validation aspects (e.g., trial counts and environments) while preserving its length constraints. This revision will make the central claim more self-contained without misrepresenting the manuscript.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper's central claim is an empirical performance comparison (similar success/collision rates with 26% fewer messages) between AsynCoMARL and baselines. The approach models asynchronous communication via dynamic graphs with edges only at actual communication events, processed by graph transformers; this is a direct modeling choice for the problem setting rather than a derived result. No equations, parameter fits presented as predictions, uniqueness theorems, or self-citation chains appear in the provided text. The derivation chain is self-contained as an algorithmic proposal validated by experiments, with no reductions of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, no explicit free parameters or invented entities are described; the primary domain assumption is the adequacy of the dynamic-graph-plus-transformer model for learning asynchronous protocols.

axioms (1)

domain assumption: Graph transformers applied to dynamic graphs (edges present only on actual communication) can learn effective communication protocols that preserve task performance under asynchrony.
This modeling choice is the core premise enabling the asynchronous capability claimed in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: edges of the graph only forming when agents communicate with each other... graph transformer... UniMP multi-head dot product attention

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: dynamic weighted directed graph... masked adjacency matrix A_masked = A ∘ D
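
The quoted passages mention a masked adjacency matrix formed by an elementwise (Hadamard) product, and attention over the resulting graph. The sketch below illustrates both under stated assumptions: `A` is taken to be a weighted adjacency matrix and `D` a binary mask marking which links are active, which the excerpt does not confirm. The softmax restricted to unmasked entries is a generic masked-attention pattern, not the paper's UniMP layer.

```python
import math


def hadamard(a, d):
    """Elementwise product of two equally shaped matrices (A_masked = A ∘ D)."""
    return [[x * y for x, y in zip(row_a, row_d)] for row_a, row_d in zip(a, d)]


def masked_attention_weights(scores, mask):
    """Row-wise softmax over entries whose mask value is nonzero, zero elsewhere."""
    weights = []
    for score_row, mask_row in zip(scores, mask):
        allowed = [s for s, m in zip(score_row, mask_row) if m]
        if not allowed:
            # An agent with no active links attends to nobody.
            weights.append([0.0] * len(score_row))
            continue
        peak = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if m else 0.0
                for s, m in zip(score_row, mask_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical 3-agent example: only the 0 <-> 1 link is active.
A = [[0.0, 0.5, 0.2], [0.5, 0.0, 0.9], [0.2, 0.9, 0.0]]
D = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
A_masked = hadamard(A, D)
attn = masked_attention_weights([[1.0, 2.0, 3.0]] * 3, A_masked)
```

Agent 2 has no active links in this example, so its attention row is all zeros, while agents 0 and 1 attend only to each other.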

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

8 Appendix

8.1 Baseline Implementation Details

We rely on the following implementations for each baseline and provide links to those implementations here. Note that we used the same hyperparameters as used in their orig...

asyncMAPPO: https://github.com/yang-xy20/async_mappo/tree/main
GCS: https://github.com/LXXXXR/GCS_aamas337/tree/master
DGN: https://github.com/jiechuanjiang/pytorch_DGN
CACOM: https://github.com/LXXXXR/CACOM/tree/main
Actor-Attention Critic: https://github.com/shariqiqbal2810/MAAC/tree/master
TransfQmix: https://github.com/mttga/pymarl_transformers/tree/main

8.2 Environment Implementation Details

We rely on the following implementations for the two environments we used in our experiments.

Cooperative Navigation: https://github.com/sydneyid/satellite-cooperative-nav
Rover-Tower: https://github.com/shariqiqbal2810/MAAC/tree/master

8.3 Cooperative Navigation Environment Description

There are n agents and n goals, along with static obstacles in the environment. Each agent is supposed to go to its distinct goal while avoiding collisions with other entities in the environment. Agents start at random locations at the begin...

Package versions:
torch-geometric = 2.3.1
tensorboardX = 2.6.2.2
wandb = 0.17.4

8.7 Hyperparameters

Common Hyperparameters: Value
number of att heads: 3
GAT Encoder num heads: 4
num layers: 4
decoder hidden dim: 64
Table 7: Common Hyperparameters used in GCS

Common Hyperparameters: Value
recurrent data chunk length: 10
gradient clip norm: 10.0
gae lambda: 0.95
gamma: 0.99
value loss: Huber loss
huber delta: 10.0
batch size: num env...

[1] [1]

Nesnas, Lorraine M

Issa A.D. Nesnas, Lorraine M. Fesq, and Richard A. V olpe. Autonomy for space robots: Past, present, and future. Current Robotics Reports, 2(3):251–263, Jun 2021. doi:10.1007/s43154-021-00057-2. pages 1

work page doi:10.1007/s43154-021-00057-2 2021

[2] [2]

Coordination of marine multi robot systems with communication constraints

Antoni Martorell-Torres, José Guerrero-Sastre, and Gabriel Oliver-Codina. Coordination of marine multi robot systems with communication constraints. Applied Ocean Research , 142:103848, 2024. ISSN 0141-

work page 2024

[3] [3]

URL https://www.sciencedirect.com/science/ article/pii/S0141118723003899

doi:https://doi.org/10.1016/j.apor.2023.103848. URL https://www.sciencedirect.com/science/ article/pii/S0141118723003899. pages 1

work page doi:10.1016/j.apor.2023.103848 2023

[4] [4]

Burgard, M

W. Burgard, M. Moors, C. Stachniss, and F.E. Schneider. Coordinated multi-robot exploration.IEEE Transactions on Robotics, 21(3):376–386, 2005. doi:10.1109/TRO.2004.839232. pages 1

work page doi:10.1109/tro.2004.839232 2005

[5] [5]

Satellite navigation and coordination with limited information sharing, 2023

Sydney Dolan, Siddharth Nayak, and Hamsa Balakrishnan. Satellite navigation and coordination with limited information sharing, 2023. pages 2, 7

work page 2023

[6] [6]

Actor-Attention-Critic for Multi-Agent Reinforcement Learning

Shariq Iqbal and Fei Sha. Actor-attention-critic for multi-agent reinforcement learning, 2019. URL https: //arxiv.org/abs/1810.02912. pages 2, 7, 8, 9

work page internal anchor Pith review Pith/arXiv arXiv 2019

[7] [7]

A survey of multi-agent reinforcement learning with communi- cation, 2022

Changxi Zhu, Mehdi Dastani, and Shihan Wang. A survey of multi-agent reinforcement learning with communi- cation, 2022. URL https://arxiv.org/abs/2203.08975. pages 2

work page arXiv 2022

[8] [8]

Learning multiagent communication with backpropagation,

Sainbayar Sukhbaatar, Arthur Szlam, and Rob Fergus. Learning multiagent communication with backpropagation,

work page

[9] [9]

URL https://arxiv.org/abs/1605.07736. pages 2

work page internal anchor Pith review Pith/arXiv arXiv

[10] [10]

Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks

Amanpreet Singh, Tushar Jain, and Sainbayar Sukhbaatar. Learning when to communicate at scale in multiagent cooperative and competitive tasks, 2018. URL https://arxiv.org/abs/1812.09755. pages 2 12 AsynCoMARL A PREPRINT

work page internal anchor Pith review Pith/arXiv arXiv 2018

[11] [11]

Learning Attentional Communication for Multi-Agent Cooperation

Jiechuan Jiang and Zongqing Lu. Learning attentional communication for multi-agent cooperation. CoRR, abs/1805.07733, 2018. URL http://arxiv.org/abs/1805.07733. pages 2

work page internal anchor Pith review Pith/arXiv arXiv 2018

[12] [12]

Rabbat, and Joelle Pineau

Abhishek Das, Théophile Gervet, Joshua Romoff, Dhruv Batra, Devi Parikh, Michael G. Rabbat, and Joelle Pineau. Tarmac: Targeted multi-agent communication. CoRR, abs/1810.11187, 2018. URL http://arxiv.org/ abs/1810.11187. pages 2

work page arXiv 2018

[13] [13]

Graph convolutional reinforcement learning

Jiechuan Jiang, Chen Dun, Tiejun Huang, and Zongqing Lu. Graph convolutional reinforcement learning. In ICLR, 2020. pages 2, 8

work page 2020

[14] [14]

Gupta, Peter Morales, Ross Allen, and Mykel J

Sheng Li, Jayesh K. Gupta, Peter Morales, Ross Allen, and Mykel J. Kochenderfer. Deep implicit coordination graphs for multi-agent reinforcement learning, 2021. URL https://arxiv.org/abs/2006.11438. pages 2

work page arXiv 2021

[15] [15]

Scalable multi-agent reinforcement learning through intelligent information aggregation

Siddharth Nayak, Kenneth Choi, Wenqi Ding, Sydney Dolan, Karthik Gopalakrishnan, and Hamsa Balakrishnan. Scalable multi-agent reinforcement learning through intelligent information aggregation. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, Proceedings of the 40th International Conferenc...

work page 2023

[16] [16]

Multi-agent graph-attention communication and teaming

Yaru Niu, Rohan R Paleja, and Matthew C Gombolay. Multi-agent graph-attention communication and teaming. In AAMAS, volume 21, page 20th, 2021. pages 2

work page 2021

[17] [17]

Learning transferable cooperative behavior in multi-agent teams, 2019

Akshat Agarwal, Sumit Kumar, and Katia Sycara. Learning transferable cooperative behavior in multi-agent teams, 2019. pages 2, 4

work page 2019

[18] [18]

Event-triggered communication and control of networked systems for multi-agent consensus

Cameron Nowzari, Eloy Garcia, and Jorge Cortés. Event-triggered communication and control of networked systems for multi-agent consensus. Automatica, 105:1–27, 2019. ISSN 0005-1098. doi:https://doi.org/10.1016/j.automatica.2019.03.009. URL https://www.sciencedirect.com/science/ article/pii/S000510981930130X. pages 2

work page doi:10.1016/j.automatica.2019.03.009 2019

[19] [19]

Event-triggered multi- agent reinforcement learning with communication under limited-bandwidth constraint, 2020

Guangzheng Hu, Yuanheng Zhu, Dongbin Zhao, Mengchen Zhao, and Jianye Hao. Event-triggered multi- agent reinforcement learning with communication under limited-bandwidth constraint, 2020. URL https: //arxiv.org/abs/2010.04978. pages 2

work page arXiv 2020

[20] [20]

Efficient communication in multi-agent reinforcement learning via variance based control, 2019

Sai Qian Zhang, Qi Zhang, and Jieyu Lin. Efficient communication in multi-agent reinforcement learning via variance based control, 2019. URL https://arxiv.org/abs/1909.02682. pages 2


[21] [21]

Model-based sparse communication in multi-agent reinforcement learning

Shuai Han, Mehdi Dastani, and Shihan Wang. Model-based sparse communication in multi-agent reinforcement learning. AAMAS ’23: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, page 439–447, May 2023. pages 2


[22] [22]

Deep reinforcement learning for event-driven multi-agent decision processes

Kunal Menda, Yi-Chun Chen, Justin Grana, James W. Bono, Brendan D. Tracey, Mykel J. Kochenderfer, and David Wolpert. Deep reinforcement learning for event-driven multi-agent decision processes. IEEE Transactions on Intelligent Transportation Systems, 20(4):1259–1268, April 2019. ISSN 1558-0016. doi:10.1109/tits.2018.2848264. URL http://dx.doi.org/10.110...


[23] [23]

Planning with macro-actions in decentralized POMDPs

Christopher Amato, George D. Konidaris, and Leslie Kaelbling. Planning with macro-actions in decentralized POMDPs. Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, 2012. pages 3


[24] [24]

Asynchronous actor-critic for multi-agent reinforcement learning, 2022

Yuchen Xiao, Weihao Tan, and Christopher Amato. Asynchronous actor-critic for multi-agent reinforcement learning, 2022. URL https://arxiv.org/abs/2209.10113. pages 3


[25] [25]

Agent-oriented centralized critic for asynchronous multi-agent reinforcement learning

Sunghoon Hong, Whiyoung Jung, Deunsol Yoon, Kanghoon Lee, and Woohyung Lim. Agent-oriented centralized critic for asynchronous multi-agent reinforcement learning. In The Sixteenth Workshop on Adaptive and Learning Agents, 2024. URL https://openreview.net/forum?id=qfAY7DoJaD. pages 3


[26] [26]

Multi-agent actor-critic for mixed cooperative-competitive environments

Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. CoRR, abs/1706.02275, 2017. URL http://arxiv.org/abs/1706.02275. pages 4


[27] [27]

The surprising effectiveness of PPO in cooperative multi-agent games

Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, and Yi Wu. The surprising effectiveness of PPO in cooperative multi-agent games. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022. URL https://openreview.net/forum?id=YVXaxB6L2Pl. pages 4, 8


[28] [28]

Masked label prediction: Unified message passing model for semi-supervised classification

Yunsheng Shi, Zhengjie Huang, Wenjin Wang, Hui Zhong, Shikun Feng, and Yu Sun. Masked label prediction: Unified message passing model for semi-supervised classification. CoRR, abs/2009.03509, 2020. URL https://arxiv.org/abs/2009.03509. pages 5


[29] [29]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. CoRR, abs/1707.06347, 2017. URL http://arxiv.org/abs/1707.06347. pages 6


[30] [30]

High-Dimensional Continuous Control Using Generalized Advantage Estimation

John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation, 2018. URL https://arxiv.org/abs/1506.02438. pages 6


[31] [31]

Adam: A Method for Stochastic Optimization

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017. URL https://arxiv.org/abs/1412.6980. pages 6


[32] [32]

Multi-agent actor-critic for mixed cooperative-competitive environments, 2020

Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments, 2020. pages 7


[33] [33]

Fundamentals of astrodynamics and applications

David A. Vallado and Wayne D. McClain. Fundamentals of astrodynamics and applications. Microcosm Press,


[34] [34]

Transfqmix: Transformers for leveraging the graph structure of multi-agent reinforcement learning problems

Matteo Gallici, Mario Martin, and Ivan Masmitja. Transfqmix: Transformers for leveraging the graph structure of multi-agent reinforcement learning problems. arXiv preprint arXiv:2301.05334, 2023. pages 7, 8, 9


[35] [35]

Gcs: Graph-based coordination strategy for multi-agent reinforcement learning, 2022

Jingqing Ruan, Yali Du, Xuantang Xiong, Dengpeng Xing, Xiyun Li, Linghui Meng, Haifeng Zhang, Jun Wang, and Bo Xu. Gcs: Graph-based coordination strategy for multi-agent reinforcement learning, 2022. URL https://arxiv.org/abs/2201.06257. pages 8, 16


[36] [36]

Asynchronous multi-agent reinforcement learning for efficient real-time multi-robot cooperative exploration, 2023

Chao Yu, Xinyi Yang, Jiaxuan Gao, Jiayu Chen, Yunfei Li, Jijia Liu, Yunfei Xiang, Ruixin Huang, Huazhong Yang, Yi Wu, and Yu Wang. Asynchronous multi-agent reinforcement learning for efficient real-time multi-robot cooperative exploration, 2023. URL https://arxiv.org/abs/2301.03398. pages 8, 9


[37] [37]

Context-aware communication for multi-agent reinforcement learning, 2024

Xinran Li and Jun Zhang. Context-aware communication for multi-agent reinforcement learning, 2024. URL https://arxiv.org/abs/2312.15600. pages 8


[38] [38]

Interactive supercomputing on 40,000 cores for machine learning and data analysis

Albert Reuther, Jeremy Kepner, Chansup Byun, Siddharth Samsi, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Lauren Milechin, Julia Mullen, Andrew Prout, Antonio Rosa, Charles Yee, and Peter Michaleas. Interactive supercomputing on 40,000 cores for machine learning and data analysis...


[39] [39]

Jennifer A Roberts and Peter C. E. Roberts. The development of high fidelity linearised j2 models for satellite formation flying control. In 14th AAS/AIAA Space Flight Mechanics Meeting , Feb 2004. pages 15


[40] [40]

Orbit perturbations

Ulrich Walter. Orbit perturbations. Astronautics, page 555–660, 2018. doi:10.1007/978-3-319-74373-8_12. pages 15

8 Appendix

8.1 Baseline Implementation Details

We rely on the following implementations for each baseline and provide links to those implementations here. Note that we used the same hyperparameters as used in their orig...


[41] [41]

asyncMAPPO: https://github.com/yang-xy20/async_mappo/tree/main


[42] [42]

GCS: https://github.com/LXXXXR/GCS_aamas337/tree/master


[43] [43]

DGN: https://github.com/jiechuanjiang/pytorch_DGN


[44] [44]

CACOM: https://github.com/LXXXXR/CACOM/tree/main


[45] [45]

Actor-Attention Critic: https://github.com/shariqiqbal2810/MAAC/tree/master


[46] [46]

TransfQmix: https://github.com/mttga/pymarl_transformers/tree/main

8.2 Environment Implementation Details

We rely on the following implementations for the two environments we used in our experiments:


[47] [47]

Cooperative Navigation: https://github.com/sydneyid/satellite-cooperative-nav


[48] [48]

Rover-Tower: https://github.com/shariqiqbal2810/MAAC/tree/master

8.3 Cooperative Navigation Environment Description

There are n agents and n goals, along with static obstacles in the environment. Each agent is supposed to go to its distinct goal while avoiding collisions with other entities in the environment. Agents start at random locations at the begin...


[49] [49]

torch-geometric = 2.3.1


[50] [50]

tensorboardX = 2.6.2.2


[51] [51]

wandb = 0.17.4

8.7 Hyperparameters

Common Hyperparameters        Value
number of att heads           3
GAT Encoder num heads         4
num layers                    4
decoder hidden dim            64

Table 7: Common Hyperparameters used in GCS

Common Hyperparameters        Value
recurrent data chunk length   10
gradient clip norm            10.0
gae lambda                    0.95
gamma                         0.99
value loss                    Huber loss
huber delta                   10.0
batch size num env...
