Heterogeneous Policy Networks for Composite Robot Team Communication and Coordination
Pith reviewed 2026-06-26 16:47 UTC · model grok-4.3
The pith
Heterogeneous robot teams learn to coordinate using binarized messages that cut bandwidth 200-fold while raising performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HetNet extends heterogeneous graph-attention networks to support scaling heterogeneous robot teams and end-to-end training for learning highly efficient binarized messaging, enabling diverse communication models that coordinate cooperative heterogeneous teams more effectively than homogeneous approaches.
What carries the argument
Heterogeneous graph-attention networks extended for binarized messaging and scalable coordination in HetNet.
If this is right
- Communication shifts from detrimental to helpful once agent heterogeneity is modeled explicitly.
- Binarized messaging achieves a 200x reduction in required bandwidth while maintaining or improving team utility.
- Performance gains range from 5.84% to 707.65% over the next-best baseline across tested domains.
- The graph structure enables end-to-end training and scaling beyond prior homogeneous network limits.
Where Pith is reading between the lines
- The binarization approach could extend to other bandwidth-limited multi-agent settings such as drone fleets or vehicle platoons.
- Physical robot experiments would test whether simulated bandwidth savings survive real-world channel noise and delays.
- The same graph-attention structure might help mixed human-robot teams where roles differ sharply.
Load-bearing premise
Heterogeneous graph-attention networks can be extended to support scaling and end-to-end training of binarized messaging while preserving coordination performance.
What would settle it
A new heterogeneous robot coordination task where the HetNet method shows no performance gain over the next-best baseline or fails to reduce communication bandwidth by a large factor would falsify the claim.
Figures
read the original abstract
High-performing human-human teams learn intelligent and efficient communication and coordination strategies to maximize their joint utility. These teams implicitly understand the different roles of heterogeneous team members and adapt their communication protocols accordingly. Multi-Agent Reinforcement Learning (MARL) has attempted to develop computational methods for synthesizing such joint coordination-communication strategies, but emulating heterogeneous communication patterns across agents with different state, action, and observation spaces has remained a challenge. Without properly modeling agent heterogeneity, as in prior MARL work that leverages homogeneous graph networks, communication becomes less helpful and can even deteriorate the team's performance. In the past, we proposed Heterogeneous Policy Networks (HetNet) to learn efficient and diverse communication models for coordinating cooperative heterogeneous teams. In this extended work, we extend Heterogeneous Policy Networks (HetNet) to support scaling heterogeneous robot teams. Building on heterogeneous graph-attention networks, we show that HetNet not only facilitates learning heterogeneous collaborative policies but also enables end-to-end training for learning highly efficient binarized messaging. Our empirical evaluation shows that HetNet sets a new state of the art in learning coordination and communication strategies for heterogeneous multi-agent teams by achieving an 5.84% to 707.65% performance improvement over the next-best baseline across multiple domains while simultaneously achieving a 200x reduction in the required communication bandwidth.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript extends the authors' prior Heterogeneous Policy Networks (HetNet) work to support scaling in heterogeneous robot teams. Building on heterogeneous graph-attention networks, it claims to enable learning of heterogeneous collaborative policies and end-to-end training of binarized messaging. The central empirical claim is that HetNet achieves 5.84% to 707.65% performance improvement over the next-best baseline across multiple domains while delivering a 200x reduction in communication bandwidth, establishing a new state of the art for coordination and communication in heterogeneous multi-agent teams.
Significance. If the reported performance gains and bandwidth reduction are reproducible with rigorous experimental controls, the work would advance MARL methods for heterogeneous teams by addressing role-specific communication and enabling efficient binarized protocols. The dual benefit of improved joint utility and extreme compression is relevant for real-world robot team deployment where bandwidth is constrained.
major comments (2)
- [Abstract] Abstract: the central performance claim (5.84%–707.65% gains and 200× bandwidth reduction) is presented without any description of experimental setup, domains, baselines, number of trials, statistical tests, or variance; this renders the SOTA assertion unverifiable from the provided text and places the load-bearing empirical result on unexamined ground.
- The extension from prior HetNet work to heterogeneous graph-attention networks plus end-to-end binarized messaging is asserted to preserve coordination performance while scaling, but no architecture diagram, binarization mechanism (e.g., straight-through estimator details), training procedure, or ablation isolating the binarization effect is supplied; without these the weakest assumption cannot be checked.
Simulated Author's Rebuttal
We thank the referee for the detailed feedback. We respond to each major comment below, indicating where revisions will be made.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central performance claim (5.84%–707.65% gains and 200× bandwidth reduction) is presented without any description of experimental setup, domains, baselines, number of trials, statistical tests, or variance; this renders the SOTA assertion unverifiable from the provided text and places the load-bearing empirical result on unexamined ground.
Authors: Abstracts have strict length constraints and are designed to highlight key results rather than full methodology. The full manuscript details the experimental domains, baselines (including homogeneous graph networks), number of trials (20 independent runs with different random seeds), statistical significance testing, and variance reporting in Section 4. To address the concern, we will revise the abstract to include a concise reference to the evaluation protocol and domains. revision: yes
-
Referee: [—] The extension from prior HetNet work to heterogeneous graph-attention networks plus end-to-end binarized messaging is asserted to preserve coordination performance while scaling, but no architecture diagram, binarization mechanism (e.g., straight-through estimator details), training procedure, or ablation isolating the binarization effect is supplied; without these the weakest assumption cannot be checked.
Authors: The manuscript contains an architecture diagram (Figure 1) depicting the heterogeneous graph-attention layers and binarized messaging pathway. Section 3.2 specifies the straight-through estimator for binarization, the end-to-end training procedure is given in Algorithm 1 and Section 3.3, and Section 4.3 presents ablations isolating the binarization contribution. These elements are supplied in the full text; if any were unclear, we can add explicit cross-references in the revision. revision: no
Circularity Check
Minor self-citation of prior HetNet work; central claims rest on independent empirical results
full rationale
The paper cites its own prior HetNet work to establish the base heterogeneous policy network but grounds all new claims (scaling to larger teams, end-to-end binarized messaging, and reported performance/bandwidth gains) in fresh multi-domain experiments against external baselines. No load-bearing derivation step, equation, or uniqueness claim reduces by construction to a fitted parameter or self-citation chain. The empirical SOTA assertions are therefore externally falsifiable and do not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Heterogeneous agents with differing state, action, and observation spaces can be coordinated via graph-attention networks without loss of critical information in binarization.
Reference graph
Works this paper leans on
-
[1]
Communication overhead: The hidden cost of team cognition
J. MacMillan, E. E. Entin, and D. Serfaty, “Communication overhead: The hidden cost of team cognition.” 2004
2004
-
[2]
The influence of shared mental models on team process and performance
J. E. Mathieu, T. S. Heffner, G. F. Goodwin, E. Salas, and J. A. Cannon- Bowers, “The influence of shared mental models on team process and performance.”Journal of applied psychology, vol. 85, no. 2, p. 273, 2000
2000
-
[3]
Mixed-initiative multiagent apprenticeship learning for human training of robot teams,
E. Seraj, J. Y . Xiong, M. L. Schrum, and M. Gombolay, “Mixed-initiative multiagent apprenticeship learning for human training of robot teams,” inThirty-seventh Conference on Neural Information Processing Systems, 2023
2023
-
[4]
Toward an understanding of team performance and training
E. Salas, T. L. Dickinson, S. A. Converse, and S. I. Tannenbaum, “Toward an understanding of team performance and training.” 1992
1992
-
[5]
Iterated reasoning with mutual information in cooperative and byzantine decentralized teaming,
S. Konan, E. Seraj, and M. Gombolay, “Iterated reasoning with mutual information in cooperative and byzantine decentralized teaming,” 2022
2022
-
[6]
The effects of interpersonal communication style on task performance and well being,
H. Taylor, “The effects of interpersonal communication style on task performance and well being,” Ph.D. dissertation, University of Buck- ingham, 2007. 15
2007
-
[7]
A comprehensive survey of multiagent reinforcement learning,
L. Busoniu, R. Babuka, and B. D. Schutter, “A comprehensive survey of multiagent reinforcement learning,”IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 38, pp. 156–172, 2008
2008
-
[8]
Coordinating multi-agent reinforcement learning with limited communication
C. Zhang and V . R. Lesser, “Coordinating multi-agent reinforcement learning with limited communication.” inInternational Conference on Autonomous Agents and Multiagent Systems, 2013, pp. 1101–1108
2013
-
[9]
Coordinated multiagent reinforcement learning for teams of mobile sensing robots,
C. Yu, X. Wang, and Z. Feng, “Coordinated multiagent reinforcement learning for teams of mobile sensing robots,” inProceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019, pp. 2297–2299
2019
-
[10]
Co- operative multi-agent deep reinforcement learning in soccer domains,
J. M. Catacora Ocana, F. Riccio, R. Capobianco, and D. Nardi, “Co- operative multi-agent deep reinforcement learning in soccer domains,” inProceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019, pp. 1865–1867
2019
-
[11]
Grand- master level in starcraft ii using multi-agent reinforcement learning,
O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgievet al., “Grand- master level in starcraft ii using multi-agent reinforcement learning,” Nature, vol. 575, no. 7782, pp. 350–354, 2019
2019
-
[12]
Dota 2 with large scale deep reinforcement learning,
C. Berner, G. Brockman, B. Chan, V . Cheung, P. Debiak, C. Dennison, D. Farhi, Q. Fischer, S. Hashme, C. Hesse, R. J ´ozefowicz, S. Gray, C. Olsson, J. W. Pachocki, M. Petrov, H. P. de Oliveira Pinto, J. Raiman, T. Salimans, J. Schlatter, J. Schneider, S. Sidor, I. Sutskever, J. Tang, F. Wolski, and S. Zhang, “Dota 2 with large scale deep reinforcement le...
Pith/arXiv arXiv 1912
-
[13]
Multi-agent reinforcement learning: A selective overview of theories and algorithms,
K. Zhang, Z. Yang, and T. Bas ¸ar, “Multi-agent reinforcement learning: A selective overview of theories and algorithms,”Handbook of Rein- forcement Learning and Control, pp. 321–384, 2021
2021
-
[14]
Tarmac: Targeted multi-agent communication,
A. Das, T. Gervet, J. Romoff, D. Batra, D. Parikh, M. Rabbat, and J. Pineau, “Tarmac: Targeted multi-agent communication,” inInterna- tional Conference on Machine Learning. PMLR, 2019, pp. 1538–1546
2019
-
[15]
Learning correlated communication topology in multi- agent reinforcement learning,
Y . Du, B. Liu, V . Moens, Z. Liu, Z. Ren, J. Wang, X. Chen, and H. Zhang, “Learning correlated communication topology in multi- agent reinforcement learning,” inProceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, 2021, pp. 456–464
2021
-
[16]
Accnet: Actor-coordinator-critic net for
H. Mao, Z. Gong, Y . Ni, and Z. Xiao, “Accnet: Actor-coordinator-critic net for” learning-to-communicate” with deep multi-agent reinforcement learning,”arXiv preprint arXiv:1706.03235, 2017
Pith/arXiv arXiv 2017
-
[17]
Strata: unified framework for task assignments in large teams of heterogeneous agents
H. Ravichandar, K. Shaw, and S. Chernova, “Strata: unified framework for task assignments in large teams of heterogeneous agents.”(JAAMAS), vol. 34, no. 2, p. 38, 2020
2020
-
[18]
Heterogeneous graph attention networks for learning diverse commu- nication,
E. Seraj, Z. Wang, R. Paleja, M. Sklar, A. Patel, and M. Gombolay, “Heterogeneous graph attention networks for learning diverse commu- nication,”arXiv preprint arXiv:2108.09568, 2021
arXiv 2021
-
[19]
A hierarchical coordination framework for joint perception-action tasks in composite robot teams,
E. Seraj, L. Chen, and M. C. Gombolay, “A hierarchical coordination framework for joint perception-action tasks in composite robot teams,” IEEE Transactions on Robotics, 2021
2021
-
[20]
Safe coordination of human-robot firefighting teams,
E. Seraj, A. Silva, and M. Gombolay, “Safe coordination of human-robot firefighting teams,”arXiv preprint arXiv:1903.06847, 2019
Pith/arXiv arXiv 1903
-
[21]
Intermittent connectivity maintenance with heterogeneous robots,
R. Aragues, D. V . Dimarogonas, P. Guallar, and C. Sagues, “Intermittent connectivity maintenance with heterogeneous robots,” IEEE Transactions on Robotics, vol. 37, pp. 225–245, 2021. [Online]. Available: https://api.semanticscholar.org/CorpusID:226526712
2021
-
[22]
Multi-uav planning for coopera- tive wildfire coverage and tracking with quality-of-service guarantees,
E. Seraj, A. Silva, and M. Gombolay, “Multi-uav planning for coopera- tive wildfire coverage and tracking with quality-of-service guarantees,” Autonomous Agents and Multi-Agent Systems, vol. 36, no. 2, p. 39, 2022
2022
-
[23]
Learning efficient diverse communication for cooperative heterogeneous teaming,
E. Seraj, Z. Wang, R. R. Paleja, D. Martin, M. Sklar, A. Patel, and M. C. Gombolay, “Learning efficient diverse communication for cooperative heterogeneous teaming,” inAdaptive Agents and Multi-Agent Systems, 2022
2022
-
[24]
Learning when to communicate at scale in multiagent cooperative and competitive tasks,
A. Singh, T. Jain, and S. Sukhbaatar, “Learning when to communicate at scale in multiagent cooperative and competitive tasks,”arXiv preprint arXiv:1812.09755, 2018
Pith/arXiv arXiv 2018
-
[25]
Learning multiagent communication with backpropagation,
S. Sukhbaatar, R. Ferguset al., “Learning multiagent communication with backpropagation,” inAdvances in Neural Information Processing Systems, 2016, pp. 2244–2252
2016
-
[26]
Multi-agent game abstraction via graph attention neural network,
Y . Liu, W. Wang, Y . Hu, J. Hao, X. Chen, and Y . Gao, “Multi-agent game abstraction via graph attention neural network,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 05, 2020, pp. 7211–7218
2020
-
[27]
Learning attentional communication for multi- agent cooperation,
J. Jiang and Z. Lu, “Learning attentional communication for multi- agent cooperation,”Advances in Neural Information Processing Systems, vol. 31, pp. 7254–7264, 2018
2018
-
[28]
Multi-agent reinforcement learning for networked system control,
T. Chu, S. Chinchali, and S. Katti, “Multi-agent reinforcement learning for networked system control,” inInternational Conference on Learning Representations, 2019
2019
-
[29]
Stigmergic independent rein- forcement learning for multiagent collaboration,
X. Xu, R. Li, Z. Zhao, and H. Zhang, “Stigmergic independent rein- forcement learning for multiagent collaboration,”IEEE Transactions on Neural Networks and Learning Systems, 2021
2021
-
[30]
Embodied, intelligent communication for multi-agent cooper- ation,
E. Seraj, “Embodied, intelligent communication for multi-agent cooper- ation,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 13, 2023, pp. 16 135–16 136
2023
-
[31]
Improving coordination in small-scale multi- agent deep reinforcement learning through memory-driven communica- tion,
E. Pesce and G. Montana, “Improving coordination in small-scale multi- agent deep reinforcement learning through memory-driven communica- tion,”Machine Learning, pp. 1–21, 2020
2020
-
[32]
Coordinated control of uavs for human- centered active sensing of wildfires,
E. Seraj and M. Gombolay, “Coordinated control of uavs for human- centered active sensing of wildfires,” in2020 American Control Confer- ence (ACC). IEEE, 2020, pp. 1845–1852
2020
-
[33]
Learning to communicate with deep multi-agent reinforcement learning,
J. Foerster, I. A. Assael, N. de Freitas, and S. Whiteson, “Learning to communicate with deep multi-agent reinforcement learning,”Advances in Neural Information Processing Systems, vol. 29, pp. 2137–2145, 2016
2016
-
[34]
Learning to schedule communication in multi-agent reinforcement learning,
D. Kim, S. Moon, D. Hostallero, W. J. Kang, T. Lee, K. Son, and Y . Yi, “Learning to schedule communication in multi-agent reinforcement learning,” inInternational Conference on Learning Representations, 2018
2018
-
[35]
Multi-agent graph-attention communication and teaming,
Y . Niu, R. Paleja, and M. Gombolay, “Multi-agent graph-attention communication and teaming,” inProceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, 2021, pp. 964–973
2021
-
[36]
Counterfactual multi-agent policy gradients,
J. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, “Counterfactual multi-agent policy gradients,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018
2018
-
[37]
Communication topologies between learning agents in deep reinforcement learning,
D. Adjodah, D. Calacci, A. Dubey, A. Goyal, P. Krafft, E. Moro, and A. Pentland, “Communication topologies between learning agents in deep reinforcement learning,”arXiv preprint arXiv:1902.06740, 2019
arXiv 1902
-
[38]
Graph convolutional reinforce- ment learning,
J. Jiang, C. Dun, T. Huang, and Z. Lu, “Graph convolutional reinforce- ment learning,” inInternational Conference on Learning Representa- tions (ICLR), 2020
2020
-
[39]
Learning structured communication for multi-agent reinforce- ment learning,
J. Sheng, X. Wang, B. Jin, J. Yan, W. Li, T.-H. Chang, J. Wang, and H. Zha, “Learning structured communication for multi-agent reinforce- ment learning,”arXiv preprint arXiv:2002.04235, 2020
arXiv 2002
-
[40]
Multi- agent communication heterogeneity,
M. Bravo, J. A. Reyes-Ortiz, J. Rodr ´ıguez, and B. Silva-L´opez, “Multi- agent communication heterogeneity,” in2015 International Conference on Computational Science and Computational Intelligence (CSCI), 2015, pp. 583–588
2015
-
[41]
Distributed coordination of hetero- geneous multi-agent systems with output feedback control,
S. Xiong, Q. Wu, and Y . Wang, “Distributed coordination of hetero- geneous multi-agent systems with output feedback control,” in2019 IEEE International Conference on Unmanned Systems and Artificial Intelligence (ICUSAI). IEEE, 2019, pp. 106–111
2019
-
[42]
Towards heterogeneous multi-agent reinforcement learning with graph neural networks,
D. D. R. Meneghetti and R. A. d. C. Bianchi, “Towards heterogeneous multi-agent reinforcement learning with graph neural networks,”arXiv preprint arXiv:2009.13161, 2020
arXiv 2009
-
[43]
Multi-agent fault-tolerant reinforcement learning with noisy environments,
C. Luo, X. Liu, X. Chen, and J. Luo, “Multi-agent fault-tolerant reinforcement learning with noisy environments,” in2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS), 2020, pp. 164–171
2020
-
[44]
Multi-agent deep reinforcement learning with extremely noisy observations,
O. Kilinc and G. Montana, “Multi-agent deep reinforcement learning with extremely noisy observations,” 2018
2018
-
[45]
Online planning for multi- agent systems with bounded communication,
F. Wu, S. Zilberstein, and X. Chen, “Online planning for multi- agent systems with bounded communication,”Artificial Intelligence, vol. 175, no. 2, pp. 487–511, 2011. [Online]. Available: https: //www.sciencedirect.com/science/article/pii/S0004370210001578
2011
-
[46]
Planning not to talk: Multiagent systems that are robust to communication loss,
M. O. Karabag, C. Neary, and U. Topcu, “Planning not to talk: Multiagent systems that are robust to communication loss,” 2022
2022
-
[47]
Communication learning via backpropagation in discrete channels with unknown noise,
B. Freed, G. Sartoretti, J. Hu, and H. Choset, “Communication learning via backpropagation in discrete channels with unknown noise,”Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 05, pp. 7160–7168, Apr. 2020. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/6205
2020
-
[48]
Learning- based physical layer communications for multi-agent collaboration,
A. Mostaani, O. Simeone, S. Chatzinotas, and B. Ottersten, “Learning- based physical layer communications for multi-agent collaboration,” 2019
2019
-
[49]
T.-Y . Tung, S. Kobus, J. R. Pujol, and D. Gunduz, “Effective communications: A joint learning and communication framework for multi-agent reinforcement learning over noisy channels,” 2021. [Online]. Available: https://arxiv.org/abs/2101.10369
arXiv 2021
-
[50]
Planning and acting in partially observable stochastic domains,
L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, “Planning and acting in partially observable stochastic domains,”Artif. Intell., vol. 101, pp. 99–134, 1998. 16
1998
-
[51]
R. S. Sutton and A. G. Barto,Reinforcement learning: An introduction. MIT press, 2018
2018
-
[52]
High- dimensional continuous control using generalized advantage estimation,
J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, “High- dimensional continuous control using generalized advantage estimation,” arXiv preprint arXiv:1506.02438, 2015
Pith/arXiv arXiv 2015
-
[53]
Graph neural networks: A review of methods and applications,
J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun, “Graph neural networks: A review of methods and applications,” AI Open, vol. 1, pp. 57–81, 2020
2020
-
[54]
A comprehensive survey on graph neural networks,
Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y . Philip, “A comprehensive survey on graph neural networks,”IEEE transactions on neural networks and learning systems, vol. 32, no. 1, pp. 4–24, 2020
2020
-
[55]
Graph Attention Networks,
P. Veli ˇckovi´c, G. Cucurull, A. Casanova, A. Romero, P. Li `o, and Y . Bengio, “Graph Attention Networks,”International Conference on Learning Representations, 2018
2018
-
[56]
Learning scheduling policies for multi- robot coordination with graph attention networks,
Z. Wang and M. Gombolay, “Learning scheduling policies for multi- robot coordination with graph attention networks,”IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 4509–4516, 2020
2020
-
[57]
Fully convolutional networks for semantic segmentation,
E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,”2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431–3440, 2014
2015
-
[58]
Categorical reparameterization with gumbel-softmax,
E. Jang, S. Gu, and B. Poole, “Categorical reparameterization with gumbel-softmax,”arXiv preprint arXiv:1611.01144, 2016
Pith/arXiv arXiv 2016
-
[59]
Collaboration of ai agents via cooperative multi-agent deep reinforcement learning,
N. Balachandar, J. Dieter, and G. S. Ramachandran, “Collaboration of ai agents via cooperative multi-agent deep reinforcement learning,”arXiv preprint arXiv:1907.00327, 2019
Pith/arXiv arXiv 1907
-
[60]
Grid-wise control for multi-agent reinforcement learning in video game ai,
L. Han, P. Sun, Y . Du, J. Xiong, Q. Wang, X. Sun, H. Liu, and T. Zhang, “Grid-wise control for multi-agent reinforcement learning in video game ai,” inInternational Conference on Machine Learning. PMLR, 2019, pp. 2576–2585
2019
-
[61]
Tensor action spaces for multi-agent robot transfer learning,
D. Schwab, Y . Zhu, and M. Veloso, “Tensor action spaces for multi-agent robot transfer learning,” in2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 5380–5386
2020
-
[62]
Scaling multi-agent reinforcement learning via state upsampling
L. Pimentel, R. Paleja, Z. Wang, E. Seraj, J. Pagan, and M. Gombolay, “Scaling multi-agent reinforcement learning via state upsampling.” San- dia National Lab.(SNL-NM), Albuquerque, NM (United States), Tech. Rep., 2022
2022
-
[63]
The star- craft multi-agent challenge,
M. Samvelyan, T. Rashid, C. S. Witt, G. Farquhar, N. Nardelli, T. G. J. Rudner, C.-M. Hung, P. Torr, J. N. Foerster, and S. Whiteson, “The star- craft multi-agent challenge,” inInternational Conference on Autonomous Agents and Multiagent Systems, 2019
2019
-
[64]
Heterogeneous drone fleet for radiological inspection,
A. Vale, R. Ventura, J. Corisco, N. Catarino, N. Veiga, and S. Sargento, “Heterogeneous drone fleet for radiological inspection,” inUnmanned Aerial Vehicles Applications: Challenges and Trends. Springer, 2023, pp. 127–168
2023
-
[65]
The surprising effectiveness of ppo in cooperative, multi-agent games,
C. Yu, A. Velu, E. Vinitsky, Y . Wang, A. Bayen, and Y . Wu, “The surprising effectiveness of ppo in cooperative, multi-agent games,”arXiv preprint arXiv:2103.01955, 2021
arXiv 2021
-
[66]
Multi- agent actor-critic for mixed cooperative-competitive environments,
R. Lowe, Y . Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, “Multi- agent actor-critic for mixed cooperative-competitive environments,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, ser. NIPS’17. Red Hook, NY , USA: Curran Associates Inc., 2017, p. 6382–6393
2017
-
[67]
McGraw Hill, 2007
Proakis,Digital Communications 5th Edition. McGraw Hill, 2007
2007
-
[68]
Otung,Noise in Communication Systems, 2021, pp
I. Otung,Noise in Communication Systems, 2021, pp. 431–472
2021
-
[69]
J. A. Connelly,Low-Noise Electronic System Design, 1st ed. USA: John Wiley & Sons, Inc., 1993
1993
-
[70]
Firecommander: An interactive, probabilistic multi-agent environment for joint perception-action tasks,
E. Seraj, X. Wu, and M. Gombolay, “Firecommander: An interactive, probabilistic multi-agent environment for joint perception-action tasks,” arXiv e-prints, pp. arXiv–2011, 2020
2011
-
[71]
M. A. Finney,FARSITE, Fire Area Simulator–model development and evaluation. US Department of Agriculture, Forest Service, Rocky Mountain Research Station, 1998, no. 4
1998
-
[72]
Learning efficient diverse communication for cooper- ative heterogeneous teaming,
E. Seraj, Z. Wang, R. Paleja, D. Martin, M. Sklar, A. Patel, and M. Gombolay, “Learning efficient diverse communication for cooper- ative heterogeneous teaming,” inProceedings of the 21st international conference on autonomous agents and multiagent systems, 2022, pp. 1173–1182
2022
-
[73]
The robotarium: Globally impactful opportunities, challenges, and lessons learned in remote-access, distributed control of multirobot systems,
S. Wilson, P. Glotfelter, L. Wang, S. Mayya, G. Notomista, M. L. Mote, and M. Egerstedt, “The robotarium: Globally impactful opportunities, challenges, and lessons learned in remote-access, distributed control of multirobot systems,”IEEE Control Systems, vol. 40, pp. 26– 44, 2020. [Online]. Available: https://api.semanticscholar.org/CorpusID: 210695356
2020
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.