Beyond Partner Diversity: An Influence-Based Team Steering Framework for Zero-Shot Human-Machine Teaming

Rohan Paleja; Wei Sheng

arxiv: 2605.15400 · v1 · pith:6QMA4A2Cnew · submitted 2026-05-14 · 💻 cs.AI


Pith reviewed 2026-05-19 15:22 UTC · model grok-4.3

classification 💻 cs.AI

keywords zero-shot coordination · human-machine teaming · influence shaping · multi-agent systems · Overcooked-AI · team coordination · AI collaboration

The pith

Influence-Based Team Steering lets AI discover and guide toward effective coordination patterns instead of relying only on varied simulated partners.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Influence-Based Team Steering to help AI agents coordinate with new human partners without needing data from those exact people or team sizes. It claims that generating many different partner simulations alone falls short once teams grow or communication weakens. The method instead shapes influence to push agents toward discovering and sticking with strong joint action patterns. This matters because it could cut the cost of collecting human interaction data for every new setup. Evaluations in Overcooked-AI, a cooperative cooking game, with simulated partners, varied partner styles, and real humans support the reported gains over prior approaches.

Core claim

The central claim is that Influence-Based Team Steering uses influence shaping to incentivize agents to discover diverse, high-performing team interaction patterns and further steers ongoing trajectories toward stronger learned coordination modes. This remedies the insufficiency of partner coverage alone in zero-shot coordination as settings scale and communication degrades. The framework is evaluated in Overcooked-AI across two-agent and three-agent cases with simulated partners, synthetic variations, and a 30-subject study with two real humans and one machine, where it improves team performance against competing baselines.

What carries the argument

Influence-Based Team Steering, a framework that applies influence shaping to discover diverse high-performing coordination modes and steer agent trajectories toward stronger modes.

If this is right

IBTS produces higher team scores than baselines when paired with simulated partners that vary in style.
The performance gains extend to three-agent teams, showing the learned coordination transfers past simple pairs.
Real human-machine teams with two people and one AI teammate reach better results under IBTS than under prior zero-shot methods.
The approach works across synthetic changes in how partners behave, not just fixed simulations.
Scaled zero-shot coordination benefits from adding coordination mechanisms to partner variety rather than using variety alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same steering idea could let agents adjust mid-task when a new human joins without restarting training.
Applying the method to other collaborative tasks like navigation or assembly might show whether influence shaping generalizes beyond the tested game.
If the discovered modes prove stable, teams could maintain performance even when one member changes unexpectedly.

Load-bearing premise

The premise that influence shaping can reliably discover and direct agents to high-performing coordination modes that transfer from simulations to real humans and from pairs to three-agent teams.

What would settle it

A replication of the human study in the same game setup, in which teams using Influence-Based Team Steering score no higher than teams trained only on partner diversity, would indicate that the steering and transfer claims do not hold.

Figures

Figures reproduced from arXiv: 2605.15400 by Rohan Paleja, Wei Sheng.

**Figure 2.** Overcooked-AI layouts used in the human study. We evaluate 2-agent and 3-agent settings …

**Figure 3.** In-distribution simulated evaluation across 2-agent and 3-agent Overcooked layouts. Bars …

**Figure 4.** Synthetic LLM partner-style evaluation. Each panel corresponds to one partner personality. …

**Figure 2.** The layout order was fixed as FC, PL, and AA, while the machine partner order was …

**Figure 5.** Human-study scores across one-human–one-machine and two-human–one-machine settings. …

**Figure 6.** Sensitivity of MAPPO+IS to the event horizon K on PL-3 and AA-3. Bars show mean final episode return over 12 seeds, and error bars show ±1 standard deviation. We ablate the event horizon K in the event-level influence label while holding the MAPPO+IS pipeline fixed. We compare K ∈ {1, 4, 7} on the 3-agent Pipeline (PL-3) and Asymmetric Advantages (AA-3) layouts, ranging from immediate next-step labeling t…

**Figure 7.** Case study showing that standard cooperative MARL baselines can struggle to discover …

**Figure 8.** Layout overview for the 2-agent, 3-agent, and 4-agent settings. In the 2-agent setting, the …

**Figure 9.** Post-game questionnaire ratings averaged over fluency, trust, satisfaction, and work balance. …

**Figure 10.** Team-effectiveness questionnaire used in the human study. Items measure perceived …

**Figure 11.** Personality questionnaire used in the human study. These responses are collected as …

**Figure 12.** NASA-TLX workload questionnaire used to measure participants’ perceived workload …

Original abstract

While AI agents are rapidly advancing from isolated tools to interactive collaborators, data-driven human-machine teaming (HMT) methods remain costly in their reliance on human interaction data across domains, teammates, and team sizes. Zero-shot coordination (ZSC) addresses this bottleneck by simulating diverse partner populations to approximate how unseen partners might behave. However, partner coverage alone is insufficient as team settings scale and communication becomes degraded. To remedy this deficiency, we propose Influence-Based Team Steering (IBTS), a framework that uses influence shaping to incentivize agents to discover diverse, high-performing team interaction patterns and further steers ongoing trajectories toward stronger learned coordination modes. We assess IBTS on Overcooked-AI in both two-agent and three-agent settings, allowing us to test whether learned coordination structure transfers beyond dyadic interaction. Our evaluation includes simulated partners, synthetic partner-style variation, and, to our knowledge, the first 30-subject Overcooked-AI HMT study involving two real human teammates and one machine teammate. Across these evaluations, IBTS improves team performance against competing baselines, highlighting the need for scaled ZSC to combine sparse-reward coordination mechanisms with partner-variation coverage rather than relying on diversity alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

IBTS adds influence shaping on top of partner diversity for zero-shot teaming and reports gains in a 30-subject Overcooked human study, but the evidence does not yet isolate what the shaping actually contributes.

The main takeaway is that this paper takes standard zero-shot coordination, which relies on partner diversity in simulation, and layers on influence shaping to push agents toward better coordination modes while the team is interacting. They evaluate the resulting IBTS framework in Overcooked-AI, moving from two-agent to three-agent teams and including what they call the first human-machine study with two real humans and one AI agent across 30 subjects. Performance looks better than the baselines they compare against in simulation, synthetic variations, and the human trials.

Referee Report

2 major / 2 minor

Summary. The paper claims that Influence-Based Team Steering (IBTS) improves upon zero-shot coordination by using influence shaping to discover diverse high-performing team interaction patterns and steer trajectories toward them. This is evaluated in Overcooked-AI for two- and three-agent teams with simulated partners, synthetic variations, and a novel 30-subject human-machine teaming study involving two humans and one AI agent, demonstrating performance gains over competing baselines.

Significance. If the results hold, IBTS could advance scalable human-AI collaboration by addressing limitations of partner diversity alone in ZSC, particularly for multi-agent teams and real human partners. The 30-subject study with two humans and one machine is a clear strength as the first such evaluation in Overcooked-AI, providing empirical grounding for transfer claims from simulation to live interaction.

major comments (2)

Human Study Evaluation: The 30-subject Overcooked-AI HMT study reports aggregate performance gains for IBTS over baselines, but without ablations removing the steering component, trajectory-level analysis, or mode clustering, it is unclear whether gains arise from the influence mechanism or from stronger base policies and partner diversity alone. This directly affects the central claim that influence shaping enables reliable discovery and steering that transfers to unseen real humans and three-agent teams.
Framework Description (§3): The influence shaping mechanism is presented as incentivizing discovery of high-performing coordination modes, but the manuscript provides insufficient formalization of the influence function or explicit tests for generalization from simulated dyadic partners to real humans in three-agent settings, leaving the transfer step under-supported.

minor comments (2)

Abstract: The claim of improvements 'across these evaluations' would benefit from a brief mention of key metrics or the number of trials, so that readers can assess the strength of the reported gains.
Notation and Figures: Ensure consistent use of symbols for influence terms across sections, and ensure that all result figures include error bars or statistical significance markers.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the novelty of the 30-subject human-machine teaming study as well as the broader potential of the IBTS framework. We address each major comment below and outline the revisions that will be incorporated to strengthen the manuscript.

Referee: [Human Study Evaluation] The 30-subject Overcooked-AI HMT study reports aggregate performance gains for IBTS over baselines, but without ablations removing the steering component, trajectory-level analysis, or mode clustering, it is unclear whether gains arise from the influence mechanism or from stronger base policies and partner diversity alone. This directly affects the central claim that influence shaping enables reliable discovery and steering that transfers to unseen real humans and three-agent teams.

Authors: We agree that additional analyses isolating the contribution of influence shaping would provide clearer support for the central claims. In the revised manuscript we will add an ablation that removes the steering component while retaining the partner-diversity training and base policies. We will also include trajectory-level performance breakdowns and clustering of coordination modes to show that performance improvements arise specifically from steering toward higher-performing patterns discovered via influence shaping. These additions will more directly substantiate transfer to real human partners and three-agent teams.

Revision: yes

Referee: [Framework Description (§3)] The influence shaping mechanism is presented as incentivizing discovery of high-performing coordination modes, but the manuscript provides insufficient formalization of the influence function or explicit tests for generalization from simulated dyadic partners to real humans in three-agent settings, leaving the transfer step under-supported.

Authors: We will expand Section 3 with a more precise mathematical definition of the influence function, including its exact formulation and the mechanism by which it incentivizes discovery of diverse high-performing modes. While the current evaluations already cover simulated dyadic partners, synthetic variations, and real-human three-agent teams, we will add an explicit subsection discussing the generalization pathway and any supporting analysis or visualizations that illustrate how coordination structures transfer beyond the training distribution.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity in IBTS derivation or claims.

The paper proposes a new Influence-Based Team Steering framework that combines influence shaping with partner diversity for zero-shot coordination. Central claims rest on empirical results from simulated partners, synthetic variations, and a 30-subject human Overcooked-AI study rather than any self-definitional reduction, fitted parameters renamed as predictions, or load-bearing self-citations. No equations or steps in the abstract or described evaluation chain collapse to prior inputs by construction; the method introduces independent steering logic evaluated against baselines. This is the expected non-finding for a framework paper whose value is in the proposed mechanism and transfer experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides insufficient detail to enumerate specific free parameters or axioms; influence shaping appears as a core mechanism but its implementation assumptions are not stated.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation to the cited Recognition theorem: unclear
Paper passage: "influence shaping to incentivize agents to discover diverse, high-performing team interaction patterns and further steers ongoing trajectories toward stronger learned coordination modes"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation to the cited Recognition theorem: unclear
Paper passage: $r^{\mathrm{inf}}_{i,t} := \frac{1}{n-1} \sum_{j \neq i} \max\big( q_{i \to j}(y = 1 \mid o_t, a_i) - \omega_j(y = 1 \mid o_t),\ 0 \big)$
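
The influence reward quoted above can be made concrete with a small numerical sketch. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the sum runs over the n-1 teammates j ≠ i (suggested by the 1/(n-1) factor), that q_{i→j} is a predicted probability of an event label y = 1 for teammate j given agent i's action, and that ω_j is the corresponding baseline probability without conditioning on that action. The function name `influence_reward` and its array arguments are hypothetical.

```python
import numpy as np


def influence_reward(q_cond, omega_base, i):
    """Clipped, averaged influence of agent i on its teammates at one step.

    q_cond[j]     -- assumed to hold q_{i->j}(y=1 | o_t, a_i)
    omega_base[j] -- assumed to hold omega_j(y=1 | o_t)
    Entry i of both arrays is ignored (an agent does not influence itself).
    """
    q_cond = np.asarray(q_cond, dtype=float)
    omega_base = np.asarray(omega_base, dtype=float)
    n = q_cond.shape[0]
    if n < 2 or omega_base.shape[0] != n:
        raise ValueError("need matching arrays for at least two agents")

    others = np.arange(n) != i  # boolean mask selecting teammates j != i
    # Only positive gains over the baseline count; negative gaps are clipped to 0.
    gains = np.maximum(q_cond[others] - omega_base[others], 0.0)
    return float(gains.sum() / (n - 1))


# Three agents; agent 0 raises teammate 1's event probability above baseline
# and leaves teammate 2 below baseline, so only teammate 1 contributes.
r = influence_reward([0.0, 0.9, 0.2], [0.0, 0.5, 0.4], i=0)
```

Clipping at zero means an action is never penalized for lowering a teammate's event probability; it simply earns no bonus, which is what the max(·, 0) term in the quoted expression encodes.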

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


Ceyao Zhang, Kaijie Yang, Siyi Hu, Zihao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, et al. Proagent: building proactive cooperative agents with large language models. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 17591–17599, 2024

work page 2024
[55]

Maximum entropy population-based training for zero-shot human-ai coordination

Rui Zhao, Jinming Song, Yufeng Yuan, Haifeng Hu, Yang Gao, Yi Wu, Zhongqian Sun, and Wei Yang. Maximum entropy population-based training for zero-shot human-ai coordination. InProceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 6145–6153, 2023. 13 A Experimental Details and Hyperparameters A.1 Reward Shaping In all layouts, age...

work page 2023

[1] Christopher Amato. An introduction to centralized training for decentralized execution in cooperative multi-agent reinforcement learning. arXiv preprint arXiv:2409.03052, 2024.

[2] Rhea Basappa, Caitlin Lancaster, Rohit Mallick, Christopher Flathmann, and Nathan McNeese. Mind the gaps: How AI shortcomings and human concerns may disrupt team cognition in human-AI teams (HATs). In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, volume 69, pages 354–359. SAGE Publications Sage CA: Los Angeles, CA, 2025.

[3] Andrea Bauer, Dirk Wollherr, and Martin Buss. Human–robot collaboration: a survey. International Journal of Humanoid Robotics, 5(01):47–66, 2008.

[4] Daniel S Bernstein, Robert Givan, Neil Immerman, and Shlomo Zilberstein. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27:819–840, 2002.

[5] Micah Carroll, Rohin Shah, Mark K Ho, Tom Griffiths, Sanjit Seshia, Pieter Abbeel, and Anca Dragan. On the utility of learning about humans for human-AI coordination. Advances in Neural Information Processing Systems, 32, 2019.

[6] Rujikorn Charakorn, Poramate Manoonpong, and Nat Dilokthanakul. Investigating partner diversification methods in cooperative multi-agent deep reinforcement learning. In International Conference on Neural Information Processing, pages 395–402. Springer, 2020.

[7] Rujikorn Charakorn, Poramate Manoonpong, and Nat Dilokthanakul. Diversity is not all you need: Training a robust cooperative agent needs specialist partners. Advances in Neural Information Processing Systems, 37:56401–56423, 2024.

[8] Matthew Fontaine, Ya-Chuan Hsu, Yulun Zhang, Bryon Tjanaka, and Stefanos Nikolaidis. On the importance of environments in human-robot coordination. In Proceedings of Robotics: Science and Systems, Virtual, July 2021. doi: 10.15607/RSS.2021.XVII.038.

[9] Tobias Gessler, Tin Dizdarevic, Ani Calinescu, Benjamin Ellis, Andrei Lupu, and Jakob Nicolaus Foerster. OvercookedV2: Rethinking Overcooked for zero-shot coordination. In The Thirteenth International Conference on Learning Representations, 2025.

[10] Guy Hoffman. Evaluating fluency in human–robot collaboration. IEEE Transactions on Human-Machine Systems, 49(3):209–218, 2019.

[11] Joey Hong, Sergey Levine, and Anca Dragan. Learning to influence human behavior with offline reinforcement learning. Advances in Neural Information Processing Systems, 36:36094–36105, 2023.

[12] Hengyuan Hu, Adam Lerer, Alex Peysakhovich, and Jakob Foerster. “Other-Play” for zero-shot coordination. In International Conference on Machine Learning, pages 4399–4410. PMLR, 2020.

[13] Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, et al. Population based training of neural networks. arXiv preprint arXiv:1711.09846, 2017.

[14] Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro Ortega, DJ Strouse, Joel Z Leibo, and Nando De Freitas. Social influence as intrinsic motivation for multi-agent deep reinforcement learning. In International Conference on Machine Learning, pages 3040–3049. PMLR, 2019.

[15] Lora Kolodny. Apptronik raises $520 million to beat Chinese humanoids, Tesla Optimus to market, February 2026. CNBC.

[16] Dimosthenis Kontogiorgos and Hannah R. M. Pelikan. Towards adaptive and least-collaborative-effort social robots. In Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, pages 311–313, 2020.

[17] Jakub Grudzien Kuba, Ruiqing Chen, Muning Wen, Ying Wen, Fanglei Sun, Jun Wang, and Yaodong Yang. Trust region policy optimisation in multi-agent reinforcement learning. In International Conference on Learning Representations, 2022.

[18] Karol Kurach, Anton Raichuk, Piotr Stańczyk, Michał Zając, Olivier Bachem, Lasse Espeholt, Carlos Riquelme, Damien Vincent, Marcin Michalski, Olivier Bousquet, et al. Google Research Football: A novel reinforcement learning environment. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 4501–4510, 2020.

[19] Benjamin Li, Shuyang Shi, Lucia Romero, Huao Li, Yaqi Xie, Woojun Kim, Stefanos Nikolaidis, Charles Michael Lewis, Katia P. Sycara, and Simon Stepputtis. Adaptively coordinating with novel partners via learned latent strategies. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.

[20] Jiahui Li, Kun Kuang, Baoxiang Wang, Xingchen Li, Fei Wu, Jun Xiao, and Long Chen. Two heads are better than one: A simple exploration framework for efficient multi-agent reinforcement learning. Advances in Neural Information Processing Systems, 36:20038–20053, 2023.

[21] Yancheng Liang, Daphne Chen, Abhishek Gupta, Simon S Du, and Natasha Jaques. Learning to cooperate with humans using generative agents. Advances in Neural Information Processing Systems, 37:60061–60087, 2024.

[22] Iou-Jen Liu, Unnat Jain, Raymond A Yeh, and Alexander Schwing. Cooperative exploration for multi-agent deep reinforcement learning. In International Conference on Machine Learning, pages 6826–6836. PMLR, 2021.

[23] Yuntao Liu, Yuan Li, Xinhai Xu, Yong Dou, and Donghong Liu. Heterogeneous skill learning for multi-agent tasks. Advances in Neural Information Processing Systems, 35:37011–37023, 2022.

[24] Ryan Lowe, Yi I Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems, 30, 2017.

[25] Andrei Lupu, Brandon Cui, Hengyuan Hu, and Jakob Foerster. Trajectory diversity for zero-shot coordination. In International Conference on Machine Learning, pages 7204–7213. PMLR, 2021.

[26] Robert R McCrae and Paul T Costa Jr. A five-factor theory of personality. Handbook of Personality: Theory and Research, 2(1999):139–153, 1999.

[27] Debasmita Mukherjee, Kashish Gupta, Li Hsin Chang, and Homayoun Najjaran. A survey of robot learning strategies for human-robot collaboration in industrial settings. Robotics and Computer-Integrated Manufacturing, 73:102231, 2022.

[28] Lewis Newsham and Daniel Prince. Personality-driven decision making in LLM-based autonomous agents. In Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, pages 1538–1547, Detroit, MI, USA, 2025. International Foundation for Autonomous Agents and Multiagent Systems.

[29] Andrew Ni, Simon Stepputtis, Stefanos Nikolaidis, Michael Lewis, Katia P. Sycara, and Woojun Kim. Theory of mind guided strategy adaptation for zero-shot coordination. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, 2026.

[30] Ike Obi, Ruiqi Wang, Wonse Jo, and Byung-Cheol Min. Investigating the impact of trust in multi-human multi-robot task allocation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hangzhou, China, 2025.

[31] Frans A Oliehoek, Christopher Amato, et al. A concise introduction to decentralized POMDPs, volume 1. Springer, 2016.

[32] Thomas O’Neill, Nathan McNeese, Amy Barron, and Beau Schelble. Human–autonomy teaming: A review and analysis of the empirical literature. Human Factors, 64(5):904–938, 2022.

[33] Rohan Paleja, Muyleng Ghuy, Nadun Ranawaka Arachchige, Reed Jensen, and Matthew Gombolay. The utility of explainable AI in ad hoc human-machine teaming. Advances in Neural Information Processing Systems, 34:610–623, 2021.

[34] Rohan Paleja, Michael Munje, Kimberlee C Chang, Reed Jensen, and Matthew Gombolay. Designs for enabling collaboration in human-machine teaming via interactive and explainable systems. Advances in Neural Information Processing Systems, 37:64942–64969, 2024.

[35] Mikayel Samvelyan, Tabish Rashid, Christian Schroeder De Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, and Shimon Whiteson. The StarCraft multi-agent challenge. arXiv preprint arXiv:1902.04043, 2019.

[36] Bidipta Sarkar, Andy Shih, and Dorsa Sadigh. Diverse conventions for human-AI collaboration. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.

[37] Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015.

[38] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

[39] Brennan Shacklett, Luc Guy Rosenzweig, Zhiqiang Xie, Bidipta Sarkar, Andrew Szot, Erik Wijmans, Vladlen Koltun, Dhruv Batra, and Kayvon Fatahalian. An extensible, data-oriented architecture for high-performance, many-world simulation. ACM Transactions on Graphics (TOG), 42(4):1–13, 2023.

[40] Ho Chit Siu, Jaime Daniel Pena, Edenna Chen, Yutai Zhou, Victor Lopez, Kyle Palko, Kimberlee Chestnut Chang, and Ross Emerson Allen. Evaluation of human-AI teams for learned and rule-based agents in Hanabi. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, 2021.

[41] Varshith Sreeramdass, Rohan R Paleja, Letian Chen, Sanne van Waveren, and Matthew Gombolay. Generalized behavior learning from diverse demonstrations. In The Thirteenth International Conference on Learning Representations, 2025.

[42] Peter Stone, Gal Kaminka, Sarit Kraus, and Jeffrey Rosenschein. Ad hoc autonomous agent teams: Collaboration without pre-coordination. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 24, pages 1504–1509, 2010.

[43] DJ Strouse, Kevin McKee, Matt Botvinick, Edward Hughes, and Richard Everett. Collaborating with humans without human data. Advances in Neural Information Processing Systems, 34:14502–14515, 2021.

[44] Michael Tomasello. Why We Cooperate. MIT Press, 2009.

[45] Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782):350–354, 2019.

[46] Haoming Wang, Zhaoming Tian, Yunpeng Song, Xiangliang Zhang, and Zhongmin Cai. Beyond single stationary policies: Meta-task players as naturally superior collaborators. In Advances in Neural Information Processing Systems, volume 37, pages 78836–78862, 2024.

[47] Ruiqi Wang, Dezhong Zhao, Dayoon Suh, Ziqin Yuan, Guohua Chen, and Byung-Cheol Min. Personalization in human-robot interaction through preference-based action representation learning. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 7377–7384, 2025. doi: 10.1109/ICRA55743.2025.11128756.

[48] Tonghan Wang, Heng Dong, Victor Lesser, and Chongjie Zhang. ROMA: Multi-agent reinforcement learning with emergent roles. In Proceedings of the 37th International Conference on Machine Learning, ICML, pages 9876–9886. PMLR, 2020.

[49] Tonghan Wang, Jianhao Wang, Yi Wu, and Chongjie Zhang. Influence-based multi-agent exploration. In International Conference on Learning Representations, 2020.

[50] Xihuai Wang, Shao Zhang, Wenhao Zhang, Wentao Dong, Jingxiao Chen, Ying Wen, and Weinan Zhang. ZSC-Eval: An evaluation toolkit and benchmark for multi-agent zero-shot coordination. Advances in Neural Information Processing Systems, 37:47344–47377, 2024.

[51] Pei Xu, Junge Zhang, and Kaiqi Huang. Population-based diverse exploration for sparse-reward multi-agent tasks. In IJCAI, pages 283–291, 2024.

[52] Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, and Yi Wu. The surprising effectiveness of PPO in cooperative multi-agent games. Advances in Neural Information Processing Systems, 35:24611–24624, 2022.

[53] Lei Yuan, Lihe Li, Ziqian Zhang, Feng Chen, Tianyi Zhang, Cong Guan, Yang Yu, and Zhi-Hua Zhou. Learning to coordinate with anyone. In Proceedings of the Fifth International Conference on Distributed Artificial Intelligence, pages 1–9, 2023.

[54] Ceyao Zhang, Kaijie Yang, Siyi Hu, Zihao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, et al. ProAgent: Building proactive cooperative agents with large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 17591–17599, 2024.

[55] Rui Zhao, Jinming Song, Yufeng Yuan, Haifeng Hu, Yang Gao, Yi Wu, Zhongqian Sun, and Wei Yang. Maximum entropy population-based training for zero-shot human-AI coordination. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 6145–6153, 2023.

A Experimental Details and Hyperparameters

A.1 Reward Shaping

In all layouts, age...