Beyond Partner Diversity: An Influence-Based Team Steering Framework for Zero-Shot Human-Machine Teaming
Pith reviewed 2026-05-19 15:22 UTC · model grok-4.3
The pith
Influence-Based Team Steering lets AI discover and guide toward effective coordination patterns instead of relying only on varied simulated partners.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that Influence-Based Team Steering uses influence shaping to incentivize agents to discover diverse, high-performing team interaction patterns and further steers ongoing trajectories toward stronger learned coordination modes. This remedies the insufficiency of partner coverage alone in zero-shot coordination as settings scale and communication degrades. The framework is evaluated in Overcooked-AI across two-agent and three-agent cases with simulated partners, synthetic variations, and a 30-subject study with two real humans and one machine, where it improves team performance against competing baselines.
What carries the argument
Influence-Based Team Steering, a framework that applies influence shaping to discover diverse high-performing coordination modes and steer agent trajectories toward stronger modes.
If this is right
- IBTS produces higher team scores than baselines when paired with simulated partners that vary in style.
- The performance gains extend to three-agent teams, showing the learned coordination transfers past simple pairs.
- Real human-machine teams with two people and one AI teammate reach better results under IBTS than under prior zero-shot methods.
- The approach works across synthetic changes in how partners behave, not just fixed simulations.
- Scaled zero-shot coordination benefits from adding coordination mechanisms to partner variety rather than using variety alone.
Where Pith is reading between the lines
- The same steering idea could let agents adjust mid-task when a new human joins without restarting training.
- Applying the method to other collaborative tasks like navigation or assembly might show whether influence shaping generalizes beyond the tested game.
- If the discovered modes prove stable, teams could maintain performance even when one member changes unexpectedly.
Load-bearing premise
Influence shaping can reliably discover high-performing coordination modes and direct agents toward them, and those modes transfer from simulations to real humans and from pairs to three-agent teams.
What would settle it
A replication of the human study in the same game setup: if teams using Influence-Based Team Steering scored no higher than teams trained only on partner diversity, that would indicate the steering and transfer claims do not hold.
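One way such a replication could be scored is a permutation test on per-team scores. The sketch below is illustrative only: the function name, its parameters, and the assumption that the two groups are independent teams are not taken from the paper, and the paper's own statistical procedure is not described on this page.

```python
import random
from statistics import fmean
from typing import Sequence


def permutation_p_value(
    treatment: Sequence[float],
    control: Sequence[float],
    n_resamples: int = 10_000,
    seed: int = 0,
) -> float:
    """One-sided permutation p-value for mean(treatment) > mean(control).

    Hypothetical helper: `treatment` would hold scores of teams using the
    steering method, `control` scores of teams trained on partner diversity only.
    """
    if not treatment or not control:
        raise ValueError("both groups need at least one score")
    rng = random.Random(seed)
    observed = fmean(treatment) - fmean(control)
    pooled = list(treatment) + list(control)
    k = len(treatment)
    hits = 0
    for _ in range(n_resamples):
        # Reassign group labels at random and recompute the mean difference.
        rng.shuffle(pooled)
        diff = fmean(pooled[:k]) - fmean(pooled[k:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

A large p-value here would be consistent with the "no higher scores" outcome described above; a within-subject design would call for a paired test instead.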
Original abstract
While AI agents are rapidly advancing from isolated tools to interactive collaborators, data-driven human-machine teaming (HMT) methods remain costly in their reliance on human interaction data across domains, teammates, and team sizes. Zero-shot coordination (ZSC) addresses this bottleneck by simulating diverse partner populations to approximate how unseen partners might behave. However, partner coverage alone is insufficient as team settings scale and communication becomes degraded. To remedy this deficiency, we propose Influence-Based Team Steering (IBTS), a framework that uses influence shaping to incentivize agents to discover diverse, high-performing team interaction patterns and further steers ongoing trajectories toward stronger learned coordination modes. We assess IBTS on Overcooked-AI in both two-agent and three-agent settings, allowing us to test whether learned coordination structure transfers beyond dyadic interaction. Our evaluation includes simulated partners, synthetic partner-style variation, and, to our knowledge, the first 30-subject Overcooked-AI HMT study involving two real human teammates and one machine teammate. Across these evaluations, IBTS improves team performance against competing baselines, highlighting the need for scaled ZSC to combine sparse-reward coordination mechanisms with partner-variation coverage rather than relying on diversity alone.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that Influence-Based Team Steering (IBTS) improves upon zero-shot coordination by using influence shaping to discover diverse high-performing team interaction patterns and steer trajectories toward them. This is evaluated in Overcooked-AI for two- and three-agent teams with simulated partners, synthetic variations, and a novel 30-subject human-machine teaming study involving two humans and one AI agent, demonstrating performance gains over competing baselines.
Significance. If the results hold, IBTS could advance scalable human-AI collaboration by addressing limitations of partner diversity alone in ZSC, particularly for multi-agent teams and real human partners. The 30-subject study with two humans and one machine is a clear strength as the first such evaluation in Overcooked-AI, providing empirical grounding for transfer claims from simulation to live interaction.
major comments (2)
- [Human Study Evaluation] The 30-subject Overcooked-AI HMT study reports aggregate performance gains for IBTS over baselines, but without ablations removing the steering component, trajectory-level analysis, or mode clustering, it is unclear whether gains arise from the influence mechanism or from stronger base policies and partner diversity alone. This directly affects the central claim that influence shaping enables reliable discovery and steering that transfers to unseen real humans and three-agent teams.
- [Framework Description (§3)] The influence shaping mechanism is presented as incentivizing discovery of high-performing coordination modes, but the manuscript provides insufficient formalization of the influence function or explicit tests for generalization from simulated dyadic partners to real humans in three-agent settings, leaving the transfer step under-supported.
minor comments (2)
- [Abstract] The claim of improvements 'across these evaluations' would benefit from a brief mention of key metrics or the number of trials, so that readers can assess the strength of the reported gains.
- [Notation and Figures] Ensure that symbols for influence terms are used consistently across sections and that all result figures include error bars or statistical significance markers.
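As an illustration of the error bars requested above, the sketch below computes a mean and a normal-approximation 95% confidence half-width from a list of per-episode or per-team scores. The function name and the choice of z = 1.96 are assumptions made for this example; they are not taken from the manuscript.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_with_ci(scores: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Return (mean, half_width) so that an error bar spans mean +/- half_width.

    Uses the sample standard deviation and a normal approximation; this is a
    hypothetical helper, not the paper's reporting procedure.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate spread")
    mean = statistics.fmean(scores)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, z * sem
```

With small samples, such as a modest number of human teams per condition, a t-distribution critical value in place of 1.96 would give wider and more honest intervals.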
Simulated Author's Rebuttal
We thank the referee for the constructive review and for recognizing the novelty of the 30-subject human-machine teaming study as well as the broader potential of the IBTS framework. We address each major comment below and outline the revisions that will be incorporated to strengthen the manuscript.
- Referee: [Human Study Evaluation] The 30-subject Overcooked-AI HMT study reports aggregate performance gains for IBTS over baselines, but without ablations removing the steering component, trajectory-level analysis, or mode clustering, it is unclear whether gains arise from the influence mechanism or from stronger base policies and partner diversity alone. This directly affects the central claim that influence shaping enables reliable discovery and steering that transfers to unseen real humans and three-agent teams.
  Authors: We agree that additional analyses isolating the contribution of influence shaping would provide clearer support for the central claims. In the revised manuscript we will add an ablation that removes the steering component while retaining the partner-diversity training and base policies. We will also include trajectory-level performance breakdowns and clustering of coordination modes to show that performance improvements arise specifically from steering toward higher-performing patterns discovered via influence shaping. These additions will more directly substantiate transfer to real human partners and three-agent teams.
  Revision: yes
- Referee: [Framework Description (§3)] The influence shaping mechanism is presented as incentivizing discovery of high-performing coordination modes, but the manuscript provides insufficient formalization of the influence function or explicit tests for generalization from simulated dyadic partners to real humans in three-agent settings, leaving the transfer step under-supported.
  Authors: We will expand Section 3 with a more precise mathematical definition of the influence function, including its exact formulation and the mechanism by which it incentivizes discovery of diverse high-performing modes. While the current evaluations already cover simulated dyadic partners, synthetic variations, and real-human three-agent teams, we will add an explicit subsection discussing the generalization pathway and any supporting analysis or visualizations that illustrate how coordination structures transfer beyond the training distribution.
  Revision: yes
Circularity Check
No significant circularity in IBTS derivation or claims.
The paper proposes a new Influence-Based Team Steering framework that combines influence shaping with partner diversity for zero-shot coordination. Central claims rest on empirical results from simulated partners, synthetic variations, and a 30-subject human Overcooked-AI study rather than any self-definitional reduction, fitted parameters renamed as predictions, or load-bearing self-citations. No equations or steps in the abstract or described evaluation chain collapse to prior inputs by construction; the method introduces independent steering logic evaluated against baselines. This is the expected non-finding for a framework paper whose value is in the proposed mechanism and transfer experiments.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "influence shaping to incentivize agents to discover diverse, high-performing team interaction patterns and further steers ongoing trajectories toward stronger learned coordination modes"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: $r^{\mathrm{inf}}_{i,t} := \frac{1}{n-1} \sum_{j \neq i} \max\big( q_{i \to j}(y = 1 \mid o_t, a_i) - \omega_j(y = 1 \mid o_t),\ 0 \big)$
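The influence reward quoted in the passage above can be read as an average of clipped probability gaps, one per teammate. The sketch below computes it for a single agent at a single timestep. It is a minimal illustration under stated assumptions: the function and argument names are hypothetical, the sum is assumed to run over the n-1 teammates j other than i, and the inputs are assumed to be already-computed probabilities, since this page does not say how q, omega, or the event y are defined or estimated.

```python
from typing import Sequence


def influence_reward(
    q_given_action: Sequence[float],
    omega_baseline: Sequence[float],
) -> float:
    """Influence reward for agent i at time t, following the quoted formula.

    q_given_action[k]: q_{i->j}(y=1 | o_t, a_i) for the k-th teammate j.
    omega_baseline[k]: omega_j(y=1 | o_t) for the same teammate.
    Both sequences have one entry per teammate, so their length is n - 1.
    """
    if len(q_given_action) != len(omega_baseline):
        raise ValueError("need one q and one omega value per teammate")
    if not q_given_action:
        raise ValueError("need at least one teammate")
    # max(., 0) keeps only teammates whose probability rises given a_i.
    clipped = [max(q - w, 0.0) for q, w in zip(q_given_action, omega_baseline)]
    # Average over the n - 1 teammates.
    return sum(clipped) / len(clipped)
```

For example, with two teammates, `influence_reward([0.9, 0.2], [0.5, 0.6])` averages a gap of 0.4 and a clipped gap of 0, giving approximately 0.2. The clipping means an action is never penalized for lowering a teammate's probability; it simply earns no reward for that teammate.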
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.