CCKS: Consensus-based Communication and Knowledge Sharing

Deying Li; Fengyi Zhang; Jinyuan Zu; Naiqi Wu; Wenping Chen; Xiaowei Lv; Yongcai Wang; Yunjun Han

arxiv: 2606.12281 · v1 · pith:WVGIFQOSnew · submitted 2026-06-10 · 💻 cs.MA · cs.AI· cs.LG

CCKS: Consensus-based Communication and Knowledge Sharing

Jinyuan Zu , Xiaowei Lv , Yongcai Wang , Deying Li , Yunjun Han , Wenping Chen , Fengyi Zhang , Naiqi Wu This is my paper

Pith reviewed 2026-06-27 07:32 UTC · model grok-4.3

classification 💻 cs.MA cs.AIcs.LG

keywords multi-agent reinforcement learningdecentralized training decentralized executionaction advisingknowledge sharingcontrastive learningconsensus modelcooperative MARL

0 comments

The pith

A consensus model built via contrastive learning on local observations lets agents selectively follow teacher advice in cooperative multi-agent tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In decentralized training and decentralized execution for cooperative multi-agent reinforcement learning, current action-advising methods cause excessive guidance and instability because they do not evaluate how well a teacher's recommendation fits the student's situation. The CCKS framework counters this by training a consensus model through contrastive learning on each agent's local observations, then using that model to score actions and apply consensus-derived constraints when deciding whether to adopt shared knowledge. Agents therefore balance exploration with selective use of experienced teachers rather than defaulting to full compliance. The method is built as a plug-and-play addition to existing DTDE algorithms. Experiments in Google Research Football and StarCraft II Multi-Agent Challenge show gains in cooperation efficiency, learning speed, and final performance over standard baselines.

Core claim

CCKS constructs a consensus model via contrastive learning on local observations during training; in action selection, agents score candidate actions against this model and shared knowledge to decide whether to follow a teacher's recommendation, thereby replacing blind adherence with consensus-constrained choice and producing more stable and effective cooperation.

What carries the argument

Consensus model constructed by contrastive learning on local observations, which agents use to score actions and enforce compatibility constraints on teacher advice.

If this is right

Agents reduce excessive advising by only accepting recommendations that satisfy the consensus constraint.
Exploration is preserved while still benefiting from experienced teachers, raising overall task performance.
The same consensus scoring layer can be attached to any existing DTDE algorithm without altering its core training loop.
Cooperation efficiency rises because incompatible advice is filtered before it distorts joint policy updates.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same local-observation contrastive construction might be reused to filter advice in other decentralized coordination settings beyond reinforcement learning.
If the consensus model remains stable across changing team compositions, it could support lifelong multi-agent learning without retraining the compatibility layer from scratch.
Testing whether the learned consensus generalizes to teacher policies trained on different tasks would clarify the limits of the compatibility measure.

Load-bearing premise

Contrastive learning on local observations during training produces a consensus model that reliably measures teacher-student compatibility and does not introduce new selection biases when used for action scoring.

What would settle it

Adding CCKS to a standard DTDE baseline in the StarCraft II Multi-Agent Challenge produces no measurable rise in win rate or learning-curve slope compared with the unmodified baseline.

Figures

Figures reproduced from arXiv: 2606.12281 by Deying Li, Fengyi Zhang, Jinyuan Zu, Naiqi Wu, Wenping Chen, Xiaowei Lv, Yongcai Wang, Yunjun Han.

**Figure 1.** Figure 1: This figure provides an overview of CCKS. (a) The communication process among agents. Specifically, the blue sections indicate the agents’ [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 2.** Figure 2: Experiment environments. (a) The StarCraft II Multi-Agent Challenge [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Experimental results on the StarCraft Multi-Agent Challenge Environment: The mean test win rate of the five algorithms over 5 seeds under a [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Experimental results on the Google Research Football Environment: The mean test score reward of the five algorithms over 5 seeds under a [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Ablation experimental results on SMAC and GRF showing the influence of the components and [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

read the original abstract

In Decentralized Training and Decentralized Execution (DTDE) for cooperative Multi-Agent Reinforcement Learning (MARL), action-advising-based knowledge sharing promotes interpretable and scalable cooperation among agents. However, current action advising approaches often adhere too much to the teacher's guidance without evaluating teacher-student compatibility, which causes excessive advising, suboptimal stability, and degraded performance. To overcome these challenges, this paper presents a Consensus-based Communication and Knowledge Sharing (CCKS) framework, which allows agents to adopt recommendations based on consensus-derived constraints and to follow the teacher's instructions more smartly. This mechanism enables agents to balance exploration and learning from experienced teachers, improving overall performance. The key is the consensus model construction, for which we propose to employ contrastive learning to construct consensus models based on local observations in the agents' training phase. In action selection, agents score and choose actions based on consensus and shared knowledge. Designed as a plug-and-play solution, CCKS integrates seamlessly with existing DTDE algorithms. Experiments conducted in the Google Research Football environment and the complex StarCraft II Multi-Agent Challenge demonstrate that the integration with CCKS significantly improves cooperation efficiency, learning speed, and overall performance compared with current DTDE baselines. The code is available at https://github.com/yuanxpy/CCKS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CCKS adds contrastive learning to build a consensus model for selective action advising in DTDE MARL, with claimed gains on GRF and SMAC that look like a practical but incremental step.

read the letter

The paper's main move is to train a consensus model via contrastive learning on agents' local observations, then use that model at decision time to score how compatible a teacher's suggested action is with the group's consensus before deciding whether to follow it. This is positioned as a fix for over-advising in existing action-advising methods under DTDE.

What is actually new is the specific combination of contrastive consensus construction with the selective advising step, presented as a plug-and-play addition to existing DTDE algorithms. The experiments on Google Research Football and StarCraft II Multi-Agent Challenge are the right kind of test beds, and the claim that it improves cooperation efficiency and learning speed is at least directionally plausible given the motivation.

The paper does a clean job stating the problem with current teacher-student advising and releasing code, which helps. The motivation around balancing exploration and teacher guidance holds up on its own terms.

The soft spots are that the abstract stays high-level on exactly how the consensus scores are turned into action selection constraints, and no quantitative results, error bars, or ablation details appear in the provided summary. That makes it hard to judge the magnitude of the improvement or whether the contrastive model avoids introducing its own selection biases. The experiments cover only two domains, which is standard but keeps the scope narrow.

This is for readers working on knowledge sharing or action advising inside cooperative MARL. Someone already running DTDE baselines on GRF or SMAC could get value from testing the module. It deserves a serious referee because the core idea is coherent, the benchmarks are relevant, and there is no load-bearing flaw visible in the description.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes the Consensus-based Communication and Knowledge Sharing (CCKS) framework as a plug-and-play addition to DTDE MARL algorithms. Agents construct a consensus model via contrastive learning on local observations during training; at action selection, this model is used to score actions and decide whether to follow teacher advice, thereby addressing over-advising and compatibility issues. Experiments in Google Research Football and StarCraft II Multi-Agent Challenge are reported to show gains in cooperation efficiency, learning speed, and overall performance relative to DTDE baselines.

Significance. If the empirical improvements are robust and properly controlled, the plug-and-play design could provide a practical mechanism for more selective knowledge sharing in cooperative MARL. The use of contrastive learning to derive compatibility constraints is a potentially reusable idea for balancing teacher guidance with exploration.

major comments (1)

Abstract: the central performance claims are stated without any quantitative results, error bars, ablation studies, or description of the precise integration of the consensus model into the action-selection step, preventing evaluation of the reported gains in GRF and SMAC.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their feedback. We address the single major comment below and will revise the manuscript accordingly.

read point-by-point responses

Referee: Abstract: the central performance claims are stated without any quantitative results, error bars, ablation studies, or description of the precise integration of the consensus model into the action-selection step, preventing evaluation of the reported gains in GRF and SMAC.

Authors: We agree the abstract would be strengthened by quantitative results and a concise description of the integration. In revision we will add specific metrics (e.g., win-rate or reward improvements with standard errors from the GRF and SMAC tables) and a one-sentence statement of how the consensus model is queried at action selection to score and gate teacher advice. Ablation results remain in the main experimental section; space permitting we will reference the key ablation outcomes in the abstract. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper describes an empirical plug-and-play framework for DTDE MARL that constructs consensus models via contrastive learning on local observations and uses them for action scoring. No derivation chain, first-principles prediction, or fitted quantity is presented that reduces by construction to its own inputs, self-citations, or ansatzes. The central claims rest on experimental performance gains versus baselines in GRF and SMAC, with the method presented as an algorithmic integration rather than a mathematical reduction. This is the most common honest finding for empirical MARL papers without load-bearing self-referential equations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated assumption that contrastive learning yields a useful compatibility signal.

pith-pipeline@v0.9.1-grok · 5782 in / 1062 out tokens · 15961 ms · 2026-06-27T07:32:20.219544+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 3 canonical work pages

[1]

Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios,

T. Fan, P. Long, W. Liu, and J. Pan, “Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios,”The International Journal of Robotics Research, vol. 39, no. 7, pp. 856–892, 2020

2020
[2]

An introduction to multi-agent reinforcement learning and review of its application to autonomous mobility,

L. M. Schmidt, J. Brosig, A. Plinge, B. M. Eskofier, and C. Mutschler, “An introduction to multi-agent reinforcement learning and review of its application to autonomous mobility,” in2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2022, pp. 1342–1349

2022
[3]

Mastering complex control in moba games with deep reinforcement learning,

D. Ye, Z. Liu, M. Sun, B. Shi, P. Zhao, H. Wu, H. Yu, S. Yang, X. Wu, Q. Guo,et al., “Mastering complex control in moba games with deep reinforcement learning,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 04, 2020, pp. 6672–6679

2020
[4]

Decentralized pomdps,

F. A. Oliehoek, “Decentralized pomdps,” inReinforcement learning: state-of-the-art. Springer, 2012, pp. 471–503

2012
[5]

Multi-agent actor-critic for mixed cooperative-competitive environments,

R. Lowe, Y . I. Wu, A. Tamar, J. Harb, O. Pieter Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,”Advances in neural information processing systems, vol. 30, 2017

2017
[6]

Monotonic value function factorisation for deep multi- agent reinforcement learning,

T. Rashid, M. Samvelyan, C. S. De Witt, G. Farquhar, J. Foerster, and S. Whiteson, “Monotonic value function factorisation for deep multi- agent reinforcement learning,”Journal of Machine Learning Research, vol. 21, no. 178, pp. 1–51, 2020

2020
[7]

Actor-attention-critic for multi-agent reinforcement learning,

S. Iqbal and F. Sha, “Actor-attention-critic for multi-agent reinforcement learning,” inInternational conference on machine learning. PMLR, 2019, pp. 2961–2970

2019
[8]

Pic: permutation invariant critic for multi-agent deep reinforcement learning,

I.-J. Liu, R. A. Yeh, and A. G. Schwing, “Pic: permutation invariant critic for multi-agent deep reinforcement learning,” inConference on Robot Learning. PMLR, 2020, pp. 590–602

2020
[9]

Multi-agent reinforcement learning: Independent vs. coopera- tive agents,

M. Tan, “Multi-agent reinforcement learning: Independent vs. coopera- tive agents,” inProceedings of the tenth international conference on machine learning, 1993, pp. 330–337

1993
[10]

Multiagent cooperation and competition with deep reinforcement learning,

A. Tampuu, T. Matiisen, D. Kodelja, I. Kuzovkin, K. Korjus, J. Aru, J. Aru, and R. Vicente, “Multiagent cooperation and competition with deep reinforcement learning,”PloS one, vol. 12, no. 4, p. e0172395, 2017

2017
[11]

Facmac: Factored multi-agent centralised policy gradients,

B. Peng, T. Rashid, C. Schroeder de Witt, P.-A. Kamienny, P. Torr, W. Böhmer, and S. Whiteson, “Facmac: Factored multi-agent centralised policy gradients,”Advances in Neural Information Processing Systems, vol. 34, pp. 12 208–12 221, 2021

2021
[12]

I2q: A fully decentralized q-learning algorithm,

J. Jiang and Z. Lu, “I2q: A fully decentralized q-learning algorithm,” Advances in Neural Information Processing Systems, vol. 35, pp. 20 469– 20 481, 2022

2022
[13]

Learning multiagent communication with backpropagation,

S. Sukhbaatar, R. Fergus,et al., “Learning multiagent communication with backpropagation,”Advances in neural information processing systems, vol. 29, 2016

2016
[14]

Learning individually inferred communication for multi-agent cooperation,

Z. Ding, T. Huang, and Z. Lu, “Learning individually inferred communication for multi-agent cooperation,”Advances in neural information processing systems, vol. 33, pp. 22 069–22 079, 2020

2020
[15]

Communication in multi-agent reinforcement learning: Intention sharing,

W. Kim, J. Park, and Y . Sung, “Communication in multi-agent reinforcement learning: Intention sharing,” inInternational conference on learning representations, 2020

2020
[16]

A q-values sharing framework for multi-agent reinforcement learning under budget constraint,

C. Zhu, H.-F. Leung, S. Hu, and Y . Cai, “A q-values sharing framework for multi-agent reinforcement learning under budget constraint,”ACM Transactions on Autonomous and Adaptive Systems (TAAS), vol. 15, no. 2, pp. 1–28, 2021

2021
[17]

Explainable action advising for multi-agent reinforcement learning,

Y . Guo, J. Campbell, S. Stepputtis, R. Li, D. Hughes, F. Fang, and K. Sycara, “Explainable action advising for multi-agent reinforcement learning,” in2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 5515–5521

2023
[18]

Cautiously-optimistic knowledge sharing for cooperative multi-agent reinforcement learning,

Y . Ba, X. Liu, X. Chen, H. Wang, Y . Xu, K. Li, and S. Zhang, “Cautiously-optimistic knowledge sharing for cooperative multi-agent reinforcement learning,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 16, 2024, pp. 17 299–17 307

2024
[19]

Consensus-based partnerships: the heart of effective interprofessional education and collaborative practice,

S. Snyman and J. Rogers, “Consensus-based partnerships: the heart of effective interprofessional education and collaborative practice,” Sustainability and interprofessional collaboration: Ensuring leadership resilience in collaborative health care, pp. 59–82, 2020

2020
[20]

Samvelyan et al

M. Samvelyan, T. Rashid, C. S. De Witt, G. Farquhar, N. Nardelli, T. G. Rudner, C.-M. Hung, P. H. Torr, J. Foerster, and S. Whiteson, “The starcraft multi-agent challenge,”arXiv preprint arXiv:1902.04043, 2019

work page arXiv 1902
[21]

Google research football: A novel reinforcement learning environment,

K. Kurach, A. Raichuk, P. Sta´nczyk, M. Zaj ˛ ac, O. Bachem, L. Espeholt, C. Riquelme, D. Vincent, M. Michalski, O. Bousquet,et al., “Google research football: A novel reinforcement learning environment,” in Proceedings of the AAAI conference on artificial intelligence, vol. 34, no. 04, 2020, pp. 4501–4510

2020
[22]

Adaptive coordination strategies for human-robot handovers

C.-M. Huang, M. Cakmak, and B. Mutlu, “Adaptive coordination strategies for human-robot handovers.” inRobotics: science and systems, vol. 11. Rome, Italy, 2015, pp. 1–10

2015
[23]

Multi-agent framework for third party logistics in e-commerce,

W. Ying and S. Dayong, “Multi-agent framework for third party logistics in e-commerce,”Expert Systems with Applications, vol. 29, no. 2, pp. 431–436, 2005

2005
[24]

Simultaneously learning and advising in multiagent reinforcement learning,

F. L. Da Silva, R. Glatt, and A. H. R. Costa, “Simultaneously learning and advising in multiagent reinforcement learning,” inProceedings of the 16th conference on autonomous agents and multiagent systems, 2017, pp. 1100–1108

2017
[25]

Learning hierarchical teaching policies for cooperative agents,

D.-K. Kim, M. Liu, S. Omidshafiei, S. Lopez-Cot, M. Riemer, G. Habibi, G. Tesauro, S. Mourad, M. Campbell, and J. P. How, “Learning hierarchical teaching policies for cooperative agents,”arXiv preprint arXiv:1903.03216, 2019

work page arXiv 1903
[26]

Hammer: Multi-level coordination of reinforcement learning agents via learned messaging,

N. Gupta, G. Srinivasaraghavan, S. Mohalik, N. Kumar, and M. E. Taylor, “Hammer: Multi-level coordination of reinforcement learning agents via learned messaging,”Neural Computing and Applications, pp. 1–16, 2023

2023
[27]

Un- derstanding and sharing intentions: The origins of cultural cognition,

M. Tomasello, M. Carpenter, J. Call, T. Behne, and H. Moll, “Un- derstanding and sharing intentions: The origins of cultural cognition,” Behavioral and brain sciences, vol. 28, no. 5, pp. 675–691, 2005

2005
[28]

Consensus learning for cooperative multi-agent reinforcement learning,

Z. Xu, B. Zhang, D. Li, Z. Zhang, G. Zhou, H. Chen, and G. Fan, “Consensus learning for cooperative multi-agent reinforcement learning,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 10, 2023, pp. 11 726–11 734

2023
[29]

Contrastive identity-aware learning for multi-agent value decomposition,

S. Liu, Y . Zhou, J. Song, T. Zheng, K. Chen, T. Zhu, Z. Feng, and M. Song, “Contrastive identity-aware learning for multi-agent value decomposition,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 10, 2023, pp. 11 595–11 603

2023
[30]

Learning to ground decentralized multi-agent communication with contrastive learning,

Y . L. Lo and B. Sengupta, “Learning to ground decentralized multi-agent communication with contrastive learning,”arXiv preprint arXiv:2203.03344, 2022

work page arXiv 2022
[31]

Markov games as a framework for multi-agent rein- forcement learning,

M. L. Littman, “Markov games as a framework for multi-agent rein- forcement learning,” inMachine learning proceedings 1994. Elsevier, 1994, pp. 157–163

1994

[1] [1]

Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios,

T. Fan, P. Long, W. Liu, and J. Pan, “Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios,”The International Journal of Robotics Research, vol. 39, no. 7, pp. 856–892, 2020

2020

[2] [2]

An introduction to multi-agent reinforcement learning and review of its application to autonomous mobility,

L. M. Schmidt, J. Brosig, A. Plinge, B. M. Eskofier, and C. Mutschler, “An introduction to multi-agent reinforcement learning and review of its application to autonomous mobility,” in2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2022, pp. 1342–1349

2022

[3] [3]

Mastering complex control in moba games with deep reinforcement learning,

D. Ye, Z. Liu, M. Sun, B. Shi, P. Zhao, H. Wu, H. Yu, S. Yang, X. Wu, Q. Guo,et al., “Mastering complex control in moba games with deep reinforcement learning,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 04, 2020, pp. 6672–6679

2020

[4] [4]

Decentralized pomdps,

F. A. Oliehoek, “Decentralized pomdps,” inReinforcement learning: state-of-the-art. Springer, 2012, pp. 471–503

2012

[5] [5]

Multi-agent actor-critic for mixed cooperative-competitive environments,

R. Lowe, Y . I. Wu, A. Tamar, J. Harb, O. Pieter Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,”Advances in neural information processing systems, vol. 30, 2017

2017

[6] [6]

Monotonic value function factorisation for deep multi- agent reinforcement learning,

T. Rashid, M. Samvelyan, C. S. De Witt, G. Farquhar, J. Foerster, and S. Whiteson, “Monotonic value function factorisation for deep multi- agent reinforcement learning,”Journal of Machine Learning Research, vol. 21, no. 178, pp. 1–51, 2020

2020

[7] [7]

Actor-attention-critic for multi-agent reinforcement learning,

S. Iqbal and F. Sha, “Actor-attention-critic for multi-agent reinforcement learning,” inInternational conference on machine learning. PMLR, 2019, pp. 2961–2970

2019

[8] [8]

Pic: permutation invariant critic for multi-agent deep reinforcement learning,

I.-J. Liu, R. A. Yeh, and A. G. Schwing, “Pic: permutation invariant critic for multi-agent deep reinforcement learning,” inConference on Robot Learning. PMLR, 2020, pp. 590–602

2020

[9] [9]

Multi-agent reinforcement learning: Independent vs. coopera- tive agents,

M. Tan, “Multi-agent reinforcement learning: Independent vs. coopera- tive agents,” inProceedings of the tenth international conference on machine learning, 1993, pp. 330–337

1993

[10] [10]

Multiagent cooperation and competition with deep reinforcement learning,

A. Tampuu, T. Matiisen, D. Kodelja, I. Kuzovkin, K. Korjus, J. Aru, J. Aru, and R. Vicente, “Multiagent cooperation and competition with deep reinforcement learning,”PloS one, vol. 12, no. 4, p. e0172395, 2017

2017

[11] [11]

Facmac: Factored multi-agent centralised policy gradients,

B. Peng, T. Rashid, C. Schroeder de Witt, P.-A. Kamienny, P. Torr, W. Böhmer, and S. Whiteson, “Facmac: Factored multi-agent centralised policy gradients,”Advances in Neural Information Processing Systems, vol. 34, pp. 12 208–12 221, 2021

2021

[12] [12]

I2q: A fully decentralized q-learning algorithm,

J. Jiang and Z. Lu, “I2q: A fully decentralized q-learning algorithm,” Advances in Neural Information Processing Systems, vol. 35, pp. 20 469– 20 481, 2022

2022

[13] [13]

Learning multiagent communication with backpropagation,

S. Sukhbaatar, R. Fergus,et al., “Learning multiagent communication with backpropagation,”Advances in neural information processing systems, vol. 29, 2016

2016

[14] [14]

Learning individually inferred communication for multi-agent cooperation,

Z. Ding, T. Huang, and Z. Lu, “Learning individually inferred communication for multi-agent cooperation,”Advances in neural information processing systems, vol. 33, pp. 22 069–22 079, 2020

2020

[15] [15]

Communication in multi-agent reinforcement learning: Intention sharing,

W. Kim, J. Park, and Y . Sung, “Communication in multi-agent reinforcement learning: Intention sharing,” inInternational conference on learning representations, 2020

2020

[16] [16]

A q-values sharing framework for multi-agent reinforcement learning under budget constraint,

C. Zhu, H.-F. Leung, S. Hu, and Y . Cai, “A q-values sharing framework for multi-agent reinforcement learning under budget constraint,”ACM Transactions on Autonomous and Adaptive Systems (TAAS), vol. 15, no. 2, pp. 1–28, 2021

2021

[17] [17]

Explainable action advising for multi-agent reinforcement learning,

Y . Guo, J. Campbell, S. Stepputtis, R. Li, D. Hughes, F. Fang, and K. Sycara, “Explainable action advising for multi-agent reinforcement learning,” in2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 5515–5521

2023

[18] [18]

Cautiously-optimistic knowledge sharing for cooperative multi-agent reinforcement learning,

Y . Ba, X. Liu, X. Chen, H. Wang, Y . Xu, K. Li, and S. Zhang, “Cautiously-optimistic knowledge sharing for cooperative multi-agent reinforcement learning,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 16, 2024, pp. 17 299–17 307

2024

[19] [19]

Consensus-based partnerships: the heart of effective interprofessional education and collaborative practice,

S. Snyman and J. Rogers, “Consensus-based partnerships: the heart of effective interprofessional education and collaborative practice,” Sustainability and interprofessional collaboration: Ensuring leadership resilience in collaborative health care, pp. 59–82, 2020

2020

[20] [20]

Samvelyan et al

M. Samvelyan, T. Rashid, C. S. De Witt, G. Farquhar, N. Nardelli, T. G. Rudner, C.-M. Hung, P. H. Torr, J. Foerster, and S. Whiteson, “The starcraft multi-agent challenge,”arXiv preprint arXiv:1902.04043, 2019

work page arXiv 1902

[21] [21]

Google research football: A novel reinforcement learning environment,

K. Kurach, A. Raichuk, P. Sta´nczyk, M. Zaj ˛ ac, O. Bachem, L. Espeholt, C. Riquelme, D. Vincent, M. Michalski, O. Bousquet,et al., “Google research football: A novel reinforcement learning environment,” in Proceedings of the AAAI conference on artificial intelligence, vol. 34, no. 04, 2020, pp. 4501–4510

2020

[22] [22]

Adaptive coordination strategies for human-robot handovers

C.-M. Huang, M. Cakmak, and B. Mutlu, “Adaptive coordination strategies for human-robot handovers.” inRobotics: science and systems, vol. 11. Rome, Italy, 2015, pp. 1–10

2015

[23] [23]

Multi-agent framework for third party logistics in e-commerce,

W. Ying and S. Dayong, “Multi-agent framework for third party logistics in e-commerce,”Expert Systems with Applications, vol. 29, no. 2, pp. 431–436, 2005

2005

[24] [24]

Simultaneously learning and advising in multiagent reinforcement learning,

F. L. Da Silva, R. Glatt, and A. H. R. Costa, “Simultaneously learning and advising in multiagent reinforcement learning,” inProceedings of the 16th conference on autonomous agents and multiagent systems, 2017, pp. 1100–1108

2017

[25] [25]

Learning hierarchical teaching policies for cooperative agents,

D.-K. Kim, M. Liu, S. Omidshafiei, S. Lopez-Cot, M. Riemer, G. Habibi, G. Tesauro, S. Mourad, M. Campbell, and J. P. How, “Learning hierarchical teaching policies for cooperative agents,”arXiv preprint arXiv:1903.03216, 2019

work page arXiv 1903

[26] [26]

Hammer: Multi-level coordination of reinforcement learning agents via learned messaging,

N. Gupta, G. Srinivasaraghavan, S. Mohalik, N. Kumar, and M. E. Taylor, “Hammer: Multi-level coordination of reinforcement learning agents via learned messaging,”Neural Computing and Applications, pp. 1–16, 2023

2023

[27] [27]

Un- derstanding and sharing intentions: The origins of cultural cognition,

M. Tomasello, M. Carpenter, J. Call, T. Behne, and H. Moll, “Un- derstanding and sharing intentions: The origins of cultural cognition,” Behavioral and brain sciences, vol. 28, no. 5, pp. 675–691, 2005

2005

[28] [28]

Consensus learning for cooperative multi-agent reinforcement learning,

Z. Xu, B. Zhang, D. Li, Z. Zhang, G. Zhou, H. Chen, and G. Fan, “Consensus learning for cooperative multi-agent reinforcement learning,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 10, 2023, pp. 11 726–11 734

2023

[29] [29]

Contrastive identity-aware learning for multi-agent value decomposition,

S. Liu, Y . Zhou, J. Song, T. Zheng, K. Chen, T. Zhu, Z. Feng, and M. Song, “Contrastive identity-aware learning for multi-agent value decomposition,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 10, 2023, pp. 11 595–11 603

2023

[30] [30]

Learning to ground decentralized multi-agent communication with contrastive learning,

Y . L. Lo and B. Sengupta, “Learning to ground decentralized multi-agent communication with contrastive learning,”arXiv preprint arXiv:2203.03344, 2022

work page arXiv 2022

[31] [31]

Markov games as a framework for multi-agent rein- forcement learning,

M. L. Littman, “Markov games as a framework for multi-agent rein- forcement learning,” inMachine learning proceedings 1994. Elsevier, 1994, pp. 157–163

1994