SACHI: Structured Agent Coordination via Holistic Information Integration in Multi-Agent Reinforcement Learning
Pith reviewed 2026-05-20 22:31 UTC · model grok-4.3
The pith
Graph transformer convolutions over a coordination graph let each agent receive tailored signals from teammates before acting on partial observations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that graph transformer convolutions applied to an inter-agent coordination graph produce receiver-sensitive, content-dependent signals that integrate scattered teammate knowledge into each agent's local representation, thereby reducing the partial-observation information bottleneck. The reported outcome is performance that matches or exceeds the best of twelve baselines on every evaluated task, with an aggregate advantage that the authors report as statistically significant.
What carries the argument
An inter-agent coordination graph whose nodes carry local observations and whose edges are processed by graph transformer convolutions that generate content-dependent and receiver-sensitive messages before action selection.
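To make the mechanism concrete, here is a minimal sketch of one attention-style message-passing step over a complete agent graph. It is not the paper's operator: the function names, the projection matrices `w_q`, `w_k`, `w_v`, the dimensions, and the exclusion of self-edges are all assumptions chosen for illustration. It shows only the general pattern in which the weight a receiver places on a sender depends on both the receiver's and the sender's features.

```python
import math
import random

def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_messages(h, w_q, w_k, w_v):
    """One attention-style step on a complete graph without self-edges.

    h is a list of per-agent feature vectors (hypothetical local encodings).
    Returns (messages, weights); weights[i][j] is how strongly receiver i
    attends to sender j, computed from both agents' features.
    """
    n = len(h)
    d_k = len(w_k)  # key/query dimension = number of rows in w_k
    q = [matvec(w_q, x) for x in h]  # queries come from the receiver
    k = [matvec(w_k, x) for x in h]  # keys come from the sender
    v = [matvec(w_v, x) for x in h]  # values carry the sender's content
    messages, weights = [], []
    for i in range(n):
        senders = [j for j in range(n) if j != i]
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d_k)
            for j in senders
        ]
        alpha = softmax(scores)
        row = [0.0] * n
        for j, a in zip(senders, alpha):
            row[j] = a
        msg = [sum(row[j] * v[j][c] for j in senders) for c in range(len(v[0]))]
        messages.append(msg)
        weights.append(row)
    return messages, weights

def random_matrix(rows, cols, rng):
    """Gaussian matrix used as placeholder data and placeholder parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_agents, d_in, d_k = 4, 3, 2  # illustrative sizes, not from the paper
h = random_matrix(n_agents, d_in, rng)
w_q = random_matrix(d_k, d_in, rng)
w_k = random_matrix(d_k, d_in, rng)
w_v = random_matrix(d_k, d_in, rng)
messages, weights = attention_messages(h, w_q, w_k, w_v)
```

Each row of `weights` is a separate distribution over the other agents, so two receivers can weight the same sender differently, which is the property the summary calls receiver-sensitive.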
If this is right
- The same architecture matches or outperforms the best baseline on tasks that require spatial coordination, explicit communication, and adversarial interaction, with gains that are statistically significant in aggregate.
- Parameter-matched ablations isolate the performance lift to the degree of content-dependence inside the message-passing operator rather than to raw model size.
- Bootstrap confidence intervals, Friedman rankings, and performance profiles all indicate that the advantage is robust across environments and not an artifact of a single metric.
- The method works inside the standard centralized-training decentralized-execution loop without requiring agents to transmit raw observations or intentions at decision time.
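For readers unfamiliar with the aggregate statistics named in the list above, the following sketch shows a percentile bootstrap confidence interval for a mean score and the average ranks that feed a Friedman test. The function names, the resample count, and the example score table are placeholders invented here, not values from the paper, and the Friedman statistic below omits any tie correction.

```python
import random
import statistics

def bootstrap_mean_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

def average_ranks(score_table):
    """Average rank of each method across tasks (rank 1 = best score).

    score_table maps task -> {method: score}; tied methods share the
    average of the ranks they span.
    """
    methods = list(next(iter(score_table.values())).keys())
    totals = {m: 0.0 for m in methods}
    for scores in score_table.values():
        ordered = sorted(methods, key=lambda m: -scores[m])
        pos = 0
        while pos < len(ordered):
            end = pos
            while (end + 1 < len(ordered)
                   and scores[ordered[end + 1]] == scores[ordered[pos]]):
                end += 1
            shared = (pos + end) / 2 + 1
            for m in ordered[pos:end + 1]:
                totals[m] += shared
            pos = end + 1
    return {m: totals[m] / len(score_table) for m in methods}

def friedman_statistic(avg_ranks, n_tasks):
    """Friedman chi-square from average ranks (no tie correction)."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks.values())
    return 12 * n_tasks / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)

# Placeholder scores for three hypothetical methods on three hypothetical tasks.
table = {
    "task_a": {"A": 0.9, "B": 0.7, "C": 0.5},
    "task_b": {"A": 0.8, "B": 0.6, "C": 0.6},
    "task_c": {"A": 0.7, "B": 0.9, "C": 0.4},
}
ranks = average_ranks(table)
stat = friedman_statistic(ranks, n_tasks=len(table))
seed_scores = [0.6, 0.7, 0.8, 0.9, 0.75]  # placeholder per-seed scores
ci_low, ci_high = bootstrap_mean_ci(seed_scores)
```

A lower average rank means a method tends to place nearer the top across tasks; the Friedman statistic then asks whether the spread of average ranks is larger than chance would explain.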
Where Pith is reading between the lines
- If the coordination graph can be updated online, the same integration pattern could handle environments where team membership changes during an episode.
- The approach suggests that explicit communication protocols may be unnecessary when implicit, receiver-tailored integration already supplies the missing joint knowledge.
- Similar graph-based enrichment might improve single-agent reinforcement learning under strong partial observability by treating past states or auxiliary sensors as virtual teammates.
Load-bearing premise
The coordination graph and its graph transformer convolution operator can be constructed and trained so that the resulting signals actually reduce the information bottleneck without creating new representational or optimization failures.
What would settle it
If replacing the graph transformer convolution with a content-independent aggregator such as mean pooling, at a fixed parameter count, produced no drop in performance, that would falsify the claim that content-dependence is the source of the reported gains.
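The control described here can be sketched as a uniform-weight aggregator plus a paired comparison across seeds. Both functions are hypothetical helpers written for this page: the paper's actual ablation protocol and parameter-matching procedure are not specified in the text above.

```python
def mean_pool_messages(values):
    """Content-independent control: each receiver averages the value
    vectors of all other agents with equal weight, regardless of content."""
    n = len(values)
    dim = len(values[0])
    messages = []
    for i in range(n):
        others = [values[j] for j in range(n) if j != i]
        messages.append(
            [sum(v[c] for v in others) / len(others) for c in range(dim)]
        )
    return messages

def performance_drop(full_scores, control_scores):
    """Mean paired difference (full model minus control) across seeds."""
    diffs = [f - c for f, c in zip(full_scores, control_scores)]
    return sum(diffs) / len(diffs)

# Placeholder value vectors for three hypothetical agents.
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled = mean_pool_messages(values)

# Placeholder per-seed scores; a drop near zero would count against the claim.
drop = performance_drop([0.8, 0.9], [0.7, 0.7])
```

Because the weights are fixed at one over the number of senders, nothing about a sender's content or the receiver's state can change how much that sender contributes, which is what makes this a clean contrast to an attention-based operator.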
Original abstract
Cooperative multi-agent reinforcement learning agents that act on partial local observations face a fundamental information bottleneck: the knowledge needed to select jointly optimal actions is scattered across the team, yet each agent must commit to a decision without access to its teammates' observations, intentions, or chosen actions. Existing methods either ignore this bottleneck, compress it into a scalar mixing signal, or route around it with learned communication channels. Framing action coordination as a problem of structured information integration among agents, we propose structured agent coordination via holistic information integration, or SACHI, in which graph transformer convolutions over an inter-agent coordination graph enrich each agent's representation with receiver-sensitive, content-dependent signals from teammates prior to action selection. We evaluate SACHI across five cooperative tasks spanning spatial, communicative, and adversarial coordination challenges against twelve baselines. SACHI consistently matches or outperforms the best baseline on every task, and rigorous aggregate statistical analyses, including normalized metrics with bootstrap confidence intervals, Friedman ranking, and performance profiling, confirm that this advantage is statistically significant, robust across environments, and not attributable to increased model capacity. Parameter-matched ablations further trace the source of the gains to a single architectural property: the degree of content-dependence in the message-passing operator.
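The abstract mentions normalized metrics and performance profiling without defining them. One common reading, assumed here rather than taken from the paper, is min-max normalization of raw returns followed by a profile that reports the fraction of runs whose normalized score exceeds each threshold. The function names and numbers below are illustrative only.

```python
def normalize(score, low, high):
    """Min-max normalize a raw score given reference low and high values."""
    return (score - low) / (high - low)

def performance_profile(normalized_scores, thresholds):
    """Fraction of runs with a normalized score strictly above each threshold."""
    n = len(normalized_scores)
    return [sum(s > tau for s in normalized_scores) / n for tau in thresholds]

# Placeholder normalized scores pooled over hypothetical runs and tasks.
runs = [0.2, 0.5, 0.9, 1.0]
profile = performance_profile(runs, thresholds=[0.0, 0.5, 0.95])
midpoint = normalize(15.0, low=10.0, high=20.0)
```

Plotting such a profile for each method gives a curve per method; one method dominating another at every threshold is a stronger statement than a difference in means.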
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SACHI for cooperative MARL under partial observations, framing coordination as holistic information integration via graph transformer convolutions over an inter-agent coordination graph. This produces receiver-sensitive, content-dependent signals that enrich each agent's representation before action selection. Evaluated on five tasks (spatial, communicative, adversarial) against twelve baselines, SACHI matches or outperforms the best baseline on every task; aggregate statistics (normalized metrics with bootstrap CIs, Friedman ranking, performance profiling) establish statistical significance and robustness, while parameter-matched ablations attribute gains specifically to the degree of content-dependence in the message-passing operator.
Significance. If the central architectural claim holds, the work offers a structured alternative to scalar mixing or generic communication channels for reducing information bottlenecks in cooperative MARL. Strengths include the breadth of evaluation, rigorous statistical aggregation across environments, and explicit ablations that isolate content-dependence rather than capacity. These elements make the empirical contribution more convincing than typical MARL ablation studies and could inform future designs that require fine-grained, receiver-aware coordination.
major comments (2)
- [§3 (Method, coordination graph definition)] The inter-agent coordination graph construction is underspecified. It is unclear whether edges are fixed (e.g., complete graph), learned via adjacency matrix, or environment-specific; without this definition it is impossible to verify that the resulting signals are receiver-sensitive and content-dependent as claimed in the abstract and §3. This detail is load-bearing for the central claim that gains arise from the architectural property rather than implicit capacity or tuning differences.
- [§3.2 (graph transformer convolution)] The graph transformer convolution operator lacks explicit specification of how receiver identity and message content condition the attention keys/queries (or equivalent). It is therefore difficult to distinguish the operator from standard GAT or QMIX-style mixing and to confirm that it actually reduces the information bottleneck without introducing new representational failures. This directly affects the validity of the ablation results tracing gains to content-dependence.
minor comments (2)
- [Figure 2] Figure 2 (or equivalent architecture diagram) would benefit from explicit annotation of receiver-specific conditioning paths to make the claimed property visually verifiable.
- [Abstract] The abstract states 'five cooperative tasks' but does not name them; a parenthetical list would improve readability without lengthening the paragraph.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. We address each major comment below and agree that greater explicitness in the method section will strengthen the manuscript. We have prepared revisions to clarify the coordination graph construction and the conditioning mechanisms in the graph transformer convolution.
- Referee: [§3 (Method, coordination graph definition)] The inter-agent coordination graph construction is underspecified. It is unclear whether edges are fixed (e.g., complete graph), learned via adjacency matrix, or environment-specific; without this definition it is impossible to verify that the resulting signals are receiver-sensitive and content-dependent as claimed in the abstract and §3. This detail is load-bearing for the central claim that gains arise from the architectural property rather than implicit capacity or tuning differences.
Authors: We agree that the coordination graph requires a more explicit definition to support verification of the claimed properties. The manuscript constructs the inter-agent coordination graph as a fixed complete graph over the set of agents (i.e., every pair of agents shares an edge), independent of environment-specific features and without learned adjacency. This fixed structure is chosen precisely to isolate the effects of the subsequent graph transformer convolution. We will revise §3 to include a formal definition of the graph, pseudocode for its construction, and an accompanying figure. With this clarification, the receiver-sensitive and content-dependent character of the signals can be attributed directly to the convolution operator rather than to the graph topology, preserving the validity of the central architectural claim and the parameter-matched ablations.
Revision: yes
- Referee: [§3.2 (graph transformer convolution)] The graph transformer convolution operator lacks explicit specification of how receiver identity and message content condition the attention keys/queries (or equivalent). It is therefore difficult to distinguish the operator from standard GAT or QMIX-style mixing and to confirm that it actually reduces the information bottleneck without introducing new representational failures. This directly affects the validity of the ablation results tracing gains to content-dependence.
Authors: We appreciate the referee highlighting the need for greater mathematical detail. In SACHI the graph transformer convolution computes attention scores by forming queries from each receiver agent's local representation (thereby incorporating receiver identity) while keys and values are produced from the sender agent's message content via content-dependent linear transformations; agent identity embeddings are added to both queries and keys to further emphasize receiver sensitivity. This formulation is distinct from standard GAT (which does not explicitly separate receiver conditioning in this manner) and from QMIX-style mixing (which performs centralized value decomposition rather than per-agent message passing). We will insert the precise attention equations, including the conditioning steps, into the revised §3.2. These additions will allow direct verification that the operator reduces the information bottleneck and will reinforce the ablation results that isolate the contribution of content dependence.
Revision: yes
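The two responses above describe the graph and the operator in words only. One way the description could be written down is sketched below. The notation is hypothetical and not taken from the paper: $h_i$ is agent $i$'s local representation, $e_i$ an identity embedding, $d$ the key dimension, and the residual update in the last line is an added assumption. Whether identity embeddings enter before or after the linear projections is not stated in the rebuttal; this sketch places them before.

```latex
% Hypothetical notation; a sketch of the rebuttal's verbal description,
% not the paper's equations.
\begin{align}
\mathcal{G} &= (\mathcal{V}, \mathcal{E}), \qquad
\mathcal{V} = \{1, \dots, N\}, \qquad
\mathcal{E} = \{(j, i) : i, j \in \mathcal{V},\ i \neq j\} \\
q_i &= W_Q (h_i + e_i), \qquad
k_j = W_K (h_j + e_j), \qquad
v_j = W_V h_j \\
\alpha_{ij} &= \frac{\exp\!\left(q_i^{\top} k_j / \sqrt{d}\right)}
                   {\sum_{l \neq i} \exp\!\left(q_i^{\top} k_l / \sqrt{d}\right)} \\
m_i &= \sum_{j \neq i} \alpha_{ij}\, v_j, \qquad
\tilde{h}_i = h_i + m_i \quad \text{(residual form assumed)}
\end{align}
```

Under this reading the graph is fixed and complete, so all receiver-sensitivity comes from $q_i$ and all content-dependence from $k_j$ and $v_j$. Setting $\alpha_{ij} = 1/(N-1)$ recovers a mean-pooling control.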
Circularity Check
No circularity: empirical claims rest on held-out task performance and ablations, not self-referential definitions or fitted inputs renamed as predictions
full rationale
The paper's central claims concern empirical outperformance on five cooperative MARL tasks, supported by statistical analyses (bootstrap CIs, Friedman ranking, performance profiling) and parameter-matched ablations that isolate content-dependence in the message-passing operator. No derivation chain reduces a claimed result to its own inputs by construction: the coordination graph and graph transformer are architectural choices evaluated against baselines, not fitted parameters whose outputs are then presented as independent predictions. Self-citations to prior MARL literature are not load-bearing for the uniqueness or correctness of the reported gains, which are externally falsifiable via the described experiments. The method is self-contained against the provided benchmarks and does not invoke self-citation chains or ansatzes that collapse the result to the input.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Agents operate under partial local observations and must select actions without direct access to teammates' observations or intentions.