pith. sign in

arxiv: 2403.19253 · v3 · submitted 2024-03-28 · 💻 cs.LG · cs.MA

Inferring Latent Temporal Sparse Coordination Graph for Multi-Agent Reinforcement Learning

Pith reviewed 2026-05-24 03:22 UTC · model grok-4.3

classification 💻 cs.LG cs.MA
keywords multi-agent reinforcement learningcoordination graphssparse graphstemporal modelinghistorical observationsknowledge exchangeStarCraft IIend-to-end learning
0
0 comments X

The pith

A sparse coordination graph sampled from an agent-pair probability matrix derived from historical observations enables effective knowledge exchange in multi-agent reinforcement learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to improve agent coordination in cooperative multi-agent reinforcement learning by inferring a latent temporal sparse coordination graph. It calculates an agent-pair probability matrix using historical observations and samples a sparse graph from it for knowledge exchange between agents. This simultaneously captures dependencies and relation uncertainty with computational complexity tied only to the number of agents. The process incorporates Predict-Future to anticipate upcoming observations and Infer-Present to better understand the current environment from limited data. Graph learning and policy training proceed jointly, yielding superior results on the StarCraft II benchmark.

Core claim

By leveraging agents' historical observations to form a probability matrix, sampling a sparse graph, and exchanging knowledge along its edges, the LTS-CG captures temporal dependencies and uncertainty in relations; the addition of Predict-Future and Infer-Present augments this with foresight and contextual inference, all at a cost dependent only on agent count, in an end-to-end trainable framework.

What carries the argument

The agent-pair probability matrix computed from historical observations, used to sample a sparse graph for selective knowledge exchange.

Load-bearing premise

That a probability matrix based on historical observations reliably identifies the agent pairs whose information exchange is necessary and sufficient for successful coordination.

What would settle it

A controlled test on the StarCraft II benchmark in which the LTS-CG method produces lower win rates than a baseline using only one-step observations would indicate that incorporating historical data does not improve graph quality as claimed.

Figures

Figures reproduced from arXiv: 2403.19253 by Jie Lu, Junyu Xuan, Wei Duan.

Figure 1
Figure 1. Figure 1: The current methods to infer latent graphs in MARL [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The framework of LTS-CG. LTS-CG consists of two key modules: Inter-Agent Sparse Graph Learning and Cooperative MARL. The former follows an encoder-decoder framework: the encoder generates the sparse graph structure, while the decoder—guided by two graph loss functions—learns Predict-Future for anticipating future steps and Infer-Present for deducing current states. The temporal graph structure integrates p… view at source ↗
Figure 3
Figure 3. Figure 3: Performance of our method and baselines on six maps of the StarCraft II benchmark [41]. The Y-axis is the test winning [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Performance comparison of non-graph-based methods on 3s5z and 8m vs 9m. other methods, such as CASEC in 3s5z, DICG in 8m vs 9m and DCG in 10m vs 11m. The reduced variability suggests that our approach consistently produces reliable and stable cooperative behaviours, resulting in more predictable and robust performance across different maps. Notably, our method achieved consistent and competitive performanc… view at source ↗
Figure 6
Figure 6. Figure 6: Performance comparison of different methods on the TAG and Gather scenarios. The TAG scenario involves agents chasing adversaries on a map with obstacles, showing results for 10 agents and 3 adversaries and 20 agents and 5 adversaries The Gather scenario is an extension of the Climb Game where precise coordination yields higher rewards [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 8
Figure 8. Figure 8: Performance comparison on two maps to evaluate whether sampling the graph from the attention matrix is more effective than not sampling. Results [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Evaluate the effectiveness of the different latent temporal sparse graph learning strategies on SMAC and Gather. Gather scenario, LTS-CG exhibits the best test return after 0.8 million steps and demonstrates a promising convergence speed compared to DICG. These results illustrate the scalability, robustness, and efficiency of LTS-CG across a variety of environments and agent numbers, proving its capability… view at source ↗
Figure 10
Figure 10. Figure 10: Evaluate the effect of the different weights of [PITH_FULL_IMAGE:figures/full_fig_p010_10.png] view at source ↗
Figure 12
Figure 12. Figure 12: A case study on the StarCraft II benchmark map 3s5z, featuring 3 Stalkers and 5 Zealots. The top row displays screenshots from the actual gameplay replay, while the bottom row illustrates the corresponding attention matrices and final sparse matrices. Method Graph type Sample Edge Data used Graph Calculation Time Complexity QMIX × × × × DCG Complete × One-step O(A2N2 ) DICG Complete × One-step O(KN2 ) SOP… view at source ↗
read the original abstract

Effective agent coordination is crucial in cooperative Multi-Agent Reinforcement Learning (MARL). While agent cooperation can be represented by graph structures, prevailing graph learning methods in MARL are limited. They rely solely on one-step observations, neglecting crucial historical experiences, leading to deficient graphs that foster redundant or detrimental information exchanges. Additionally, high computational demands for action-pair calculations in dense graphs impede scalability. To address these challenges, we propose inferring a Latent Temporal Sparse Coordination Graph (LTS-CG) for MARL. The LTS-CG leverages agents' historical observations to calculate an agent-pair probability matrix, where a sparse graph is sampled from and used for knowledge exchange between agents, thereby simultaneously capturing agent dependencies and relation uncertainty. The computational complexity of this procedure is only related to the number of agents. This graph learning process is further augmented by two innovative characteristics: Predict-Future, which enables agents to foresee upcoming observations, and Infer-Present, ensuring a thorough grasp of the environmental context from limited data. These features allow LTS-CG to construct temporal graphs from historical and real-time information, promoting knowledge exchange during policy learning and effective collaboration. Graph learning and agent training occur simultaneously in an end-to-end manner. Our demonstrated results on the StarCraft II benchmark underscore LTS-CG's superior performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes inferring a Latent Temporal Sparse Coordination Graph (LTS-CG) for cooperative multi-agent reinforcement learning. Agents' historical observations are used to compute an agent-pair probability matrix; a sparse graph is then sampled from this matrix for knowledge exchange. The approach incorporates Predict-Future (to anticipate future observations) and Infer-Present (to infer context from limited data) modules, claims computational complexity linear in the number of agents, trains the graph learning and policy end-to-end, and asserts superior performance on the StarCraft II benchmark.

Significance. If the empirical results and methodological details hold, the work could improve upon existing one-step graph-based MARL methods by incorporating temporal history and explicit sparsity, potentially aiding scalability in coordination tasks. The end-to-end training and complexity claim (dependent only on agent count) are positive features if substantiated.

major comments (2)
  1. [Abstract] Abstract: the central claim of 'superior performance' on the StarCraft II benchmark is stated without any metrics, baselines, ablation studies, error bars, or experimental protocol, so the primary empirical assertion cannot be evaluated from the manuscript.
  2. [Abstract] Abstract (method description): no equation, algorithm, or sampling rule (threshold, top-k, stochastic, etc.) is supplied for constructing the agent-pair probability matrix from historical observations or for drawing the sparse graph; this mechanism is load-bearing for the claim that the procedure simultaneously captures dependencies, models uncertainty, and avoids redundant/detrimental exchanges.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, clarifying the role of the abstract versus the full paper and indicating where revisions will be made.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim of 'superior performance' on the StarCraft II benchmark is stated without any metrics, baselines, ablation studies, error bars, or experimental protocol, so the primary empirical assertion cannot be evaluated from the manuscript.

    Authors: The abstract is intentionally concise to summarize the contribution. The full manuscript provides the requested details in the Experiments section, including quantitative metrics on StarCraft II, comparisons to baselines, ablation studies, error bars across multiple runs, and the full experimental protocol. We agree the abstract claim would be stronger with a brief quantitative highlight and will revise it to include key performance deltas while respecting length constraints. revision: partial

  2. Referee: [Abstract] Abstract (method description): no equation, algorithm, or sampling rule (threshold, top-k, stochastic, etc.) is supplied for constructing the agent-pair probability matrix from historical observations or for drawing the sparse graph; this mechanism is load-bearing for the claim that the procedure simultaneously captures dependencies, models uncertainty, and avoids redundant/detrimental exchanges.

    Authors: The abstract offers a textual overview. The full manuscript supplies the precise formulation: Section 3 derives the agent-pair probability matrix from historical observations via the temporal encoder, and Section 4 details the stochastic sampling procedure (Gumbel-softmax reparameterization) used to draw the sparse graph, which explicitly models uncertainty and prunes low-probability edges. The Predict-Future and Infer-Present modules are formalized with their respective loss terms. We will augment the abstract with a short reference to the sampling mechanism to address this concern. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper describes LTS-CG as computing an agent-pair probability matrix from historical observations, sampling a sparse graph for knowledge exchange, and augmenting with Predict-Future and Infer-Present. No equations, derivations, or self-citations are exhibited in the provided text that reduce any claimed result or prediction to a fitted input or self-definition by construction. The central claims rest on the introduction of these mechanisms and end-to-end training, which are presented as independent of the target performance metrics on StarCraft II. This is the common case of a self-contained proposal without load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central claim depends on the unstated premise that historical observation sequences contain sufficient signal to infer a useful probability matrix and that the two auxiliary mechanisms (Predict-Future, Infer-Present) add independent value beyond standard graph construction.

axioms (2)
  • domain assumption Historical observations contain sufficient information to infer future agent interactions and current context
    Invoked to justify the use of past data for the probability matrix and the two auxiliary features.
  • domain assumption Sparse sampled graphs reduce redundant or detrimental information exchange without losing necessary coordination signals
    Core justification for moving from dense to sampled sparse graphs.
invented entities (1)
  • Latent Temporal Sparse Coordination Graph (LTS-CG) no independent evidence
    purpose: To represent agent-pair dependencies and relation uncertainty for selective knowledge exchange
    New structure introduced to solve the stated limitations of one-step dense graphs.

pith-pipeline@v0.9.0 · 5757 in / 1501 out tokens · 41570 ms · 2026-05-24T03:22:29.687108+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Group-Aware Coordination Graph for Multi-Agent Reinforcement Learning

    cs.LG 2024-04 unverdicted novelty 6.0

    GACG infers a coordination graph capturing both pair and group dependencies for information exchange in MARL, adds a group distance loss for consistency, and reports superior performance on StarCraft II micromanagement tasks.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · cited by 1 Pith paper

  1. [1]

    Traffic signal control with reinforcement learning based on region- aware cooperative strategy,

    M. Wang, L. Wu, J. Li, and L. He, “Traffic signal control with reinforcement learning based on region- aware cooperative strategy,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 7, pp. 6774–6785, 2022

  2. [2]

    Cooperative heterogeneous multi-robot systems: A survey,

    Y . Rizk, M. Awad, and E. W. Tunstel, “Cooperative heterogeneous multi-robot systems: A survey,” ACM Comput. Surv., vol. 52, no. 2, pp. 29:1–29:31, 2019

  3. [3]

    Multi-agent rein- forcement learning-based resource allocation for UA V networks,

    J. Cui, Y . Liu, and A. Nallanathan, “Multi-agent rein- forcement learning-based resource allocation for UA V networks,” IEEE Trans. Wirel. Commun. , vol. 19, no. 2, pp. 729–743, 2020

  4. [4]

    Value-decomposition networks for cooperative multi-agent learning based on JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, 2024 13 team reward,

    P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V . F. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls, and T. Graepel, “Value-decomposition networks for cooperative multi-agent learning based on JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, 2024 13 team reward,” in the 17th International Conference on Autonomous Agents and MultiA...

  5. [5]

    QMIX: monotonic value function factorisation for deep multi-agent rein- forcement learning,

    T. Rashid, M. Samvelyan, C. S. de Witt, G. Farquhar, J. N. Foerster, and S. Whiteson, “QMIX: monotonic value function factorisation for deep multi-agent rein- forcement learning,” inthe 35th International Conference on Machine Learning (ICML 2018), Stockholmsm ¨assan, Stockholm, Sweden, vol. 80, 2018, pp. 4292–4301

  6. [6]

    QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning,

    K. Son, D. Kim, W. J. Kang, D. Hostallero, and Y . Yi, “QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning,” in the 36th International Conference on Machine Learning (ICML 2019), Long Beach, California, USA , vol. 97, 2019, pp. 5887–5896

  7. [7]

    Rethinking individual global max in cooperative multi-agent reinforcement learning,

    Y . Hong, Y . Jin, and Y . Tang, “Rethinking individual global max in cooperative multi-agent reinforcement learning,” in the 36th Annual Conference on Neural Information Processing Systems (NIPS 2022) , vol. 35, 2022, pp. 32 438–32 449

  8. [8]

    Coor- dinated reinforcement learning,

    C. Guestrin, M. G. Lagoudakis, and R. Parr, “Coor- dinated reinforcement learning,” in the 19th Interna- tional Conference (ICML 2002), University of New South Wales, Sydney, Australia, 2002, pp. 227–234

  9. [9]

    Pic: Permuta- tion invariant critic for multi-agent deep reinforcement learning,

    I.-J. Liu, R. A. Yeh, and A. G. Schwing, “Pic: Permuta- tion invariant critic for multi-agent deep reinforcement learning,” in the 3rd Conference on Robot Learning (CoRL 2019), Osaka, Japan , vol. 100, 2020, pp. 590– 602

  10. [10]

    Deep coordi- nation graphs,

    W. Boehmer, V . Kurin, and S. Whiteson, “Deep coordi- nation graphs,” in the 37th International Conference on Machine Learning (ICML 2020), Virtual Event, vol. 119, 2020, pp. 980–991

  11. [11]

    Graph convolutional value decomposi- tion in multi-agent reinforcement learning,

    N. Naderializadeh, F. H. Hung, S. Soleyman, and D. Khosla, “Graph convolutional value decomposi- tion in multi-agent reinforcement learning,” CoRR, vol. abs/2010.04740, 2020

  12. [12]

    Deep implicit coordination graphs for multi-agent reinforcement learning,

    S. Li, J. K. Gupta, P. Morales, R. E. Allen, and M. J. Kochenderfer, “Deep implicit coordination graphs for multi-agent reinforcement learning,” in the 20th Interna- tional Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021), Virtual Event, United Kingdom , 2021, pp. 764–772

  13. [13]

    Self-organized polynomial-time coordination graphs,

    Q. Yang, W. Dong, Z. Ren, J. Wang, T. Wang, and C. Zhang, “Self-organized polynomial-time coordination graphs,” in International Conference on Machine Learn- ing (ICML 2022), Baltimore, Maryland, USA , vol. 162, 2022, pp. 24 963–24 979

  14. [14]

    Context-aware sparse deep coordination graphs,

    T. Wang, L. Zeng, W. Dong, Q. Yang, Y . Yu, and C. Zhang, “Context-aware sparse deep coordination graphs,” in the 10th International Conference on Learn- ing Representations (ICLR 2022), Virtual Event , 2022

  15. [15]

    Learning to score behaviors for guided policy optimization,

    A. Pacchiano, J. Parker-Holder, Y . Tang, K. Choroman- ski, A. Choromanska, and M. Jordan, “Learning to score behaviors for guided policy optimization,” in the 37th International Conference on Machine Learning, (ICML 2020), vol. 119, 13–18 Jul 2020, pp. 7445–7454

  16. [16]

    A survey on multi-agent deep re- inforcement learning: from the perspective of challenges and applications,

    W. Du and S. Ding, “A survey on multi-agent deep re- inforcement learning: from the perspective of challenges and applications,” Artificial Intelligence Review, vol. 54, no. 5, pp. 3215–3238, 2021

  17. [17]

    A review of coop- erative multi-agent deep reinforcement learning,

    A. Oroojlooy and D. Hajinezhad, “A review of coop- erative multi-agent deep reinforcement learning,” Appl. Intell., vol. 53, no. 11, pp. 13 677–13 722, 2023

  18. [18]

    Multi-agent actor-critic for mixed cooperative-competitive environments,

    R. Lowe, Y . Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” in the 30th An- nual Conference on Neural Information Processing Sys- tems (NIPS 2017), Long Beach, CA, USA , 2017, pp. 6379–6390

  19. [19]

    Counterfactual multi-agent policy gradients,

    J. N. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, “Counterfactual multi-agent policy gradients,” in the 32nd AAAI Conference on Artificial Intelligence (AAAI 2018), New Orleans, Louisiana, USA , 2018, pp. 2974–2982

  20. [20]

    A comprehensive survey on graph neural networks,

    Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” IEEE Trans. Neural Networks Learn. Syst., vol. 32, no. 1, pp. 4–24, 2021

  21. [21]

    Multi-agent game abstraction via graph attention neural network,

    Y . Liu, W. Wang, Y . Hu, J. Hao, X. Chen, and Y . Gao, “Multi-agent game abstraction via graph attention neural network,” in the 34th AAAI Conference on Artificial Intelligence (AAAI 2020), New York, NY, USA, , 2020, pp. 7211–7218

  22. [22]

    Learning nearly decomposable value functions via communication minimization,

    T. Wang, J. Wang, C. Zheng, and C. Zhang, “Learning nearly decomposable value functions via communication minimization,” in the 8th International Conference on Learning Representations (ICLR 2020), Addis Ababa, Ethiopia, 2020

  23. [23]

    Learning from the dark: Boosting graph convolutional neural networks with diverse negative samples,

    W. Duan, J. Xuan, M. Qiao, and J. Lu, “Learning from the dark: Boosting graph convolutional neural networks with diverse negative samples,” in the 36th AAAI Con- ference on Artificial Intelligence (AAAI 2022), Virtual Event. AAAI Press, 2022, pp. 6550–6558

  24. [24]

    Graph con- volutional reinforcement learning,

    J. Jiang, C. Dun, T. Huang, and Z. Lu, “Graph con- volutional reinforcement learning,” in 8th International Conference on Learning Representations (ICLR 2020), Addis Ababa, Ethiopia , 2020

  25. [25]

    Actor-attention-critic for multi- agent reinforcement learning,

    S. Iqbal and F. Sha, “Actor-attention-critic for multi- agent reinforcement learning,” in the 36th International Conference on Machine Learning (ICML 2019), Long Beach, California, USA , vol. 97, 2019, pp. 2961–2970

  26. [26]

    ROMA: multi-agent reinforcement learning with emergent roles,

    T. Wang, H. Dong, V . R. Lesser, and C. Zhang, “ROMA: multi-agent reinforcement learning with emergent roles,” in the 37th International Conference on Machine Learn- ing (ICML 2020), Virtual Event , vol. 119, 2020, pp. 9876–9886

  27. [27]

    Reducing bus bunching with asyn- chronous multi-agent reinforcement learning,

    J. Wang and L. Sun, “Reducing bus bunching with asyn- chronous multi-agent reinforcement learning,” in Pro- ceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI 2021), Virtual Event / Montreal, Canada, 19-27 August , 2021, pp. 426–433

  28. [28]

    Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting,

    B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting,” in the 27th International Joint Con- ference on Artificial Intelligence (IJCAI 2018), Stock- JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, 2024 14 holm, Sweden, 2018, pp. 3634–3640

  29. [29]

    Connecting the dots: Multivariate time series forecasting with graph neural networks,

    Z. Wu, S. Pan, G. Long, J. Jiang, X. Chang, and C. Zhang, “Connecting the dots: Multivariate time series forecasting with graph neural networks,” in The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2020), Virtual Event, CA, USA, 2020, pp. 753–763

  30. [30]

    Multivariate time series forecasting with latent graph inference.arXiv preprint arXiv:2203.03423,

    V . G. Satorras, S. S. Rangapuram, and T. Januschowski, “Multivariate time series forecasting with latent graph inference,” CoRR, vol. abs/2203.03423, 2022

  31. [31]

    Neural relational inference for interacting sys- tems,

    T. N. Kipf, E. Fetaya, K. Wang, M. Welling, and R. S. Zemel, “Neural relational inference for interacting sys- tems,” in the 35th International Conference on Machine Learning (ICML 2018), Stockholmsm ¨assan, Stockholm, Sweden, vol. 80. PMLR, 2018, pp. 2693–2702

  32. [32]

    Learn- ing discrete structures for graph neural networks,

    L. Franceschi, M. Niepert, M. Pontil, and X. He, “Learn- ing discrete structures for graph neural networks,” in the 36th International Conference on Machine Learning (ICML 2019), Long Beach, California, USA , vol. 97, 2019, pp. 1972–1982

  33. [33]

    Discrete graph structure learning for forecasting multiple time series,

    C. Shang, J. Chen, and J. Bi, “Discrete graph structure learning for forecasting multiple time series,” in the 9th International Conference on Learning Representations (ICLR 2021), Virtual Event, Austria , 2021

  34. [34]

    Evolvehypergraph: Group-aware dynamic relational reasoning for trajectory prediction,

    J. Li, C. Hua, J. Park, H. Ma, V . M. Dax, and M. J. Kochenderfer, “Evolvehypergraph: Group-aware dynamic relational reasoning for trajectory prediction,” CoRR, vol. abs/2208.05470, 2022

  35. [35]

    Approximation capabilities of multilayer feedforward networks,

    K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural Networks , vol. 4, no. 2, pp. 251–257, 1991

  36. [36]

    Categorical reparameter- ization with gumbel-softmax,

    E. Jang, S. Gu, and B. Poole, “Categorical reparameter- ization with gumbel-softmax,” in the 5th International Conference on Learning Representations (ICLR 2017), Toulon, France, 2017

  37. [37]

    The concrete distribution: A continuous relaxation of discrete random variables,

    C. J. Maddison, A. Mnih, and Y . W. Teh, “The concrete distribution: A continuous relaxation of discrete random variables,” in the 5th International Conference on Learn- ing Representations (ICLR 2017),Toulon, France , 2017

  38. [38]

    Diffusion con- volutional recurrent neural network: Data-driven traffic forecasting,

    Y . Li, R. Yu, C. Shahabi, and Y . Liu, “Diffusion con- volutional recurrent neural network: Data-driven traffic forecasting,” in the 6th International Conference on Learning Representations (ICLR 2018), Vancouver, BC, Canada, 2018

  39. [39]

    Layer- diverse negative sampling for graph neural networks,

    W. Duan, J. Lu, Y . G. Wang, and J. Xuan, “Layer- diverse negative sampling for graph neural networks,” Transactions on Machine Learning Research , 2024

  40. [40]

    Semi-supervised classifi- cation with graph convolutional networks,

    T. N. Kipf and M. Welling, “Semi-supervised classifi- cation with graph convolutional networks,” in the 5th International Conference on Learning Representations (ICLR 2017), Toulon, France, April 24-26 , 2017

  41. [41]

    The starcraft multi- agent challenge,

    M. Samvelyan, T. Rashid, C. S. de Witt, G. Farquhar, N. Nardelli, T. G. J. Rudner, C. Hung, P. H. S. Torr, J. N. Foerster, and S. Whiteson, “The starcraft multi- agent challenge,” in the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2019), Montreal, QC, Canada, , 2019, pp. 2186–2188

  42. [42]

    Lenient learning in independent- learner stochastic cooperative games,

    E. Wei and S. Luke, “Lenient learning in independent- learner stochastic cooperative games,” J. Mach. Learn. Res., vol. 17, pp. 84:1–84:42, 2016

  43. [43]

    DFAC framework: Fac- torizing the value function via quantile mixture for multi-agent distributional q-learning,

    W. Sun, C. Lee, and C. Lee, “DFAC framework: Fac- torizing the value function via quantile mixture for multi-agent distributional q-learning,” in Proceedings of the 38th International Conference on Machine Learning (ICML 2021) 2021, 18-24 July 2021, Virtual Event , ser. Proceedings of Machine Learning Research, vol. 139, 2021, pp. 9945–9954

  44. [44]

    The surprising effectiveness of PPO in cooperative multi-agent games,

    C. Yu, A. Velu, E. Vinitsky, J. Gao, Y . Wang, A. M. Bayen, and Y . Wu, “The surprising effectiveness of PPO in cooperative multi-agent games,” in The Annual Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, LA, USA, November 28 - December 9, 2022