pith. sign in

arxiv: 2605.17393 · v1 · pith:6OMOKWOAnew · submitted 2026-05-17 · 💻 cs.AI · cs.LG· cs.MA

Heterogeneous Information-Bottleneck Coordination Graphs for Multi-Agent Reinforcement Learning

Pith reviewed 2026-05-20 13:02 UTC · model grok-4.3

classification 💻 cs.AI cs.LGcs.MA
keywords multi-agent reinforcement learningcoordination graphsinformation bottlenecksparse graph learningvariational boundswater-filling allocationgroup structure
0
0 comments X

The pith

HIBCG learns sparse coordination graphs with closed-form criteria for both edge existence and per-agent message capacity using a group-aligned block-diagonal prior in the graph information bottleneck.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to replace heuristic methods for building sparse coordination graphs in cooperative multi-agent reinforcement learning with a theoretically grounded approach. It applies the graph information bottleneck to first create a group-aligned block-diagonal prior that supplies an explicit rule for deciding which edges to keep and at what density within each group. On the resulting graph it then compresses messages by allocating limited feature bandwidth to each agent according to a water-filling rule that retains only task-relevant content. A sympathetic reader would care because current sparse-graph learners offer no formal guarantee on topology or capacity, which can produce inefficient communication patterns that hurt both performance and scalability in multi-agent tasks.

Core claim

With the graph information bottleneck serving as the underlying tool, HIBCG constructs a group-aligned block-diagonal prior that provides a closed-form criterion for edge retention and then controls per-agent feature bandwidth on the resulting topology, with proofs that the prior strictly tightens the variational bound, the objective decomposes per group block, and capacity allocation follows a water-filling principle.

What carries the argument

The group-aligned block-diagonal prior inside the graph information bottleneck, which supplies the closed-form edge-retention criterion and permits the objective to decompose for differential per-group control.

If this is right

  • The variational bound on topology learning is strictly tightened by the group-aligned prior.
  • The overall objective decomposes per group block, allowing independent control of edge density in different groups.
  • Message capacity is allocated across agents according to a water-filling principle that keeps only task-relevant information.
  • Both the decision of which edges exist and how much capacity each carries receive explicit theoretical justification rather than heuristic selection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The block-diagonal construction could be tested by measuring whether learned graphs become sparser yet more effective when the task naturally contains group structure.
  • If groups must be discovered rather than given, joint inference of partitions alongside the graph might preserve the closed-form benefits while broadening applicability.
  • The water-filling capacity rule suggests that agents with more intra-group connections would receive proportionally different bandwidth, which could be verified by inspecting per-agent compression rates in trained models.

Load-bearing premise

That agents can be partitioned into groups such that a block-diagonal prior on the coordination graph produces a closed-form, task-relevant criterion for edge retention and remains compatible with the graph information bottleneck objective.

What would settle it

In a multi-agent environment with known group partitions, the graphs learned by HIBCG either violate the expected block-diagonal structure or yield no improvement in task reward and total communication volume compared with heuristic sparse-graph baselines.

Figures

Figures reproduced from arXiv: 2605.17393 by En Yu, Jie Lu, Junyu Xuan, Wei Duan, Xiaoyu Yang.

Figure 1
Figure 1. Figure 1: Arbitrary vs. heterogeneous sparse coordination graphs. Homogeneous sparsification (left) applies one criterion to all edges, producing sparse but structurally arbitrary links. HIBCG (right) learns a group-aligned sparse graph with dense intra-group connections and selective inter-group links. The pattern emerges from the group-aligned prior in Section 4, which gives different retention costs to different … view at source ↗
Figure 2
Figure 2. Figure 2: HIBCG architecture overview. Stage 1 (Initialisation): GACG produces the initial graph A(0) and group partition G from agent observations. Stage 2 (Heterogeneous Structural Pruning): At each GNN layer l, a layer-wise structural encoder maps (A(0), Z(l−1) X ) to Gaussian parameters P (l) ; the sampled adjacency A(l) gates message passing. A group-aligned block-diagonal prior Q(l) applies asymmetric KL press… view at source ↗
Figure 3
Figure 3. Figure 3: Overall performance comparison across nine scenarios spanning SMACv1 (3s5z, 1c3s5z, MMM2, 25m), SMACv2 (protoss 8v8, terran 10v10) and MAgent Battle at three scales (n ∈ {36, 64, 100}). Curves show mean over seeds; shaded bands are ±0.5σ. We compare HIBCG against six external baselines (QMIX, MAGI, CommFormer, ExpoComm, GACG, and BVME—our published message-IB prior work), and include HIB-flat as an interna… view at source ↗
Figure 4
Figure 4. Figure 4: Component ablation learning curves on 3s5z, MMM2, and MAgent-36. On heterogeneous maps the gap between HIBCG and HIB-flat is large and sustained; on 2-type near-saturated 3s5z all variants cluster at the ceiling. Heterogeneous maps: full stack wins [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: AIB bound tightening (Prop. 4.2) across six scenarios. Curves show the training trajectory of lossAIB (log scale); lower is tighter. HIBCG (red) attains a strictly lower AIB loss than HIB-flat (orange) and AIB-only (blue) on every scenario. Tightening ratios range from 2.9× (MAgent) to 6.5× (SMACv2), matching [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: HIBCG on MAgent-64 (Episode 3) across five timesteps. Row 1 : battle render (red = our team, blue = enemy; 63 vs. 48 at t=200). Row 2 : agent positions colored by learned group (green = G0, orange = G1); groups track spatial flanks without supervision. Row 3 : 64×64 adjacency matrix; dense intra-group blocks and sparse inter-group blocks confirm the heterogeneous pruning driven by the group￾aligned AIB pri… view at source ↗
Figure 3
Figure 3. Figure 3 [PITH_FULL_IMAGE:figures/full_fig_p020_3.png] view at source ↗
Figure 7
Figure 7. Figure 7: Mechanism and information-allocation analysis. (a) Graph density evolution on 3s5z: AIB￾only follows a “warm-then-prune” trajectory; HIBCG reaches a comparable asymptote with lower variance, confirming dual-path coupling. (b) Differential compression inside HIBCG on MAgent-36: intra-group AIB loss stays at ∼0.07 while cross-group loss stabilises at ∼62—a ∼906× ratio that holds at n ∈ {36, 64, 100} (structu… view at source ↗
read the original abstract

Coordination graphs are a central abstraction in cooperative multi-agent reinforcement learning (MARL), yet existing sparse-graph learners lack a theoretically grounded mechanism to decide which edges should exist and how much information each edge should carry. Current methods rely on heuristic criteria that offer no formal guarantee on the learned topology, and no principled way to allocate different communication capacities to structurally different agent relationships. To address this, we propose Heterogeneous Information-Bottleneck Coordination Graphs (HIBCG), which learns a group-aware sparse graph in which both edge existence and message capacity are theoretically justified. With the graph information bottleneck (GIB) serving as the underlying tool, HIBCG first constructs a group-aligned block-diagonal prior that provides a closed-form criterion for edge retention -- determining which edges should exist and at what density per group block -- and then controls per-agent feature bandwidth on the resulting topology, compressing messages to retain only task-relevant content. We prove that the group-aligned prior strictly tightens the variational bound on topology learning, that the objective decomposes per group block, enabling differential edge control, and that capacity allocation follows a water-filling principle.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Heterogeneous Information-Bottleneck Coordination Graphs (HIBCG) for cooperative multi-agent reinforcement learning. Building on the graph information bottleneck (GIB) framework, it introduces a group-aligned block-diagonal prior over the coordination graph that supplies a closed-form criterion for edge retention and per-group density. The method then compresses per-agent messages while allocating heterogeneous capacities according to a water-filling rule. The authors claim three theoretical results: the block-diagonal prior strictly tightens the GIB variational bound, the objective decomposes across group blocks, and capacity allocation follows the water-filling principle.

Significance. If the claimed proofs hold and the group-partition assumption can be satisfied without external domain knowledge, the work would supply the first information-theoretically justified mechanism for jointly learning sparse topology and heterogeneous message capacities in MARL. This could improve both sample efficiency and interpretability relative to heuristic sparsity methods, while extending GIB techniques to structured multi-agent settings.

major comments (2)
  1. [§3.2] §3.2 (Group-Aligned Prior Construction): The derivation of the closed-form edge-retention criterion rests on a static partition of agents into groups that produces a block-diagonal prior. The manuscript does not specify how this partition is obtained in general environments; if it is supplied by the user or by an unstated preprocessing step rather than derived from the GIB objective, the claimed strict tightening of the variational bound and the per-block decomposition become conditional on an external assumption that is not guaranteed when coordination structures are overlapping or dynamic.
  2. [Theorem 1] Theorem 1 (Variational Bound Tightening): The proof that the group-aligned prior strictly improves the GIB bound should explicitly quantify the approximation error introduced when inter-group edges that carry task-relevant information are pruned by the block-diagonal structure. Without this quantification, it is unclear whether the tightening holds in environments where optimal coordination crosses the assumed group boundaries.
minor comments (2)
  1. [§3.1] Notation for the block-diagonal prior matrix (Eq. 7) is introduced without an explicit definition of the group indicator matrix; adding a short paragraph or diagram would improve readability.
  2. [§5] The experimental section reports performance gains but does not include an ablation that isolates the contribution of the water-filling allocation versus uniform capacity; such a control would strengthen the empirical support for the heterogeneous-capacity claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed comments on our manuscript. We address each of the major comments below, providing clarifications and indicating the revisions we will make to strengthen the paper.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (Group-Aligned Prior Construction): The derivation of the closed-form edge-retention criterion rests on a static partition of agents into groups that produces a block-diagonal prior. The manuscript does not specify how this partition is obtained in general environments; if it is supplied by the user or by an unstated preprocessing step rather than derived from the GIB objective, the claimed strict tightening of the variational bound and the per-block decomposition become conditional on an external assumption that is not guaranteed when coordination structures are overlapping or dynamic.

    Authors: We appreciate the referee pointing out the need for clarity on the group partition. The manuscript assumes that the partition into groups is provided as input, either from domain knowledge or as a preprocessing step based on agent similarities, which is a standard assumption in structured MARL settings. This allows the block-diagonal prior to be constructed and yields the closed-form criterion and decomposition. To address this comment, we will revise the manuscript to explicitly describe possible ways to obtain the partition (e.g., via clustering on agent state representations) and discuss the implications when the partition does not perfectly align with the optimal coordination structure. We will also emphasize that the theoretical results hold under this modeling assumption. revision: yes

  2. Referee: [Theorem 1] Theorem 1 (Variational Bound Tightening): The proof that the group-aligned prior strictly improves the GIB bound should explicitly quantify the approximation error introduced when inter-group edges that carry task-relevant information are pruned by the block-diagonal structure. Without this quantification, it is unclear whether the tightening holds in environments where optimal coordination crosses the assumed group boundaries.

    Authors: Theorem 1 proves that the group-aligned block-diagonal prior leads to a strictly tighter variational bound than the standard GIB by reducing the feasible set of graphs, which decreases the upper bound on the relevant mutual information. This tightening is strict in the sense that the bound is lower (tighter) for any fixed partition. However, we agree that in cases where important coordination occurs across groups, pruning those edges introduces an approximation. Providing a general quantification of this error without environment-specific assumptions is not feasible within the current framework, as it would require bounding the information loss from excluded edges. In the revision, we will include a discussion of this limitation following the proof of Theorem 1, noting the trade-off and suggesting that the method is most suitable when groups can be chosen to capture the primary coordination patterns. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained

full rationale

The paper applies the established graph information bottleneck (GIB) objective to MARL coordination graphs and introduces a group-aligned block-diagonal prior as a modeling choice that enables closed-form edge retention and per-block decomposition. The claimed tightening of the variational bound and objective decomposition follow directly from substituting the block-diagonal structure into the GIB variational form, which is a standard algebraic step rather than a redefinition of the inputs. Capacity allocation is identified with the water-filling solution from rate-distortion theory, an external mathematical result applied after the topology is fixed. No equation reduces a claimed prediction or proof to a fitted parameter or self-citation by construction, and the group partition is treated as an assumption compatible with the objective rather than derived from it. The central results therefore retain independent content beyond the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the applicability of the graph information bottleneck to coordination graphs and on the existence of meaningful agent groupings that admit a block-diagonal structure; no numerical free parameters or new entities are stated in the abstract.

axioms (1)
  • domain assumption Agents in the target MARL tasks admit a grouping that permits construction of a group-aligned block-diagonal prior providing a closed-form edge-retention criterion.
    This premise is invoked to justify the prior that determines which edges exist and at what density per group block.

pith-pipeline@v0.9.0 · 5733 in / 1370 out tokens · 91043 ms · 2026-05-20T13:02:44.457344+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages

  1. [1]

    Learning multiagent communication with backpropagation,

    S. Sukhbaatar, R. Ferguset al., “Learning multiagent communication with backpropagation,” inAd- vances in Neural Information Processing Systems (NeurIPS), vol. 29, 2016

  2. [2]

    Learning attentional communication for multi-agent cooperation,

    J. Jiang and Z. Lu, “Learning attentional communication for multi-agent cooperation,” inAdvances in Neural Information Processing Systems (NeurIPS), vol. 31, 2018

  3. [4]

    A survey of multi-agent deep reinforcement learning with communication,

    C. Guo, H. Li, D. Qiu, H. Li, Y. Zhang, P. Liet al., “A survey of multi-agent deep reinforcement learning with communication,”Autonomous Agents and Multi-Agent Systems, vol. 38, no. 1, p. 13, 2024

  4. [5]

    Learning to communicate with deep multi-agent reinforcement learning,

    J. Foerster, I. A. Assael, N. de Freitas, and S. Whiteson, “Learning to communicate with deep multi-agent reinforcement learning,” inAdvances in Neural Information Processing Systems (NeurIPS), vol. 29, 2016

  5. [6]

    Partially observable multi-agent rl with (quasi-)efficiency: The blessing of in- formation sharing,

    X. Liu and K. Bai, “Partially observable multi-agent rl with (quasi-)efficiency: The blessing of in- formation sharing,” inInternational Conference on Machine Learning (ICML). PMLR, 2023, pp. 22 106–22 130

  6. [7]

    Bayesian ego-graph inference for networked multi-agent reinforcement learning,

    W. Duan, J. Lu, and J. Xuan, “Bayesian ego-graph inference for networked multi-agent reinforcement learning,” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems (NIPS 2025), 2025

  7. [8]

    Context-aware sparse deep coordination graphs,

    T. Yang, J. Wang, L. Wu, and C. Zhang, “Context-aware sparse deep coordination graphs,” inInter- national Conference on Learning Representations (ICLR), 2022

  8. [9]

    Self-organized polynomial-time co- ordination graphs,

    Q. Yang, W. Dong, Z. Ren, J. Wang, T. Wang, and C. Zhang, “Self-organized polynomial-time co- ordination graphs,” inInternational Conference on Machine Learning (ICML), vol. 162, 2022, pp. 24 963–24 979. 40

  9. [10]

    Group-aware coordination graph for multi-agent reinforcement learning,

    W. Duan, J. Lu, and J. Xuan, “Group-aware coordination graph for multi-agent reinforcement learning,” inProceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI), 2024, pp. 3926–3934

  10. [11]

    Networked agents in the dark: Team value learning under partial observability,

    G. S. Varela, A. Sardinha, and F. S. Melo, “Networked agents in the dark: Team value learning under partial observability,” inProceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2025, Detroit, MI, USA, May 19-23, 2025. International Foundation for Autonomous Agents and Multiagent Systems / ACM, 2025, pp. 2087–2095

  11. [12]

    Deep implicit coordination graphs for multi-agent reinforcement learning,

    S. Li, J. K. Gupta, P. Morales, R. E. Allen, and M. J. Kochenderfer, “Deep implicit coordination graphs for multi-agent reinforcement learning,” inProceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2021, pp. 764–772

  12. [13]

    Multi-agent game abstraction via graph attention neural network,

    Y. Liu, W. Wang, Y. Hu, J. Hao, X. Chen, and Y. Gao, “Multi-agent game abstraction via graph attention neural network,” inAAAI Conference on Artificial Intelligence, 2020, pp. 7211–7218

  13. [14]

    Dynamic deep factor graph for multi-agent reinforcement learning,

    Y. Shi, S. Duan, C. Xu, R. Wang, F. Ye, and C. Yuen, “Dynamic deep factor graph for multi-agent reinforcement learning,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 3, pp. 3417–3431, 2026

  14. [15]

    Semi-supervised classification with graph convolutional networks,

    T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017

  15. [16]

    Learning from the dark: Boosting graph convolutional neural networks with diverse negative samples,

    W. Duan, J. Xuan, M. Qiao, and J. Lu, “Learning from the dark: Boosting graph convolutional neural networks with diverse negative samples,” inThirty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2022), Virtual Event. AAAI Press, 2022, pp. 6550–6558

  16. [17]

    Layer-diverse negative sampling for graph neural networks,

    W. Duan, J. Lu, Y. G. Wang, and J. Xuan, “Layer-diverse negative sampling for graph neural networks,” Transactions on Machine Learning Research, 2024

  17. [18]

    Graph convolutional neural networks with diverse negative samples via decomposed determinant point processes,

    W. Duan, J. Xuan, M. Qiao, and J. Lu, “Graph convolutional neural networks with diverse negative samples via decomposed determinant point processes,”IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 12, pp. 18 160–18 171, 2024

  18. [19]

    Mecon: A gnn-based graph classification framework for MEV activity detection,

    Z. Yao, F. Huang, Y. Li, W. Duan, P. Qian, N. Yang, and W. Susilo, “Mecon: A gnn-based graph classification framework for MEV activity detection,”Expert Syst. Appl., vol. 269, p. 126486, 2025

  19. [20]

    Graph convolutional reinforcement learning,

    J. Jiang, C. Dun, T. Huang, and Z. Lu, “Graph convolutional reinforcement learning,” inInternational Conference on Learning Representations (ICLR), 2020

  20. [21]

    Nervenet: Learning structured policy with graph neural networks,

    T. Wang, R. Liao, J. Ba, and S. Fidler, “Nervenet: Learning structured policy with graph neural networks,” inInternational Conference on Learning Representations (ICLR), 2018

  21. [22]

    Deep coordination graphs,

    W. B¨ ohmer, V. Kurin, and S. Whiteson, “Deep coordination graphs,” inInternational Conference on Machine Learning (ICML), vol. 119. PMLR, 2020, pp. 980–991

  22. [23]

    Tarmac: Targeted multi- agent communication,

    A. Das, T. Gervet, J. Romoff, D. Batra, D. Parikh, M. Rabbat, and J. Pineau, “Tarmac: Targeted multi- agent communication,” inInternational Conference on Machine Learning (ICML). PMLR, 2019, pp. 1538–1546

  23. [24]

    Multi-agent graph-attention communication and teaming,

    Y. Niu, R. R. Paleja, and M. C. Gombolay, “Multi-agent graph-attention communication and teaming,” inProceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2021, pp. 964–973

  24. [25]

    Inferring latent temporal sparse coordination graph for multiagent reinforcement learning,

    W. Duan, J. Lu, and J. Xuan, “Inferring latent temporal sparse coordination graph for multiagent reinforcement learning,”IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 8, pp. 14 358–14 370, 2025. 41

  25. [26]

    Learning multi-agent communication from graph modeling perspective,

    S. He, H. Ni, J. Wang, L. Wu, and C. Zhang, “Learning multi-agent communication from graph modeling perspective,” inInternational Conference on Learning Representations (ICLR), 2024

  26. [27]

    HGAP: boosting permutation invariant and permutation equivariant in multi-agent reinforcement learning via graph attention network,

    B. Lin and C. Lee, “HGAP: boosting permutation invariant and permutation equivariant in multi-agent reinforcement learning via graph attention network,” inForty-first International Conference on Machine Learning, (ICML 2024), Vienna, Austria, July 21-27, 2024. OpenReview.net, 2024

  27. [28]

    Towards generalizability of multi-agent reinforcement learning in graphs with recurrent message passing,

    J. Weil, Z. Bao, O. Abboud, and T. Meuser, “Towards generalizability of multi-agent reinforcement learning in graphs with recurrent message passing,” inProceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, (AAMAS 2024), Auckland, New Zealand, May 6-10, 2024, 2024, pp. 1919–1927

  28. [29]

    Hierarchical cooperative multi-agent reinforcement learning with skill discovery,

    J. Yang, I. Borovikov, and H. Zha, “Hierarchical cooperative multi-agent reinforcement learning with skill discovery,” inProceedings of the 19th International Conference on Autonomous Agents and Multi- agent Systems (AAMAS), 2020, pp. 1566–1574

  29. [30]

    Exponential topology-enabled scalable communication in multi- agent reinforcement learning,

    X. Li, X. Wang, C. Bai, and J. Zhang, “Exponential topology-enabled scalable communication in multi- agent reinforcement learning,” inInternational Conference on Learning Representations (ICLR), 2025

  30. [31]

    Roma: Multi-agent reinforcement learning with emergent roles,

    T. Wang, H. Dong, V. Lesser, and C. Zhang, “Roma: Multi-agent reinforcement learning with emergent roles,” inInternational Conference on Machine Learning (ICML), 2020, pp. 9876–9886

  31. [32]

    VAST: Value function factorization with variable agent sub-teams,

    T. Phan, F. Ritz, L. Belzner, P. Altmann, T. Gabor, and C. Linnhoff-Popien, “VAST: Value function factorization with variable agent sub-teams,” inAdvances in Neural Information Processing Systems (NeurIPS), 2021, pp. 24 018–24 032

  32. [33]

    REFIL: Randomized extrapolation for multi-agent reinforcement learning,

    S. Iqbal, C. A. S. de Witt, B. Peng, W. B¨ ohmer, S. Whiteson, and F. Sha, “REFIL: Randomized extrapolation for multi-agent reinforcement learning,” inInternational Conference on Machine Learning (ICML), 2021, pp. 4555–4566

  33. [34]

    Self-organized group for cooperative multi-agent reinforcement learning,

    Y. Pan, J. Chen, B. Huang, D. Wang, and H. Deng, “Self-organized group for cooperative multi-agent reinforcement learning,” inAdvances in Neural Information Processing Systems (NeurIPS), 2022

  34. [35]

    Group-oriented multi-agent reinforcement learning,

    Z. Li, P. Liu, J. Chen, and C. Zhang, “Group-oriented multi-agent reinforcement learning,” inInterna- tional Conference on Machine Learning (ICML), 2023

  35. [36]

    Learning to schedule communication in multi-agent reinforcement learning,

    D. Kim, S. Moon, D. Hostallero, W. J. Kang, T. Lee, K. Son, and Y. Yi, “Learning to schedule communication in multi-agent reinforcement learning,” in7th International Conference on Learning Representations, ICLR 2019, 2019

  36. [37]

    Learning agent communication under limited band- width by message pruning,

    H. Mao, Z. Zhang, Z. Xiao, Z. Gong, and Y. Ni, “Learning agent communication under limited band- width by message pruning,” inThe Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Int...

  37. [38]

    Asynchronous cooperative multi-agent rein- forcement learning with limited communication,

    S. Dolan, S. Nayak, J. J. Aloor, and H. Balakrishnan, “Asynchronous cooperative multi-agent rein- forcement learning with limited communication,” inProceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, (AAMAS 2025), Detroit, MI, USA, May 19-23, 2025. International Foundation for Autonomous Agents and Multiagent Syst...

  38. [39]

    Learning nearly decomposable value functions via com- munication minimization,

    T. Wang, J. Wang, C. Zheng, and C. Zhang, “Learning nearly decomposable value functions via com- munication minimization,” inInternational Conference on Learning Representations (ICLR), 2020

  39. [40]

    Multi-agent shared information aggregation for cooperative multi-agent rein- forcement learning,

    T. Wang and C. Zhang, “Multi-agent shared information aggregation for cooperative multi-agent rein- forcement learning,” inInternational Conference on Machine Learning (ICML), 2023

  40. [41]

    Robust multi-agent communication with graph information bottleneck optimization,

    S. Ding, W. Du, L. Ding, J. Zhang, L. Guo, and B. An, “Robust multi-agent communication with graph information bottleneck optimization,”IEEE Transactions on Pattern Analysis and Machine In- telligence, vol. 46, no. 5, pp. 3096–3107, 2024. 42

  41. [42]

    The information bottleneck method,

    N. Tishby, F. C. Pereira, and W. Bialek, “The information bottleneck method,”Proceedings of the 37th Annual Allerton Conference on Communication, Control, and Computing, pp. 368–377, 1999

  42. [43]

    Graph information bottleneck,

    T. Wu, H. Ren, P. Li, and J. Leskovec, “Graph information bottleneck,”Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 20 437–20 448, 2020

  43. [44]

    Deep variational information bottleneck,

    A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, “Deep variational information bottleneck,” in International Conference on Learning Representations (ICLR), 2017

  44. [45]

    Generalization in reinforcement learning with selective noise injection and information bottleneck,

    M. Igl, K. Ciosek, Y. Li, S. Tschiatschek, C. Zhang, S. Devlin, and K. Hofmann, “Generalization in reinforcement learning with selective noise injection and information bottleneck,” inAdvances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Cana...

  45. [46]

    Bandwidth-constrained variational message encoding for multi-agent co- ordination,

    W. Duan, J. Lu, and J. Xuan, “Bandwidth-constrained variational message encoding for multi-agent co- ordination,” inProceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2026

  46. [47]

    Enhancing value decomposition with target transformation in cooperative multi-agent reinforcement learning,

    Z. Liu, L. Wan, S. Sun, X. Sui, X. Chen, X. Lan, and N. Zheng, “Enhancing value decomposition with target transformation in cooperative multi-agent reinforcement learning,”IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–17, 2026

  47. [48]

    Efficient exploration for multi-agent diversity with agent identity,

    T. Li and K. Zhu, “Efficient exploration for multi-agent diversity with agent identity,”IEEE Transac- tions on Pattern Analysis and Machine Intelligence, vol. 48, no. 5, pp. 5460–5473, 2026

  48. [49]

    Randomized exploration in cooperative multi-agent rein- forcement learning,

    H. Hsu, W. Wang, M. Pajic, and P. Xu, “Randomized exploration in cooperative multi-agent rein- forcement learning,” inAdvances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, BC, Canada, December 10 - 15, 2024

  49. [50]

    Trust region policy optimisa- tion in multi-agent reinforcement learning,

    J. G. Kuba, R. Chen, M. Wen, Y. Wen, F. Sun, J. Wang, and Y. Yang, “Trust region policy optimisa- tion in multi-agent reinforcement learning,” inInternational Conference on Learning Representations (ICLR), 2022

  50. [51]

    Resilient contrastive pre-training under non-stationary drift,

    X. Yang, J. Lu, E. Yu, and W. Duan, “Resilient contrastive pre-training under non-stationary drift,”

  51. [52]

    Available: https://arxiv.org/abs/2502.07620

    [Online]. Available: https://arxiv.org/abs/2502.07620

  52. [53]

    Walking the tightrope: Autonomous disentangling beneficial and detrimental drifts in non-stationary custom-tuning,

    X. Yang, J. Lu, and E. Yu, “Walking the tightrope: Autonomous disentangling beneficial and detrimental drifts in non-stationary custom-tuning,” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. [Online]. Available: https://openreview.net/forum?id= 1BAiQmAFsx

  53. [54]

    Adapting multi-modal large language model to concept drift from pre-training onwards,

    ——, “Adapting multi-modal large language model to concept drift from pre-training onwards,” inThe Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.net/forum?id=b20VK2GnSs

  54. [55]

    Drift-aware collaborative assistance mixture of experts for heterogeneous multistream learning,

    E. Yu, J. Lu, K. Wang, X. Yang, and G. Zhang, “Drift-aware collaborative assistance mixture of experts for heterogeneous multistream learning,”arXiv preprint arXiv:2508.01598, 2025

  55. [56]

    F. A. Oliehoek and C. Amato,A Concise Introduction to Decentralized POMDPs, ser. Springer Briefs in Intelligent Systems. Springer, 2016

  56. [57]

    QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning,

    T. Rashid, M. Samvelyan, C. S. de Witt, G. Farquhar, J. N. Foerster, and S. Whiteson, “QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning,” inProceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholmsm¨ assan, Stockholm, Sweden, vol. 80, 2018, pp. 4292–4301

  57. [58]

    T. M. Cover and J. A. Thomas,Elements of Information Theory, 2nd ed. Wiley-Interscience, 2006. 43

  58. [59]

    The starcraft multi-agent challenge,

    M. Samvelyan, T. Rashid, C. Schr¨ oder de Witt, G. Farquhar, N. Nardelli, T. G. J. Rudner, C.-M. Hung, P. H. S. Torr, J. Foerster, and S. Whiteson, “The starcraft multi-agent challenge,” inProceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2019, pp. 2186–2188

  59. [60]

    Smacv2: An improved benchmark for cooperative multi-agent reinforcement learning,

    B. Ellis, J. Cook, S. Moalla, M. Samvelyan, M. Sun, A. Mahajan, J. N. Foerster, and S. Whiteson, “Smacv2: An improved benchmark for cooperative multi-agent reinforcement learning,” inThe 36th Annual Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, LA, USA, December 10 - 16, 2023

  60. [61]

    Magent: A many-agent re- inforcement learning platform for artificial collective intelligence,

    L. Zheng, J. Yang, H. Cai, M. Zhou, W. Zhang, J. Wang, and Y. Yu, “Magent: A many-agent re- inforcement learning platform for artificial collective intelligence,” inAAAI Conference on Artificial Intelligence, 2018, pp. 8222–8223. 44