
arxiv: 2605.18024 · v1 · pith:6E5DKUXPnew · submitted 2026-05-18 · 💻 cs.LG · cs.AI · cs.MA

Interaction-Breaking Adversarial Learning Framework for Robust Multi-Agent Reinforcement Learning

Pith reviewed 2026-05-20 12:19 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.MA
keywords multi-agent reinforcement learning · adversarial robustness · coordination · interaction breaking · information-theoretic attacks · MARL

The pith

Multi-agent reinforcement learning agents can be trained to keep coordinating when their observations and actions face adversarial perturbations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper focuses on the problem that coordination learned by multiple agents often breaks when external changes disrupt how they share information or act together. It proposes a framework that builds attacks by using information theory to alter what agents see and do, then trains the agents to complete tasks reliably even under those changes. This targets the interaction structure itself rather than only the values or rewards. Results show the trained agents handle a wider range of disruptions better than earlier robust methods and continue to work when some agents disappear entirely.

Core claim

The central claim is that an interaction-breaking adversarial learning framework, built on an information-theoretic view of attacks, can generate perturbations to agents' observations and actions that specifically impede coordination, and that training agents against these perturbations produces policies that remain effective when real disruptions occur.

What carries the argument

The interaction-breaking adversarial learning (IBAL) framework, which constructs attacks by perturbing agents' observations and actions so that the agents share less information with one another.
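To make "reducing shared information" concrete, here is a minimal sketch that is not taken from the paper. It computes a plug-in estimate of the mutual information between two agents' discrete actions, before and after a hypothetical perturbation that decouples them. The function name `mutual_information` and the toy action sequences are assumptions for illustration only; the paper's own estimator and attack are more involved.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)) written with raw counts
        ratio = count * n / (px[x] * py[y])
        mi += p_xy * math.log(ratio)
    return mi


# Hypothetical toy data: agent B mirrors agent A (perfect coordination).
actions_a = [0, 1, 0, 1, 0, 1, 0, 1]
actions_b = list(actions_a)

# Hypothetical perturbation: B's actions are overwritten so that every
# (A, B) combination appears equally often, i.e. B no longer tracks A.
attacked_b = [0, 0, 1, 1, 0, 0, 1, 1]

mi_clean = mutual_information(actions_a, actions_b)
mi_attacked = mutual_information(actions_a, attacked_b)
print(f"MI before perturbation: {mi_clean:.4f} nats")
print(f"MI after perturbation:  {mi_attacked:.4f} nats")
```

In this toy case the perturbation removes all statistical dependence between the two action streams, which is the kind of effect an interaction-breaking attack aims for; whether a given perturbation also hurts task performance is a separate, empirical question.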

If this is right

  • The approach yields higher robustness than prior robust multi-agent reinforcement learning methods across varied attack types.
  • Performance remains stronger in settings where some agents are missing.
  • Robustness extends to corruption of interaction structures, not only to value-based attacks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same perturbation idea could be tested in single-agent tasks where environment changes mimic loss of useful signals.
  • Different measures of information, such as mutual information variants, might produce stronger or weaker attacks worth comparing directly.
  • Deployment in systems like vehicle fleets or robot teams would reveal whether the learned robustness transfers beyond simulated attacks.

Load-bearing premise

Perturbations chosen by information-theoretic measures on observations and actions serve as a good model for the interaction disruptions that actually occur in real multi-agent environments.

What would settle it

Measure whether agents trained under the proposed framework complete cooperative tasks at higher rates than baselines when placed in a physical testbed that introduces real sensor noise or intermittent communication loss.

Figures

Figures reproduced from arXiv: 2605.18024 by Mingu Kang, Seungyul Han, Sunwoo Lee, Yonghyeon Jo.

Figure 1: Illustration of the proposed interaction-breaking attack in StarCraft II: (a) normal, (b) observation attack, (c) action attack.
… how agents influence one another, and therefore may fail to capture attacks that intentionally break inter-agent relationships, under which coordination can collapse abruptly. To study and mitigate this vulnerability, we propose an interaction-breaking attack that explicitly tar…
Figure 2: Dimension-wise MI for the G1 agent in the StarCraft II scenario. MI values are normalized to [0, 1]. Here, |G1| = 1 and |G2| = 7, and we set L = 5 × |G2| to match the number of G1 observation dimensions used to observe G2 agents.
Action attacker. We design an action attacker that directly minimizes the action-level MI. Given the perturbed observations $\tilde{o}_t$, the ego joint policy first samples an intermediat…
Figure 5: Average test win rate under various adversarial attacks.
5. Experiments. 5.1. Experimental Setup. In this section, we compare the robustness of the proposed IBAL and prior robust MARL methods on the StarCraft II Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019), which requires cooperative decision-making among ally units to defeat enemy units. As illustrated in …
Figure 6: Average test win rate under various non-parametric perturbations.
5.2. Performance Comparison. We report the mean test win rate over 5 seeds under adversarial attacks and non-parametric perturbations in …
Figure 7: Trajectory analysis of the interaction-breaking attack and IBAL policies in 8m and MMM tasks.
Figure 8: Component evaluation. (a) 2s3z (b) 8m
Figure 9: Analysis on the maximum group size K.
…wise evaluations and analyze sensitivity to the maximum group size K. In Appendix F.3, we provide additional analyses on the masking budget L, the minimum attack probability, and computational complexity. Component Evaluation. To quantify each component's contribution, we evaluate ablations on 8m under Dis-1. We compare IBAL (Ours) with four variants: IBAL w/o adapti…
Figure 10: Reconstruction models for MI estimation: (a) observation reconstruction model and (b) action reconstruction model.
The proposed joint interaction-breaking attackers $f^{\mathrm{IB}}_{\mathrm{adv}}$ and $\pi^{\mathrm{IB}}_{\mathrm{adv}}$ require estimating the associated MI terms. In particular, Eq. (1) decomposes the total interaction into the action-level MI term and the observation-level MI term, both of which must be estimated to construct the attack…
Figure 11: Visualization of SMAC scenarios.
SMAC is a standard testbed for cooperative multi-agent reinforcement learning, designed around decentralized micromanagement in StarCraft II. In SMAC, each unit is controlled by an individual agent that makes decisions from its own partial, local observations without access to global state at execution time. The benchmark provides diverse combat settings, enabling systema…
Figure 12: Performance comparison under unseen interaction-breaking attack.
Figure 13
Figure 13 plots the per-timestep values of the group redundancy, the group-wise MI, and the individual MI. We observe that (a) the group redundancy stays close to zero and its magnitude is substantially smaller than the MI values in (b) and (c), suggesting that it has negligible impact on dimension-wise MI selection. Overall, these results empirically support the validity of decomposing the observation-level MI ter…
Figure 14: Trajectory analysis of IBAL under Dis-1 setting on 8m and MMM scenarios.
In the main paper, we analyzed how agents respond to the proposed interaction-breaking attack that suppresses cross-group coordination. Here, we further examine IBAL under non-parametric perturbations, focusing on Dis-1 in 8m and MMM. Under Dis-1, one agent becomes disabled and remains stationary, contributing no further actions to c…
Figure 15: Ablation study for the masking budget L.
Minimum Attack Probability $P^{\min}_{\mathrm{act}}$. We ablate the minimum attack probability $P^{\min}_{\mathrm{act}}$ in our scheduling scheme, where $P_{\mathrm{act}} \sim \mathrm{Unif}(P^{\min}_{\mathrm{act}}, P^{\max}_{\mathrm{act}})$ is sampled during training. In this ablation, we use K = 1 for 2s3z and K = 4 for 8m; thus, we sweep $P^{\min}_{\mathrm{act}}$ up to the maximum probability for each environment. As shown in …
Figure 16: Ablation study for the minimum attack probability $P^{\min}_{\mathrm{act}}$.
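The Figure 15 excerpt describes a scheduling scheme in which the attack probability is drawn uniformly between a minimum and a maximum value during training. A minimal sketch of that sampling step follows. The function names, the bounds 0.2 and 0.6, and the choice to apply the probability as an independent coin flip at every timestep are assumptions for illustration; the excerpt does not say how often the probability is resampled or how it is applied.

```python
import random


def sample_attack_probability(p_min, p_max, rng):
    """Draw P_act uniformly from [p_min, p_max]."""
    if not 0.0 <= p_min <= p_max <= 1.0:
        raise ValueError("need 0 <= p_min <= p_max <= 1")
    return rng.uniform(p_min, p_max)


def attack_mask(p_act, num_steps, rng):
    """Assumed usage: flip an independent coin at each timestep."""
    return [rng.random() < p_act for _ in range(num_steps)]


rng = random.Random(0)                              # seeded for repeatability
p_act = sample_attack_probability(0.2, 0.6, rng)    # hypothetical bounds
mask = attack_mask(p_act, num_steps=50, rng=rng)
print(f"sampled P_act = {p_act:.3f}, attacked steps = {sum(mask)} / {len(mask)}")
```

Raising the lower bound concentrates training on heavily attacked episodes, which is the quantity the Figure 16 ablation varies.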
Original abstract

Cooperation is central to multi-agent reinforcement learning (MARL), yet learned coordination can be fragile when external perturbations disrupt inter-agent interactions. Prior robust MARL methods have primarily considered value-oriented attacks, leaving a gap in robustness when interaction structures themselves are corrupted. In this paper, we propose an interaction-breaking adversarial learning (IBAL) framework that takes an information-theoretic view to construct attacks that impede coordination by perturbing agents' observations and actions, and trains agents to perform reliably under such disruptions. Empirically, our approach improves robustness over existing robust MARL baselines across diverse attack settings and yields stronger performance even under agent-missing scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes an Interaction-Breaking Adversarial Learning (IBAL) framework for robust multi-agent reinforcement learning. It adopts an information-theoretic perspective to generate attacks that impede inter-agent coordination by perturbing agents' observations and actions, then trains policies to remain effective under these disruptions. The central empirical claim is that IBAL improves robustness over existing robust MARL baselines across diverse attack settings and yields stronger performance even in agent-missing scenarios.

Significance. If the empirical results hold and the information-theoretic perturbations prove representative of real coordination-breaking disruptions, the work would address a genuine gap in robust MARL by shifting focus from value-oriented attacks to interaction-structure corruption. This could be useful for applications such as multi-robot coordination where learned policies must tolerate partial observability or communication failures.

major comments (1)
  1. [Experiments] Experiments section: the manuscript reports robustness gains over baselines under its own attack family and agent-missing scenarios, yet contains no ablation that replaces the information-theoretic attack generator with an alternative disruption model (e.g., direct reward hacking or dynamics perturbation) while keeping the training procedure otherwise identical. Without this comparison, the reported improvements could be explained by the specific attack distribution rather than by a general interaction-breaking principle, which is load-bearing for the headline claim.
minor comments (1)
  1. [Abstract] Abstract: the claim of improvement 'across diverse attack settings' is stated without enumerating the settings or metrics, making it difficult for readers to gauge the breadth of the evaluation from the outset.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment point by point below.

Point-by-point responses
  1. Referee: [Experiments] Experiments section: the manuscript reports robustness gains over baselines under its own attack family and agent-missing scenarios, yet contains no ablation that replaces the information-theoretic attack generator with an alternative disruption model (e.g., direct reward hacking or dynamics perturbation) while keeping the training procedure otherwise identical. Without this comparison, the reported improvements could be explained by the specific attack distribution rather than by a general interaction-breaking principle, which is load-bearing for the headline claim.

    Authors: We appreciate the referee's observation. The information-theoretic attack generator is a defining component of the IBAL framework because it explicitly targets reductions in mutual information to disrupt coordination, which is distinct from value-oriented attacks studied in prior work. Our experiments already evaluate robustness under the proposed attack family as well as agent-missing scenarios, the latter of which constitutes an alternative form of interaction disruption. Nevertheless, we agree that an ablation replacing the attack generator with alternatives such as direct reward hacking or dynamics perturbation (while holding the remainder of the training procedure fixed) would help isolate the contribution of the interaction-breaking principle. We will add this comparison in the revised manuscript.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: framework and robustness claims are empirically grounded without self-referential reductions

full rationale

The paper introduces the IBAL framework as a novel information-theoretic method for constructing adversarial perturbations to observations and actions that impede inter-agent coordination, then demonstrates empirical robustness gains over baselines in multiple attack settings and agent-missing scenarios. No equations, fitted parameters, or self-citations are shown in the abstract or described structure that reduce the claimed improvements to a definition, renaming, or input by construction. The central premise relies on external empirical validation against existing robust MARL methods rather than internal loops or uniqueness theorems imported from prior author work. This qualifies as a self-contained proposal with independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are identifiable from the abstract alone.

pith-pipeline@v0.9.0 · 5643 in / 1124 out tokens · 49647 ms · 2026-05-20T12:19:20.107478+00:00 · methodology


Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 4 internal anchors

  1. [1]

    Explaining and Harnessing Adversarial Examples

    Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.

  2. [2]

    What is the solution for state-adversarial multi-agent reinforcement learning?

    Han, S., Su, S., He, S., Han, S., Yang, H., Zou, S., and Miao, F. What is the solution for state-adversarial multi-agent reinforcement learning? arXiv preprint arXiv:2212.02705.

  3. [3]

    Robust multi-agent reinforcement learning with state uncertainty

    He, S., Han, S., Su, S., Han, S., Zou, S., and Miao, F. Robust multi-agent reinforcement learning with state uncertainty. arXiv preprint arXiv:2307.16212.

  4. [4]

    Wolfpack adversarial attack for robust multi-agent reinforcement learning

    Lee, S., Hwang, J., Jo, Y., and Han, S. Wolfpack adversarial attack for robust multi-agent reinforcement learning. arXiv preprint arXiv:2502.02844.

  5. [5]

    PMIC: Improving multi-agent reinforcement learning with progressive mutual information collaboration

    Li, P., Tang, H., Yang, T., Hao, X., Sang, T., Zheng, Y., Hao, J., Taylor, M. E., Tao, W., Wang, Z., et al. PMIC: Improving multi-agent reinforcement learning with progressive mutual information collaboration. arXiv preprint arXiv:2203.08553.

  6. [6]

    Byzantine robust cooperative multi-agent reinforcement learning as a Bayesian game

    Li, S., Guo, J., Xiu, J., Xu, R., Yu, X., Wang, J., Liu, A., Yang, Y., and Liu, X. Byzantine robust cooperative multi-agent reinforcement learning as a Bayesian game. arXiv preprint arXiv:2305.12872, 2023a.
    Li, S., Xu, R., Guo, J., Feng, P., Wang, J., Liu, A., Yang, Y., Liu, X., and Lv, W. Mir2: Towards provably robust multi-agent reinforcement learning...

  7. [7]

    Robust deep reinforcement learning with adaptive adversarial perturbations in action space

    Liu, Q., Kuang, Y., and Wang, J. Robust deep reinforcement learning with adaptive adversarial perturbations in action space. arXiv preprint arXiv:2405.11982.

  8. [8]

    Focusing Influence Mechanism for Multi-Agent Reinforcement Learning

    Park, Y., Lee, S., and Han, S. Center of gravity-guided focusing influence mechanism for multi-agent reinforcement learning. arXiv preprint arXiv:2506.19417.

  9. [9]

    Robust Deep Reinforcement Learning with Adversarial Attacks

    Pattanaik, A., Tang, Z., Liu, S., Bommannan, G., and Chowdhary, G. Robust deep reinforcement learning with adversarial attacks. arXiv preprint arXiv:1712.03632.

  10. [10]

    Reward poisoning in reinforcement learning: Attacks against unknown learners in unknown environments

    Rakhsha, A., Zhang, X., Zhu, X., and Singla, A. Reward poisoning in reinforcement learning: Attacks against unknown learners in unknown environments. arXiv preprint arXiv:2102.08492.

  11. [11]

    The StarCraft Multi-Agent Challenge

    Rashid, T., Farquhar, G., Peng, B., and Whiteson, S. Weighted QMIX: Expanding monotonic value function factorisation for deep multi-agent reinforcement learning. Advances in Neural Information Processing Systems, 33:10199–10210, 2020a.
    Rashid, T., Samvelyan, M., De Witt, C. S., Farquhar, G., Foerster, J., and Whiteson, S. Monotonic value function factori...

  12. [12]

    An adaptive heuristic for feature selection based on complementarity

    Singha, S. and Shenoy, P. P. An adaptive heuristic for feature selection based on complementarity. Machine Learning, 107(12):2027–2071.

  13. [13]

    Value-Decomposition Networks For Cooperative Multi-Agent Learning

    Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W. M., Zambaldi, V., Jaderberg, M., Lanctot, M., Sonnerat, N., Leibo, J. Z., Tuyls, K., et al. Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296.

  14. [14]

    Reward poisoning attack against offline reinforcement learning

    Xu, Y., Gumaste, R., and Singh, G. Reward poisoning attack against offline reinforcement learning. arXiv preprint arXiv:2402.09695.

  15. [15]

    Xue, W., Qiu, W., An, B., Rabinovich, Z., Obraztsova, S., and Yeo, C. K. Mis-spoke or mis-lead: Achieving robustness in multi-agent communicative reinforcement learning. arXiv preprint arXiv:2108.03803.

  16. [16]

    Robust deep reinforcement learning against adversarial perturbations on state observations

    Zhang, H., Chen, H., Xiao, C., Li, B., Liu, M., Boning, D., and Hsieh, C.-J. Robust deep reinforcement learning against adversarial perturbations on state observations. Advances in Neural Information Processing Systems, 33:21024–21037, 2020a.
    Zhang, H., Chen, H., Boning, D., and Hsieh, C.-J. Robust reinforcement learning on state observations with learne...

  17. [17]

    Implementation Details We provide additional implementation details for our attacks and training pipeline

    B. Implementation Details. We provide additional implementation details for our attacks and training pipeline. Section B.1 describes the MI estimation required for MI-based observation and action attacks. Section B.2 presents the MARL implementation used to...

  18. [18]

    Following CLUB, we upper-bound $I(o^i_{t+1}; a^j_t \mid a^i_t, \tau_t) \le I_{\mathrm{CLUB}}(o^i_{t+1}; a^j_t \mid a^i_t, \tau_t)$

    … upper bound and use its sample-based estimate as a surrogate for the observation-level MI. Following CLUB, we upper-bound $I(o^i_{t+1}; a^j_t \mid a^i_t, \tau_t) \le I_{\mathrm{CLUB}}(o^i_{t+1}; a^j_t \mid a^i_t, \tau_t)$. We estimate $I_{\mathrm{CLUB}}$ from samples. Let $D^+$ be a positive buffer containing aligned tuples $\{(o^i_{t+1}, a^i_t, a^j_t, \tau_t)\}$ drawn from the same transition. We construct negative pairs ...

  19. [19]

    Table 1. Configuration for each SMAC scenario

    Table 1. Configuration for each SMAC scenario.

    Map       Ally Units              Enemy Units             State Dimension  Obs. Dimension  Num. of Actions
    3m        3 Marines               3 Marines               48               30              9
    3s vs 3z  3 Stalkers              3 Zealots               54               36              9
    2s3z      2 Stalkers, 3 Zealots   2 Stalkers, 3 Zealots   120              80              11
    8m        8 Mar...

  20. [20]

    These values provided the most stable training behavior in our experiments

    Across all SMAC scenarios, we fix two parameters in the adaptive attack probability schedule: the growth rate α = 1.1, which controls how aggressively the attack probability is increased, and the success-rate threshold η = 0.8, which determines when the probability is updated. These values provided the most stable training behavior in our experiments. Task...
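The last excerpt above names two schedule parameters, a growth rate α = 1.1 and a success-rate threshold η = 0.8, but the extracted text stops before stating the update rule. The sketch below shows one plausible reading, offered purely as an assumption: when a measured success rate reaches η, the attack probability is multiplied by α and capped at a maximum. The function name `update_attack_probability`, the multiplicative form, the cap `p_max`, and the interpretation of "success rate" as the ego team's recent win rate are all hypothetical.

```python
def update_attack_probability(p_act, success_rate, alpha=1.1, eta=0.8, p_max=1.0):
    """Assumed rule: grow p_act by a factor alpha once success_rate >= eta."""
    if success_rate >= eta:
        return min(p_act * alpha, p_max)   # never exceed the cap
    return p_act                           # otherwise leave it unchanged


# Hypothetical sequence of measured success rates across evaluation rounds.
p = 0.5
history = []
for rate in [0.9, 0.7, 0.85]:
    p = update_attack_probability(p, rate)
    history.append(p)

print([round(value, 4) for value in history])
```

Under this reading the attack becomes more frequent only while the agents keep succeeding, which gives a curriculum-like effect; the paper's actual schedule may differ in form.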