arxiv: 2605.23027 · v1 · pith:SGTMXY5Jnew · submitted 2026-05-21 · 💻 cs.RO

PIMbot: A Self-Adaptive Attack Framework for Adversarial Manipulation of Multi-Robot Reinforcement Learning

Pith reviewed 2026-05-25 05:24 UTC · model grok-4.3

classification 💻 cs.RO
keywords multi-robot reinforcement learning · adversarial manipulation · social dilemmas · reward manipulation · policy manipulation · adaptive controller · embedded robotics

The pith

PIMbot lets one robot manipulate multi-robot RL social dilemmas by altering rewards and its own policy through an adaptive controller.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a framework called PIMbot can shift outcomes in multi-robot reinforcement learning environments involving social dilemmas. It achieves this by combining manipulation of the shared reward signal with changes to the manipulating robot's own actions, using an online controller to balance the two. A sympathetic reader would care because cooperative robot systems often depend on reward structures to encourage collective behavior, and this approach exposes how one agent might disrupt that balance. The work validates the method through experiments in a Gazebo simulation and on physical NVIDIA Jetson hardware. If correct, the result indicates that multi-robot RL systems built around unique reward functions carry inherent manipulation risks.

Core claim

PIMbot manipulates multi-robot RL social dilemmas through two levers: incentive manipulation of the reward channel and policy manipulation of the agent's own actions. An adaptive multi-objective controller balances the two online, which lets a single robot alter the environment's outcomes in both simulated and embedded-device settings.

What carries the argument

PIMbot's dual-lever adaptive controller that balances reward-channel incentive changes with self-policy adjustments in real time.
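
The controller's equations are not reproduced on this page, so the following is only a minimal sketch of what balancing two levers online could look like. It assumes a scalarized trade-off with a single mixing weight `w` in [0, 1] that is nudged toward whichever lever recently produced the larger gain. The class name `TwoLeverBalancer`, the step size `lr`, the clipping bounds, and the update rule are all hypothetical and are not taken from PIMbot.

```python
from dataclasses import dataclass


@dataclass
class TwoLeverBalancer:
    """Hypothetical online balancer between two manipulation levers.

    w is the share of effort given to the incentive lever;
    1 - w goes to the policy lever. This is NOT the paper's controller.
    """
    w: float = 0.5
    lr: float = 0.1  # assumed step size

    def update(self, incentive_gain: float, policy_gain: float) -> float:
        # Move w toward the lever with the larger observed gain,
        # then clip so that both levers keep a nonzero share.
        self.w += self.lr * (incentive_gain - policy_gain)
        self.w = min(0.95, max(0.05, self.w))
        return self.w

    def combine(self, incentive_obj: float, policy_obj: float) -> float:
        # Scalarized two-objective value, e.g. for ranking candidate actions.
        return self.w * incentive_obj + (1.0 - self.w) * policy_obj


balancer = TwoLeverBalancer()
# Made-up (incentive_gain, policy_gain) observations for illustration.
for inc_gain, pol_gain in [(1.0, 0.2), (0.8, 0.1), (0.1, 0.9)]:
    balancer.update(inc_gain, pol_gain)
```

A weighted sum is only one of several ways to trade off two objectives; it is used here because it keeps the online update to a single scalar.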

If this is right

  • A manipulating robot can shift social dilemma outcomes toward self-interest rather than collective benefit.
  • The method produces measurable effects in Gazebo-simulated multi-robot setups.
  • The approach runs on real embedded hardware such as the NVIDIA Jetson Orin Nano while quantifying system costs.
  • PIMbot functions as a stress-test tool that reveals vulnerabilities in multi-robot cooperative tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Designers of multi-robot RL systems may need mechanisms to detect or isolate reward-channel tampering.
  • The dual-lever idea could apply to other multi-agent settings where one participant has partial control over shared signals.
  • Robustness testing against adaptive adversaries becomes necessary once reward functions are treated as attack surfaces.

Load-bearing premise

The multi-robot environment relies on a unique reward function that can be directly changed through the reward channel without other agents detecting or adapting to the alteration.
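
To make this premise concrete, the sketch below models the reward channel as an additive incentive on top of the environment reward. It uses an assumed textbook Prisoner's Dilemma payoff table (3, 0, 5, 1), not the paper's reward function, and all function names are illustrative.

```python
# Assumed textbook Prisoner's Dilemma payoffs for the row player:
# PAYOFF[(my_action, other_action)], with "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def effective_reward(my_action, other_action, incentive_for_c=0.0):
    """Environment reward plus an incentive received over the reward channel.

    The incentive is paid only when the agent cooperates; this is an
    assumption made for the illustration.
    """
    bonus = incentive_for_c if my_action == "C" else 0.0
    return PAYOFF[(my_action, other_action)] + bonus


def best_response(other_action, incentive_for_c=0.0):
    """Action that maximizes the effective reward against a fixed opponent."""
    return max("CD", key=lambda a: effective_reward(a, other_action, incentive_for_c))


honest = best_response("C", incentive_for_c=0.0)  # no incentive on the channel
shaped = best_response("C", incentive_for_c=2.5)  # incentive exceeds the 5 - 3 gap
```

With these assumed numbers, an incentive larger than the temptation gap flips the best response from defection to cooperation. An agent that can rewrite that incentive unnoticed can therefore push the response in either direction, which is why the premise carries so much weight.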

What would settle it

If other agents detect the altered rewards and modify their policies to restore prior cooperation levels despite the manipulation, the central claim would not hold.
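
One way such a test could be set up is to have each non-attacker agent compare the incentives it receives against a trusted baseline and flag large deviations. The sketch below is an illustration under stated assumptions: a simple z-score rule on the mean incentive, a threshold of 3, and made-up sample values. None of it comes from the paper.

```python
from statistics import mean, stdev


def flags_tampering(baseline, observed, z_threshold=3.0):
    """Return True if the mean of `observed` incentives deviates from the
    `baseline` mean by more than `z_threshold` standard errors.

    `baseline` holds incentives recorded while the channel was trusted.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # Degenerate baseline: any change in the mean counts as a deviation.
        return mean(observed) != mu
    standard_error = sigma / len(observed) ** 0.5
    z = abs(mean(observed) - mu) / standard_error
    return z > z_threshold


# Made-up incentive samples for illustration only.
baseline = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.0, 1.1]
clean = [1.0, 0.9, 1.1, 1.0]
tampered = [-2.0, -1.5, -2.2, -1.8]
```

A check like this only covers detection. Whether agents that detect tampering can also restore cooperation would still need to be measured through their policies.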

Figures

Figures reproduced from arXiv: 2605.23027 by Cong Liu, Hyoseung Kim, Zexin Li, ZiLiang Zhang.

Figure 1: A red agent manipulates both the incentive channel (Eq. 3) and its policy. By injecting …
Figure 2: Impact of the “Bypass Policy” method in ER. Cooperators send negative incentives to …
Figure 3: Fake incentives destroy cooperation, producing early but unstable successes during explo…
Figure 4: Partial communication shortens convergence in ER. Bottom panels show the adversary’s …
Figure 5: Reverse Policy: minimizing the adversary’s own reward depresses incentives and shrinks …
Figure 6: Agent success rates across scenarios (ER and IPD) for multi-objective manipulation.
Figure 7: Per-agent rewards across scenarios (ER and IPD) for multi-objective manipulation.
Figure 8: Average steps per episode across ER configurations for multi-objective manipulation.
Figure 9: Agent success rates across Stag Hunt configurations ( …
Figure 10: Agent success rates across Escape Room configurations under the SOTA Reciprocators …
Figure 11: Agent success rates across IPD configurations under the SOTA Reciprocators Baseline …
Figure 12: Agent success rates across Stag Hunt configurations under the SOTA Reciprocators …
Figure 13: Robotic simulation of PIMbot in the Gazebo simulator.
Figure 14: System profiling traces for IPD and ER benchmarks under the LIO framework (CPU …
Figure 15: System profiling traces for IPD and ER benchmarks with truncated long zero-GPU uti…
Original abstract

Recent research has demonstrated the potential of reinforcement learning in effective multi-robot collaboration, particularly in social dilemmas where robots face a trade-off between self-interest and collective benefits. However, environmental factors such as miscommunication and adversarial robots can impact cooperation, making it crucial to explore how multi-robot communication can be manipulated to achieve different outcomes. This paper presents PIMbot, a framework that manipulates outcomes via two complementary levers: (i) incentive manipulation of the reward channel and (ii) policy manipulation of an agent's own actions. An adaptive multi-objective controller balances these levers in an online manner. Our work introduces a novel approach to manipulation in recent multi-agent RL social dilemmas that utilize a unique reward function for incentivization. By utilizing our proposed PIMbot mechanisms, a robot is able to manipulate the social dilemma environment effectively. Comprehensive experimental results demonstrate the effectiveness of our proposed methods in the Gazebo-simulated multi-robot environment. Moreover, a real embedded device case study on NVIDIA Jetson Orin Nano quantifies system cost and validates PIMbot's effectiveness on realistic autonomous embedded systems scenarios beyond simulation. Together, these results position PIMbot as a rigorous stress-test tool exposing critical vulnerabilities in multi-robot cooperative tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents PIMbot, a framework for adversarial manipulation of multi-robot RL in social dilemmas. It manipulates outcomes via two levers—incentive manipulation of the reward channel and policy manipulation of an agent's actions—balanced by an adaptive multi-objective controller. The work claims effectiveness in a Gazebo-simulated multi-robot environment and validates it on an NVIDIA Jetson Orin Nano embedded device, positioning PIMbot as a stress-test tool for vulnerabilities in cooperative multi-robot tasks.

Significance. If the results hold with proper validation of the core assumptions, the work would be significant for exposing practical attack surfaces in multi-agent RL systems used for robot collaboration. The combination of simulation and real embedded hardware experiments, along with the self-adaptive controller, could provide a useful benchmark for robustness testing in social dilemma scenarios.

major comments (2)
  1. [Abstract] Abstract and § on reward manipulation: The central effectiveness claim requires that reward-channel manipulation remains undetected and unadapted to by other agents, yet no mechanism, observation model, or experimental test (e.g., monitoring other agents' reward signals or policy shifts) is provided to establish stealth or robustness; this assumption is load-bearing for the 'effective manipulation' result.
  2. [Experimental results] Experimental results section: Only high-level claims of effectiveness are stated without equations for the adaptive controller, quantitative metrics (success rates, reward deltas), error bars, or ablation on the two levers, making it impossible to verify whether the reported outcomes support the manipulation claims under the stated assumptions.
minor comments (2)
  1. [Method] Notation for the multi-objective controller and the two levers should be defined explicitly with equations rather than prose descriptions.
  2. [Experimental setup] The Gazebo environment description should include the exact reward functions and state observations available to non-attacker agents to allow reproducibility.
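
The quantitative reporting requested above could take a form like the following sketch. It summarizes per-seed success rates as a mean with a normal-approximation 95% interval and lays out an ablation grid over the two levers. Every number is a made-up placeholder, not a result from the paper, and the helper name `summarize` is hypothetical.

```python
from statistics import mean, stdev


def summarize(values, z=1.96):
    """Mean and half-width of a normal-approximation 95% interval."""
    m = mean(values)
    half_width = z * stdev(values) / len(values) ** 0.5
    return m, half_width


# Placeholder success rates per random seed (NOT data from the paper).
ablation = {
    "no manipulation": [0.82, 0.80, 0.85, 0.79, 0.84],
    "incentive lever only": [0.55, 0.60, 0.52, 0.58, 0.57],
    "policy lever only": [0.70, 0.66, 0.72, 0.69, 0.68],
    "both levers": [0.31, 0.35, 0.29, 0.33, 0.30],
}

report = {name: summarize(runs) for name, runs in ablation.items()}
for name, (m, hw) in report.items():
    print(f"{name:22s} {m:.3f} +/- {hw:.3f}")
```

With only a handful of seeds, a t-based interval or a bootstrap would be more defensible than the normal approximation used here.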

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight areas where the manuscript can be strengthened. We address each major comment below and commit to revisions that provide the requested details and clarifications without altering the core contributions.

Point-by-point responses
  1. Referee: [Abstract] Abstract and § on reward manipulation: The central effectiveness claim requires that reward-channel manipulation remains undetected and unadapted to by other agents, yet no mechanism, observation model, or experimental test (e.g., monitoring other agents' reward signals or policy shifts) is provided to establish stealth or robustness; this assumption is load-bearing for the 'effective manipulation' result.

    Authors: We agree that the manuscript does not explicitly model or test the stealth of reward-channel manipulation against detection or adaptation by other agents. The current presentation focuses on the manipulation framework and its outcomes under the stated environmental assumptions. In revision, we will expand the abstract and add a dedicated subsection on assumptions, including discussion of potential observation models for other agents and preliminary analysis of policy shifts. We will also incorporate new experiments that monitor reward signals and quantify robustness where feasible. revision: yes

  2. Referee: [Experimental results] Experimental results section: Only high-level claims of effectiveness are stated without equations for the adaptive controller, quantitative metrics (success rates, reward deltas), error bars, or ablation on the two levers, making it impossible to verify whether the reported outcomes support the manipulation claims under the stated assumptions.

    Authors: We acknowledge that the experimental results section presents high-level claims without the requested quantitative details. The manuscript reports effectiveness in Gazebo and on Jetson hardware but does not include the adaptive controller equations, specific metrics with error bars, or ablations. In the revised version, we will add the controller equations, report success rates, reward deltas, and other metrics with error bars from repeated trials, and include ablation studies isolating the reward and policy levers to allow verification of the results. revision: yes

Circularity Check

0 steps flagged

No derivation chain or equations; claims rest on experiments, not self-referential math

full rationale

The provided abstract and description contain no equations, fitted parameters, predictions derived from inputs, or load-bearing self-citations. The framework is described at a high level (two manipulation levers plus adaptive controller) and evaluated via Gazebo simulation plus Jetson hardware runs. No step reduces by construction to its own inputs or prior author work; the central claim of effective manipulation is presented as an empirical outcome rather than a mathematical derivation. This is the normal case of a paper whose contribution is algorithmic and experimental rather than deductive.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so free parameters, axioms, and invented entities cannot be identified. The central claim rests on an unstated assumption that the reward channel is directly manipulable and that the adaptive controller can balance the two levers without additional constraints.

pith-pipeline@v0.9.0 · 5752 in / 1059 out tokens · 15188 ms · 2026-05-25T05:24:23.689898+00:00 · methodology


Reference graph

Works this paper leans on

84 extracted references · 84 canonical work pages · 2 internal anchors

  1. [1]

    Meeting-merging-mission: A multi- robot coordinate framework for large-scale communication-limited exploration

    Aakriti Agrawal, Senthil Hariharan Arul, Amrit Singh Bedi, and Dinesh Manocha. DC- MRTA: decentralized multi-robot task allocation and navigation in complex environments. InIEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2022, Kyoto, Japan, October 23-27, 2022, pages 11711–11718. IEEE, 2022. doi: 10.1109/IROS47612.2022. 9981353. ...

  2. [2]

    Design a 3-dof delta parallel robot by one degree redundancy along the conveyor axis, a novel automation approach

    Sajjad Ahangar, Mehdi Valizadeh Mehrabani, Alireza Pouransari Shorijeh, and Mehdi Tale Masouleh. Design a 3-dof delta parallel robot by one degree redundancy along the conveyor axis, a novel automation approach. In2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI), pages 413–418. IEEE, 2019

  3. [3]

    Constrained black-box attacks against multi-agent reinforcement learning.arXiv preprint arXiv:2508.09275, 2025

    Amine Andam, Jamal Bentahar, and Mustapha Hedabou. Constrained black-box attacks against multi-agent reinforcement learning.arXiv preprint arXiv:2508.09275, 2025

  4. [4]

    ASHRAE, 4th edition, 2015

    ASHRAE TC 9.9.Thermal Guidelines for Data Processing Environments. ASHRAE, 4th edition, 2015

  5. [5]

    Multi-robot task planning under individual and collaborative temporal logic specifications

    Ruofei Bai, Ronghao Zheng, Meiqin Liu, and Senlin Zhang. Multi-robot task planning under individual and collaborative temporal logic specifications. InIEEE/RSJ International Con- ference on Intelligent Robots and Systems, IROS 2021, Prague, Czech Republic, September 27 - Oct. 1, 2021, pages 6382–6389. IEEE, 2021. doi: 10.1109/IROS51168.2021.9636287. URL h...

  6. [6]

    The case for energy-proportional computing.Computer, 40(12):33–37, 2007

    Luiz André Barroso and Urs Hölzle. The case for energy-proportional computing.Computer, 40(12):33–37, 2007

  7. [7]

    A comprehensive survey of multia- gent reinforcement learning.IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2):156–172, 2008

    Lucian Buşoniu, Robert Babuška, and Bart De Schutter. A comprehensive survey of multia- gent reinforcement learning.IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2):156–172, 2008. doi: 10.1109/TSMCC.2007.913919. 25

  8. [8]

    Multi-agent reinforcement learning: An overview

    Lucian Buşoniu, Robert Babuška, and Bart De Schutter. Multi-agent reinforcement learning: An overview. In Dipti Srinivasan and Lakhmi C. Jain, editors,Innovations in Multi-Agent Systems and Applications–1, volume 310 ofStudies in Computational Intelligence, pages 183–

  9. [9]

    doi: 10.1007/978-3-642-14435-6_7

    Springer, Berlin, Heidelberg, 2010. doi: 10.1007/978-3-642-14435-6_7

  10. [10]

    Multi-agent reinforcement learning: A review of challenges and applications.Applied Sciences, 11(11):4948, 2021

    Lorenzo Canese, Gian Carlo Cardarilli, Luigi Di Nunzio, Roberto Fazzolari, Daniele Giardino, Marco Re, and Stefania Spanò. Multi-agent reinforcement learning: A review of challenges and applications.Applied Sciences, 11(11):4948, 2021. doi: 10.3390/app11114948

  11. [11]

    Integrated solar power harvesting and hibernation for a recurrent-mission vtol micro aerial vehicle

    Samuel J Carlson, Tugrul Karakurt, Pallavi Arora, and Christos Papachristos. Integrated solar power harvesting and hibernation for a recurrent-mission vtol micro aerial vehicle. InIEEE International Conference on Unmanned Aircraft Systems (ICUAS), pages 237–244, 2022

  12. [12]

    Dycodeeval: Dynamic benchmarking of reasoning capabilities in code large language models under data contamination

    Simin Chen, Pranav Pusarla, and Baishakhi Ray. Dycodeeval: Dynamic benchmarking of reasoning capabilities in code large language models under data contamination. InForty-second International Conference on Machine Learning

  13. [13]

    Nmtsloth: understandingand testing efficiency degradation of neural machine translation systems

    SiminChen, CongLiu, MirazulHaque, ZiheSong, andWeiYang. Nmtsloth: understandingand testing efficiency degradation of neural machine translation systems. InProceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1148–1160, 2022

  14. [14]

    Nicgslowdown: Evaluat- ing the efficiency robustness of neural image caption generation models

    Simin Chen, Zihe Song, Mirazul Haque, Cong Liu, and Wei Yang. Nicgslowdown: Evaluat- ing the efficiency robustness of neural image caption generation models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15365–15374, 2022

  15. [15]

    The dark side of dy- namic routing neural networks: Towards efficiency backdoor injection

    Simin Chen, Hanlin Chen, Mirazul Haque, Cong Liu, and Wei Yang. The dark side of dy- namic routing neural networks: Towards efficiency backdoor injection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24585–24594, 2023

  16. [16]

    Your compiler is back- dooring your model: Understanding and exploiting compilation inconsistency vulnerabilities in deep learning compilers.arXiv preprint arXiv:2509.11173, 2025

    Simin Chen, Jinjun Peng, Yixin He, Junfeng Yang, and Baishakhi Ray. Your compiler is back- dooring your model: Understanding and exploiting compilation inconsistency vulnerabilities in deep learning compilers.arXiv preprint arXiv:2509.11173, 2025

  17. [17]

    Dy- namic transformers provide a false sense of efficiency

    Yiming Chen, Simin Chen, Zexin Li, Wei Yang, Cong Liu, Robby Tan, and Haizhou Li. Dy- namic transformers provide a false sense of efficiency. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors,Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7164–7180, Toronto, Canada, July

  18. [18]

    doi: 10.18653/v1/2023.acl-long.395

    Association for Computational Linguistics. doi: 10.18653/v1/2023.acl-long.395. URL https://aclanthology.org/2023.acl-long.395/

  19. [19]

    Unveiling the achilles’ heel of NLG evaluators: A unified adversarial framework driven by large language models

    Yiming Chen, Chen Zhang, Danqing Luo, Luis Fernando D’Haro, Robby Tan, and Haizhou Li. Unveiling the achilles’ heel of NLG evaluators: A unified adversarial framework driven by large language models. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors,Findings of the Association for Computational Linguistics: ACL 2024, pages 1359–1375, Bangkok, Thai...

  20. [20]

    URLhttps://aclanthology.org/2024.findings-acl.80/. 26

  21. [21]

    Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch

    Jakob Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. Learning with opponent-learning awareness. InProceedings of the 17th Interna- tional Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’18, page 122–130, Richland, SC, 2018. International Foundation for Autonomous Agents and Multiagent Systems

  22. [22]

    Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch

    Jakob Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. Learning with opponent-learning awareness. InProceedings of the 17th Interna- tional Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 122–130, 2018

  23. [23]

    Foerster, Yannis M

    Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, and Shimon Whiteson. Learning to communicate with deep multi-agent reinforcement learning. InAdvances in Neural Information Processing Systems (NIPS), volume 29, 2016

  24. [24]

    Safety alignment in nlp tasks: Weakly aligned summarization as an in-context attack

    Yu Fu, Yufei Li, Wen Xiao, Cong Liu, and Yue Dong. Safety alignment in nlp tasks: Weakly aligned summarization as an in-context attack. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8483–8502, 2024

  25. [25]

    Transferable adversarial attacks against asr.IEEE Signal Processing Letters, 31:2200–2204, 2024

    Xiaoxue Gao, Zexin Li, Yiming Chen, Cong Liu, and Haizhou Li. Transferable adversarial attacks against asr.IEEE Signal Processing Letters, 31:2200–2204, 2024. doi: 10.1109/LSP. 2024.3443711

  26. [26]

    Xiaoxue Gao, Yiming Chen, Xianghu Yue, Yu Tsao, and Nancy F. Chen. Ttslow: Slow down text-to-speech with efficiency robustness evaluations.IEEE Transactions on Audio, Speech and Language Processing, 33:693–704, 2025. doi: 10.1109/TASLPRO.2025.3533357

  27. [27]

    Meeting-merging-mission: A multi- robot coordinate framework for large-scale communication-limited exploration

    Yuman Gao, Yingjian Wang, Xingguang Zhong, Tiankai Yang, Mingyang Wang, Zhixiong Xu, Yongchao Wang, Yi Lin, Chao Xu, and Fei Gao. Meeting-merging-mission: A multi- robot coordinate framework for large-scale communication-limited exploration. InIEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2022, Kyoto, Japan, October 23-27, 2022...

  28. [28]

    Enabling low-cost secure com- puting on untrusted in-memory architectures.arXiv preprint arXiv:2501.17292, 2025

    Sahar Ghoflsaz Ghinani, Jingyao Zhang, and Elaheh Sadredini. Enabling low-cost secure com- puting on untrusted in-memory architectures.arXiv preprint arXiv:2501.17292, 2025

  29. [29]

    Adversarial policies: Attacking deep reinforcement learning.arXiv preprint arXiv:1905.10615, 2019

    Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, and Stuart Russell. Adversarial policies: Attacking deep reinforcement learning.arXiv preprint arXiv:1905.10615, 2019

  30. [30]

    Backdoor detection and mitigation in competitive rein- forcement learning, 2023

    Junfeng Guo, Ang Li, and Cong Liu. Backdoor detection and mitigation in competitive rein- forcement learning, 2023

  31. [31]

    What is the solution for state adversarial multi-agent reinforcement learning?arXiv preprint arXiv:2212.02705, 2022

    Songyang Han, Sanbao Su, Sihong He, Shuo Han, Haizhao Yang, and Fei Miao. What is the solution for state adversarial multi-agent reinforcement learning?arXiv preprint arXiv:2212.02705, 2022

  32. [32]

    A robust and constrained multi-agent reinforcement learning framework for electric vehicle amod systems.arXiv preprint arXiv:2209.08230, 2022

    Sihong He, Yue Wang, Shuo Han, Shaofeng Zou, and Fei Miao. A robust and constrained multi-agent reinforcement learning framework for electric vehicle amod systems.arXiv preprint arXiv:2209.08230, 2022. 27

  33. [33]

    Traversing supervisor problem: An approximately optimal approach to multi-robot assistance

    Tianchen Ji, Roy Dong, and Katherine Driggs-Campbell. Traversing supervisor problem: An approximately optimal approach to multi-robot assistance. InProceedings of Robotics: Science and Systems (RSS), 2022

  34. [34]

    Pccl: Energy-efficient llm training with power- aware collective communication

    Ziyang Jia, Laxmi N Bhuyan, and Daniel Wong. Pccl: Energy-efficient llm training with power- aware collective communication. In2024 IEEE 42nd International Conference on Computer Design (ICCD), pages 84–91. IEEE, 2024

  35. [35]

    Fine-grained warm water cooling for improving datacenter economy

    Weixiang Jiang, Ziyang Jia, Sirui Feng, Fangming Liu, and Hai Jin. Fine-grained warm water cooling for improving datacenter economy. InProceedings of the 46th International Symposium on Computer Architecture, pages 474–486, 2019

  36. [36]

    Autonomous teamed exploration of sub- terranean environments using legged and aerial robots

    Maitreyi Kulkarni, Mihir Dharmadhikari, Marco Tranzatto, Samuel Zimmermann, Vincent Reijgwart, Pietro De Petris, Huy Nguyen, Ninad Khedekar, Christos Papachristos, Lionel Ott, Roland Siegwart, Marco Hutter, and Kostas Alexis. Autonomous teamed exploration of sub- terranean environments using legged and aerial robots. InIEEE International Conference on Rob...

  37. [37]

    Attacking cooperative multi-agent reinforcement learning by adversarial minority influence.Neural Networks, 191:107747, 2025

    Simin Li, Jun Guo, Jingqiao Xiu, Yuwei Zheng, Pu Feng, Xin Yu, Jiakai Wang, Aishan Liu, Yaodong Yang, Bo An, et al. Attacking cooperative multi-agent reinforcement learning by adversarial minority influence.Neural Networks, 191:107747, 2025

  38. [38]

    White-box multi-objective adversarial attack on dialogue generation

    Yufei Li, Zexin Li, Yingfan Gao, and Cong Liu. White-box multi-objective adversarial attack on dialogue generation. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1778–1792, 2023

  39. [39]

    Rt-lm: Uncertainty-aware resource management for real-time inference of language models

    Yufei Li, Zexin Li, Wei Yang, and Cong Liu. Rt-lm: Uncertainty-aware resource management for real-time inference of language models. In2023 IEEE Real-Time Systems Symposium (RTSS), pages 158–171. IEEE, 2023

  40. [40]

    Mace: A hybrid llm serving system with colocated slo-aware continuous retraining alignment.arXiv preprint arXiv:2510.03283, 2025

    Yufei Li, Yu Fu, Yue Dong, and Cong Liu. Mace: A hybrid llm serving system with colocated slo-aware continuous retraining alignment.arXiv preprint arXiv:2510.03283, 2025

  41. [41]

    Lemix: Unified scheduling for llm training and inference on multi-gpu systems.arXiv preprint arXiv:2507.21276, 2025

    Yufei Li, Zexin Li, Yinglun Zhu, and Cong Liu. Lemix: Unified scheduling for llm training and inference on multi-gpu systems.arXiv preprint arXiv:2507.21276, 2025

  42. [42]

    Dr genre: Reinforcement learning from decoupled llm feedback for generic text rewriting.arXiv preprint arXiv:2503.06781, 2025

    Yufei Li, John Nham, Ganesh Jawahar, Lei Shu, David Uthus, Yun-Hsuan Sung, Chengrun Yang, Itai Rolnick, Yi Qiao, and Cong Liu. Dr genre: Reinforcement learning from decoupled llm feedback for generic text rewriting.arXiv preprint arXiv:2503.06781, 2025

  43. [43]

    A pareto-efficient algorithm for multiple objective optimization in e-commerce recommendation

    Xiao Lin, Hongjie Chen, Changhua Pei, Fei Sun, Xuanji Xiao, Hanxiao Sun, Yongfeng Zhang, Wenwu Ou, and Peng Jiang. A pareto-efficient algorithm for multiple objective optimization in e-commerce recommendation. In Toine Bogers, Alan Said, Peter Brusilovsky, and Domonkos Tikk, editors,Proceedings of the 13th ACM Conference on Recommender Systems, RecSys 201...

  44. [44]

    Efficient adversarial attacks on online multi-agent reinforcement learning

    Guanlin Liu and Lifeng LAI. Efficient adversarial attacks on online multi-agent reinforcement learning. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 24401–24433. Curran Associates, Inc., 2023. URLhttps://proceedings.neurips.cc/paper_files/paper/2023/ fi...

  45. [45]

    Maven: Multi-agent variational exploration

    Anuj Mahajan, Mikayel Samvelyan, Christian Schroeder de Witt, Bohdan Sun, Tabish Rashid, Shimon Whiteson, and Jakob Foerster. Maven: Multi-agent variational exploration. InAd- vances in Neural Information Processing Systems (NeurIPS), volume 32, 2019

  46. [46]

    A sensorized toy car for autism screening using multi-modal features.Sustainability, 15(10),

    Bijan Mehralizadeh, Bahar Baradaran, Shahab Nikkhoo, Pegah Soleiman, and Hadi Moradi. A sensorized toy car for autism screening using multi-modal features.Sustainability, 15(10),

  47. [47]

    doi: 10.3390/su15107790

    ISSN 2071-1050. doi: 10.3390/su15107790. URLhttps://www.mdpi.com/2071-1050/ 15/10/7790

  48. [48]

    Ng, Daishi Harada, and Stuart J

    Andrew Y. Ng, Daishi Harada, and Stuart J. Russell. Policy invariance under reward transfor- mations: Theory and application to reward shaping. InProceedings of the 16th International Conference on Machine Learning (ICML), pages 278–287, 1999

  49. [49]

    Pimbot: Policy and incentive manipulation for multi-robot reinforcement learning in social dilemmas

    Shahab Nikkhoo, Zexin Li, Aritra Samanta, Yufei Li, and Cong Liu. Pimbot: Policy and incentive manipulation for multi-robot reinforcement learning in social dilemmas. In2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5630–

  50. [50]

    Nvidia jetson orin nano developer kit: Technical overview.https: //developer.nvidia.com/embedded/jetson-orin-nano, 2023

    NVIDIA Corporation. Nvidia jetson orin nano developer kit: Technical overview.https: //developer.nvidia.com/embedded/jetson-orin-nano, 2023. Accessed: 2025-09-29

  51. [51]

    Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning

    Georgios Papoudakis, Filippos Christianos, Arrasy Rahman, and Stefano V. Albrecht. Dealing with non-stationarity in multi-agent deep reinforcement learning.arXiv preprint arXiv:1906.04737, 2019. URLhttps://arxiv.org/abs/1906.04737

  52. [52]

    Cooledge: hotspot-relievable warm water cooling for energy-efficient edge datacenters

    Qiangyu Pei, Shutong Chen, Qixia Zhang, Xinhui Zhu, Fangming Liu, Ziyang Jia, Yishuo Wang, and Yongjie Yuan. Cooledge: hotspot-relievable warm water cooling for energy-efficient edge datacenters. InProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pages 814–829, 2022

  53. [53]

    Qmix: Monotonic value function factorisation for deep multi- agent reinforcement learning

    Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, and Shimon Whiteson. Qmix: Monotonic value function factorisation for deep multi- agent reinforcement learning. InProceedings of the 35th International Conference on Machine Learning (ICML), volume 80 ofProceedings of Machine Learning Research, pages 4295–4304, 2018

  54. [54]

    Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, and Shimon Whiteson. The starcraft multi-agent challenge. arXiv preprint arXiv:1902.04043, 2019. URL https://arxiv.org/abs/1902.04043

  55. [55]

    Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations

    Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, Cambridge, UK, 2009. ISBN 978-0521899437

  56. [56]

    Dynamic collaborative multi-agent reinforcement learning communication for autonomous drone reforestation

    Philipp Dominic Siedler. Dynamic collaborative multi-agent reinforcement learning communication for autonomous drone reforestation. arXiv preprint arXiv:2211.15414, 2022

  57. [57]

    Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks

    Amanpreet Singh, Tushar Jain, and Sainbayar Sukhbaatar. Learning when to communicate at scale in multiagent cooperative and competitive tasks. In International Conference on Learning Representations (ICLR), 2019. URL https://arxiv.org/abs/1812.09755

  58. [58]

    Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning

    Kyunghwan Son, Daewoo Kim, Wan Ju Kang, Debbie G. Hostallero, and Yung Yi. Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, pages 5887–5896, 2019

  59. [59]

    Learning to cooperate in a social dilemma: A satisficing approach to bargaining

    Jeff L Stimpson and Michael A Goodrich. Learning to cooperate in a social dilemma: A satisficing approach to bargaining. In ICML, pages 728–735. Citeseer, 2003

  60. [60]

    Learning multiagent communication with backpropagation

    Sainbayar Sukhbaatar, Rob Fergus, Arthur Szlam, and Jason Weston. Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems (NIPS), volume 29, 2016

  61. [61]

    Value-decomposition networks for cooperative multi-agent learning

    Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech M. Czarnecki, Vinícius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, and Thore Graepel. Value-decomposition networks for cooperative multi-agent learning. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 2...

  62. [62]

    An efficient scheduling algorithm for multi-robot task allocation in assembling aircraft structures

    Veniamin Tereshchuk, John Stewart, Nikolay Bykov, Samuel Pedigo, Santosh Devasia, and Ashis G. Banerjee. An efficient scheduling algorithm for multi-robot task allocation in assembling aircraft structures. IEEE Robotics Autom. Lett., 4(4):3844–3851, 2019. doi: 10.1109/LRA.2019.2929983. URL https://doi.org/10.1109/LRA.2019.2929983

  63. [63]

    Justin K. Terry, Benjamin Black, Mario Jayakumar, Akshara Hari, Chace Sullivan, Ritchie Lee Santos, Clayton Dieffenderfer, Colin Horsch, Keon Perez, Akilesh Ravi, Alexander Williams, Yashas Lokesh, Morgan Dickens, Lilian Weng, Andreas Kallinteris, Shumeet Baluja, Wojciech M. Czarnecki, and Marc Lanctot. Pettingzoo: Gym for multi-agent reinforcement lear...

  64. [64]

    Pue: A comprehensive examination of the metric

    The Green Grid. Pue: A comprehensive examination of the metric. Technical report, The Green Grid, 2012. URL https://www.thegreengrid.org. White paper TGG-2012

  65. [65]

    Adversarial attacks on multi-agent communication

    James Tu, Tsunhsuan Wang, Jingkang Wang, Sivabalan Manivasagam, Mengye Ren, and Raquel Urtasun. Adversarial attacks on multi-agent communication. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7768–7777, 2021

  66. [66]

    A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence

    Nikos Vlassis. A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool, San Rafael, CA, 2007. doi: 10.2200/S00090ED1V01Y200705AIM002

  67. [67]

    Qplex: Duplex dueling multi-agent q-learning

    Jiechuan Wang, Zhanghen Ren, Wenbo Liu, Yong Yu, and Weinan Zhang. Qplex: Duplex dueling multi-agent q-learning. In International Conference on Learning Representations (ICLR). URL https://openreview.net/forum?id=Rcmk0xxIQV

  69. [69]

    Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence

    Gerhard Weiss, editor. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. MIT Press, Cambridge, MA, 1999

  70. [70]

    Learning to incentivize other learning agents

    Jiachen Yang, Ang Li, Mehrdad Farajtabar, Peter Sunehag, Edward Hughes, and Hongyuan Zha. Learning to incentivize other learning agents. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20, Red Hook, NY, USA, 2020. Curran Associates Inc. ISBN 9781713829546

  71. [71]

    Adaptive incentive design with multi-agent meta-gradient reinforcement learning

    Jiachen Yang, Ethan Wang, Rakshit Trivedi, Tuo Zhao, and Hongyuan Zha. Adaptive incentive design with multi-agent meta-gradient reinforcement learning. arXiv preprint arXiv:2112.10859, 2021

  72. [72]

    Mean field multi-agent reinforcement learning

    Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, and Jun Wang. Mean field multi-agent reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning (ICML), volume 80 of Proceedings of Machine Learning Research, pages 5571–5580, 2018

  73. [73]

    Distributed multi-agent deep reinforcement learning for cooperative multi-robot pursuit

    Chao Yu, Yinzhao Dong, Yangning Li, and Yatong Chen. Distributed multi-agent deep reinforcement learning for cooperative multi-robot pursuit. The Journal of Engineering, 2020(13):499–504, 2020

  74. [74]

    Robust communicative multi-agent reinforcement learning with active defense

    Lebin Yu, Yunbo Qiu, Quanming Yao, Yuan Shen, Xudong Zhang, and Jian Wang. Robust communicative multi-agent reinforcement learning with active defense. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 17575–17582, 2024

  75. [75]

    A near-cache architectural framework for cryptographic computing

    Jingyao Zhang and Elaheh Sadredini. A near-cache architectural framework for cryptographic computing. arXiv preprint arXiv:2509.23179, 2025

  76. [76]

    SAIL: SRAM-accelerated LLM inference system with lookup-table-based GEMV

    Jingyao Zhang, Jaewoo Park, Jongeun Lee, and Elaheh Sadredini. SAIL: SRAM-accelerated LLM inference system with lookup-table-based GEMV. arXiv preprint arXiv:2509.25853, 2025

  77. [77]

    Multi-agent reinforcement learning: A selective overview of theories and algorithms

    Kaiqing Zhang, Zhuoran Yang, and Tamer Başar. Multi-agent reinforcement learning: A selective overview of theories and algorithms. Proceedings of the IEEE, 109(12):2278–2314. doi: 10.1109/JPROC.2021.3076600

  79. [79]

    Decentralized control of multi-robot system in cooperative object transportation using deep reinforcement learning

    Lin Zhang, Yufeng Sun, Andrew Barth, and Ou Ma. Decentralized control of multi-robot system in cooperative object transportation using deep reinforcement learning. IEEE Access, 8:184109–184119, 2020

  80. [80]

    OCTOANTS: A heterogeneous lightweight intelligent multi-robot collaboration system with resource-constrained iot devices

    Qian Zhang, Ruiyang Quan, Siqin Qimuge, Peimin Xia, Jiaheng Wang, Xin Zan, Fangshi Wang, Changchuan Chen, Qi Wei, Huichan Zhao, Xinjun Liu, and Fei Qiao. OCTOANTS: A heterogeneous lightweight intelligent multi-robot collaboration system with resource-constrained iot devices. In IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 202...

Showing first 80 references.