PIMbot: A Self-Adaptive Attack Framework for Adversarial Manipulation of Multi-Robot Reinforcement Learning
Pith reviewed 2026-05-25 05:24 UTC · model grok-4.3
The pith
PIMbot lets one robot manipulate multi-robot RL social dilemmas by altering rewards and its own policy through an adaptive controller.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PIMbot manipulates multi-robot RL social dilemmas through two levers: incentive manipulation of the reward channel and policy manipulation of the agent's own actions. An adaptive multi-objective controller balances the two online, letting a single robot alter the environment's outcomes; the paper reports this in both simulated and embedded-device settings.
What carries the argument
PIMbot's dual-lever adaptive controller that balances reward-channel incentive changes with self-policy adjustments in real time.
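The abstract gives no equations for the controller, so the following is only a minimal sketch of how an online balance between two levers might work. It is not the authors' method. It assumes a multiplicative-weights update in which each lever receives a scalar gain signal per step and the mixing weights shift toward the lever that paid off more. The names `LeverBalancer` and `eta`, and the gain values, are hypothetical.

```python
import math


class LeverBalancer:
    """Toy online balancer over two levers (hypothetical, not from the paper)."""

    def __init__(self, eta: float = 0.5) -> None:
        self.eta = eta  # assumed step size of the multiplicative-weights update
        self.weights = {"incentive": 0.5, "policy": 0.5}

    def update(self, gains: dict) -> dict:
        """Shift weight toward the lever with the larger observed gain."""
        scaled = {k: w * math.exp(self.eta * gains[k]) for k, w in self.weights.items()}
        total = sum(scaled.values())
        self.weights = {k: v / total for k, v in scaled.items()}
        return self.weights


balancer = LeverBalancer(eta=0.5)
for _ in range(10):
    # Assumed gain signals: the incentive lever pays off more in this toy run.
    weights = balancer.update({"incentive": 1.0, "policy": 0.2})
```

In this toy run the weight on the incentive lever grows toward 1 because its assumed gain is consistently larger. A real controller would need gain signals derived from the attacker's actual objectives, which this page does not specify.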
If this is right
- A manipulating robot can shift social dilemma outcomes toward self-interest rather than collective benefit.
- The method produces measurable effects in Gazebo-simulated multi-robot setups.
- The approach runs on real embedded hardware such as the NVIDIA Jetson Orin Nano while quantifying system costs.
- PIMbot functions as a stress-test tool that reveals vulnerabilities in multi-robot cooperative tasks.
Where Pith is reading between the lines
- Designers of multi-robot RL systems may need mechanisms to detect or isolate reward-channel tampering.
- The dual-lever idea could apply to other multi-agent settings where one participant has partial control over shared signals.
- Robustness testing against adaptive adversaries becomes necessary once reward functions are treated as attack surfaces.
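As an illustration of the first point, a designer could monitor each agent's received incentive stream for sudden drift. The sketch below is a toy under stated assumptions, not a mechanism from the paper. It flags a reading that deviates from a trailing-window mean by more than `k` standard deviations. `RewardDriftMonitor`, `window`, and `k` are hypothetical names and parameters.

```python
from collections import deque
from statistics import mean, pstdev


class RewardDriftMonitor:
    """Flags incentive readings that deviate sharply from recent history."""

    def __init__(self, window: int = 20, k: float = 3.0) -> None:
        self.history = deque(maxlen=window)
        self.k = k

    def observe(self, reward: float) -> bool:
        """Return True if `reward` looks anomalous relative to the trailing window."""
        flagged = False
        if len(self.history) == self.history.maxlen:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            # Floor sigma so a constant history does not flag tiny noise.
            flagged = abs(reward - mu) > self.k * max(sigma, 1e-6)
        if not flagged:
            self.history.append(reward)  # keep the baseline free of flagged values
        return flagged


monitor = RewardDriftMonitor(window=5, k=3.0)
flags = [monitor.observe(r) for r in [1.0, 1.1, 0.9, 1.0, 1.05]]
tampered_flag = monitor.observe(5.0)  # abrupt jump in the incentive stream
```

A threshold test like this would likely miss an adaptive adversary that shifts rewards gradually, which is why the third point about robustness testing matters.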
Load-bearing premise
The multi-robot environment relies on a unique reward function that can be directly changed through the reward channel without other agents detecting or adapting to the alteration.
What would settle it
If other agents detect the altered rewards and modify their policies to restore prior cooperation levels despite the manipulation, the central claim would not hold.
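One hedged way to make this test concrete is to compare the other agents' cooperation rate before the manipulation with the rate after they have had time to adapt. The function below is a hypothetical sketch. The cooperation-rate metric, the `tolerance` threshold, and the function name are assumptions, not quantities defined in the paper.

```python
from statistics import mean


def cooperation_restored(baseline, post_adaptation, tolerance=0.05):
    """True if mean post-adaptation cooperation is within `tolerance` of baseline."""
    return mean(post_adaptation) >= mean(baseline) - tolerance


# Illustrative numbers only (not results from the paper).
restored = cooperation_restored([0.80, 0.82, 0.78], [0.79, 0.81, 0.80])
suppressed = cooperation_restored([0.80, 0.82, 0.78], [0.40, 0.35, 0.45])
```

Under this reading, `restored` being true across repeated trials would count against the central claim, while `suppressed` corresponds to the manipulation holding.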
Original abstract
Recent research has demonstrated the potential of reinforcement learning in effective multi-robot collaboration, particularly in social dilemmas where robots face a trade-off between self-interest and collective benefits. However, environmental factors such as miscommunication and adversarial robots can impact cooperation, making it crucial to explore how multi-robot communication can be manipulated to achieve different outcomes. This paper presents PIMbot, a framework that manipulates outcomes via two complementary levers: (i) incentive manipulation of the reward channel and (ii) policy manipulation of an agent's own actions. An adaptive multi-objective controller balances these levers in an online manner. Our work introduces a novel approach to manipulation in recent multi-agent RL social dilemmas that utilize a unique reward function for incentivization. By utilizing our proposed PIMbot mechanisms, a robot is able to manipulate the social dilemma environment effectively. Comprehensive experimental results demonstrate the effectiveness of our proposed methods in the Gazebo-simulated multi-robot environment. Moreover, a real embedded device case study on NVIDIA Jetson Orin Nano quantifies system cost and validates PIMbot's effectiveness on realistic autonomous embedded systems scenarios beyond simulation. Together, these results position PIMbot as a rigorous stress-test tool exposing critical vulnerabilities in multi-robot cooperative tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents PIMbot, a framework for adversarial manipulation of multi-robot RL in social dilemmas. It manipulates outcomes via two levers—incentive manipulation of the reward channel and policy manipulation of an agent's actions—balanced by an adaptive multi-objective controller. The work claims effectiveness in a Gazebo-simulated multi-robot environment and validates it on an NVIDIA Jetson Orin Nano embedded device, positioning PIMbot as a stress-test tool for vulnerabilities in cooperative multi-robot tasks.
Significance. If the results hold with proper validation of the core assumptions, the work would be significant for exposing practical attack surfaces in multi-agent RL systems used for robot collaboration. The combination of simulation and real embedded hardware experiments, along with the self-adaptive controller, could provide a useful benchmark for robustness testing in social dilemma scenarios.
major comments (2)
- [Abstract] Abstract and § on reward manipulation: The central effectiveness claim requires that reward-channel manipulation remains undetected and unadapted to by other agents, yet no mechanism, observation model, or experimental test (e.g., monitoring other agents' reward signals or policy shifts) is provided to establish stealth or robustness; this assumption is load-bearing for the 'effective manipulation' result.
- [Experimental results] Experimental results section: Only high-level claims of effectiveness are stated without equations for the adaptive controller, quantitative metrics (success rates, reward deltas), error bars, or ablation on the two levers, making it impossible to verify whether the reported outcomes support the manipulation claims under the stated assumptions.
minor comments (2)
- [Method] Notation for the multi-objective controller and the two levers should be defined explicitly with equations rather than prose descriptions.
- [Experimental setup] The Gazebo environment description should include the exact reward functions and state observations available to non-attacker agents to allow reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight areas where the manuscript can be strengthened. We address each major comment below and commit to revisions that provide the requested details and clarifications without altering the core contributions.
- Referee: [Abstract] Abstract and § on reward manipulation: The central effectiveness claim requires that reward-channel manipulation remains undetected and unadapted to by other agents, yet no mechanism, observation model, or experimental test (e.g., monitoring other agents' reward signals or policy shifts) is provided to establish stealth or robustness; this assumption is load-bearing for the 'effective manipulation' result.
  Authors: We agree that the manuscript does not explicitly model or test the stealth of reward-channel manipulation against detection or adaptation by other agents. The current presentation focuses on the manipulation framework and its outcomes under the stated environmental assumptions. In revision, we will expand the abstract and add a dedicated subsection on assumptions, including discussion of potential observation models for other agents and preliminary analysis of policy shifts. We will also incorporate new experiments that monitor reward signals and quantify robustness where feasible.
  Revision: yes
- Referee: [Experimental results] Experimental results section: Only high-level claims of effectiveness are stated without equations for the adaptive controller, quantitative metrics (success rates, reward deltas), error bars, or ablation on the two levers, making it impossible to verify whether the reported outcomes support the manipulation claims under the stated assumptions.
  Authors: We acknowledge that the experimental results section presents high-level claims without the requested quantitative details. The manuscript reports effectiveness in Gazebo and on Jetson hardware but does not include the adaptive controller equations, specific metrics with error bars, or ablations. In the revised version, we will add the controller equations, report success rates, reward deltas, and other metrics with error bars from repeated trials, and include ablation studies isolating the reward and policy levers to allow verification of the results.
  Revision: yes
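The reporting promised in the second response could take roughly the following shape: a per-condition mean reward delta with a normal-approximation 95% interval over repeated trials, for each lever alone and for both together. This is a sketch only. The condition names and all numbers are made-up placeholders, not results from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def mean_ci95(samples):
    """Return (mean, half-width) of a normal-approximation 95% interval."""
    half_width = 1.96 * stdev(samples) / sqrt(len(samples))
    return mean(samples), half_width


# Placeholder reward deltas per ablation condition (illustrative only).
trials = {
    "incentive_only": [0.9, 1.1, 1.0, 1.2, 0.8],
    "policy_only": [0.4, 0.6, 0.5, 0.5, 0.5],
    "both_levers": [1.6, 1.4, 1.5, 1.7, 1.3],
}
report = {name: mean_ci95(values) for name, values in trials.items()}
for name, (m, hw) in report.items():
    print(f"{name}: {m:.2f} +/- {hw:.2f}")
```

With only a handful of trials per condition, a Student-t quantile would be more appropriate than the fixed 1.96 used here.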
Circularity Check
No derivation chain or equations; claims rest on experiments, not self-referential math
The provided abstract and description contain no equations, fitted parameters, predictions derived from inputs, or load-bearing self-citations. The framework is described at a high level (two manipulation levers plus adaptive controller) and evaluated via Gazebo simulation plus Jetson hardware runs. No step reduces by construction to its own inputs or prior author work; the central claim of effective manipulation is presented as an empirical outcome rather than a mathematical derivation. This is the normal case of a paper whose contribution is algorithmic and experimental rather than deductive.
Axiom & Free-Parameter Ledger