PIMbot: A Self-Adaptive Attack Framework for Adversarial Manipulation of Multi-Robot Reinforcement Learning

Cong Liu; Hyoseung Kim; Zexin Li; ZiLiang Zhang

arxiv: 2605.23027 · v1 · pith:SGTMXY5Jnew · submitted 2026-05-21 · 💻 cs.RO

Pith reviewed 2026-05-25 05:24 UTC · model grok-4.3

classification 💻 cs.RO

keywords multi-robot reinforcement learning · adversarial manipulation · social dilemmas · reward manipulation · policy manipulation · adaptive controller · embedded robotics

The pith

PIMbot lets one robot manipulate multi-robot RL social dilemmas by altering rewards and its own policy through an adaptive controller.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a framework called PIMbot can shift outcomes in multi-robot reinforcement learning environments involving social dilemmas. It achieves this by combining manipulation of the shared reward signal with changes to the manipulating robot's own actions, using an online controller to balance the two. A sympathetic reader would care because cooperative robot systems often depend on reward structures to encourage collective behavior, and this approach exposes how one agent might disrupt that balance. The work validates the method through experiments in a Gazebo simulation and on physical NVIDIA Jetson hardware. If correct, the result indicates that multi-robot RL systems built around unique reward functions carry inherent manipulation risks.

Core claim

PIMbot manipulates multi-robot RL social dilemmas through two levers—incentive manipulation of the reward channel and policy manipulation of the agent's own actions—balanced by an adaptive multi-objective controller that operates online, enabling a robot to effectively alter the environment's outcomes as shown in both simulated and embedded-device settings.

What carries the argument

PIMbot's dual-lever adaptive controller that balances reward-channel incentive changes with self-policy adjustments in real time.
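
Because the abstract gives no equations for the controller, the balancing idea can only be illustrated, not reproduced. The sketch below is a hypothetical two-weight scheme, not the authors' method: it assumes each lever reports a scalar "recent gain" and shifts a single weight toward whichever lever is currently paying off. The class name `DualLeverController`, its methods, and the `step` and `floor` parameters are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DualLeverController:
    """Hypothetical online balancer for two manipulation levers.

    w is the weight on the incentive lever; 1 - w is the weight on the
    policy lever. Nothing here is taken from the PIMbot paper.
    """

    w: float = 0.5      # initial weight on the incentive lever (assumed)
    step: float = 0.1   # adaptation rate (assumed)
    floor: float = 0.05 # keeps both levers minimally active (assumed)

    def update(self, incentive_gain: float, policy_gain: float) -> float:
        """Shift weight toward the lever with the larger recent gain."""
        total = abs(incentive_gain) + abs(policy_gain)
        if total > 0:
            # Normalized advantage of the incentive lever, in [-1, 1].
            advantage = (incentive_gain - policy_gain) / total
            self.w += self.step * advantage
        # Clamp so neither lever is switched off entirely.
        self.w = min(1.0 - self.floor, max(self.floor, self.w))
        return self.w

    def blend(self, incentive_obj: float, policy_obj: float) -> float:
        """Scalarize the two objectives with the current weights."""
        return self.w * incentive_obj + (1.0 - self.w) * policy_obj
```

A weighted sum is only one way to combine two objectives; the paper's "multi-objective" controller may use a different formulation, which the abstract does not specify.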

If this is right

A manipulating robot can shift social dilemma outcomes toward self-interest rather than collective benefit.
The method produces measurable effects in Gazebo-simulated multi-robot setups.
The approach runs on real embedded hardware such as the NVIDIA Jetson Orin Nano while quantifying system costs.
PIMbot functions as a stress-test tool that reveals vulnerabilities in multi-robot cooperative tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers of multi-robot RL systems may need mechanisms to detect or isolate reward-channel tampering.
The dual-lever idea could apply to other multi-agent settings where one participant has partial control over shared signals.
Robustness testing against adaptive adversaries becomes necessary once reward functions are treated as attack surfaces.
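
As one illustration of what detecting reward-channel tampering could look like, the sketch below flags incoming incentive values that deviate sharply from an early baseline window. It is a generic z-score monitor, not a defense proposed or evaluated in the paper; the function name `flag_reward_anomalies` and the parameters `baseline_len` and `z_threshold` are assumptions made for this example.

```python
from statistics import mean, stdev
from typing import Sequence


def flag_reward_anomalies(
    incentives: Sequence[float],
    baseline_len: int = 20,
    z_threshold: float = 3.0,
) -> list[int]:
    """Return indices whose value deviates strongly from the baseline window."""
    if baseline_len < 2 or len(incentives) <= baseline_len:
        return []  # not enough data to form a baseline and test against it
    baseline = incentives[:baseline_len]
    mu = mean(baseline)
    sigma = stdev(baseline)
    later = range(baseline_len, len(incentives))
    if sigma == 0:
        # Constant baseline: any differing value counts as anomalous.
        return [i for i in later if incentives[i] != mu]
    return [i for i in later if abs(incentives[i] - mu) / sigma > z_threshold]
```

A fixed baseline window is a simplification: in a learning system the legitimate incentive distribution drifts over time, so a practical monitor would likely need a rolling or learned reference.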

Load-bearing premise

The multi-robot environment relies on a unique reward function that can be directly changed through the reward channel without other agents detecting or adapting to the alteration.

What would settle it

If other agents detect the altered rewards and modify their policies to restore prior cooperation levels despite the manipulation, the central claim would not hold.
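
The test described above can be phrased as a check on episode logs. The following is a minimal sketch under stated assumptions: each episode is logged as a boolean (cooperative outcome or not), the attack begins at a known episode index, and "restored" means the late-attack cooperation rate comes back within a tolerance of the pre-attack rate. The function names, the `tail` window, and the `tolerance` value are hypothetical and do not come from the paper.

```python
from typing import Sequence


def cooperation_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of episodes that ended in cooperation."""
    if not outcomes:
        raise ValueError("need at least one episode")
    return sum(outcomes) / len(outcomes)


def manipulation_persisted(
    outcomes: Sequence[bool],
    attack_start: int,
    tail: int = 50,
    tolerance: float = 0.05,
) -> bool:
    """True if cooperation in the last `tail` episodes stays below baseline."""
    if len(outcomes) - attack_start < tail:
        raise ValueError("tail window must lie entirely after attack_start")
    baseline = cooperation_rate(outcomes[:attack_start])
    late = cooperation_rate(outcomes[-tail:])
    # If late-stage cooperation climbs back to baseline, the other agents
    # have adapted and the manipulation did not hold.
    return late < baseline - tolerance
```

A result of `False` on real logs would correspond to the failure case described above: the other agents recovered cooperation despite the manipulation.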

Figures

Figures reproduced from arXiv: 2605.23027 by Cong Liu, Hyoseung Kim, Zexin Li, ZiLiang Zhang.

**Figure 2.** Impact of the “Bypass Policy” method in ER. Cooperators send negative incentives to …

**Figure 3.** Fake incentives destroy cooperation, producing early but unstable successes during explo…

**Figure 4.** Partial communication shortens convergence in ER. Bottom panels show the adversary’s …

**Figure 5.** Reverse Policy: minimizing the adversary’s own reward depresses incentives and shrinks …

**Figure 6.** Agent success rates across scenarios (ER and IPD) for multi-objective manipulation.

**Figure 7.** Per-agent rewards across scenarios (ER and IPD) for multi-objective manipulation.

**Figure 8.** Average steps per episode across ER configurations for multi-objective manipulation.

**Figure 9.** Agent success rates across Stag Hunt configurations (…

**Figure 10.** Agent success rates across Escape Room configurations under the SOTA Reciprocators …

**Figure 11.** Agent success rates across IPD configurations under the SOTA Reciprocators Baseline …

**Figure 12.** Agent success rates across Stag Hunt configurations under the SOTA Reciprocators …

**Figure 13.** Robotic Simulation of PIMbot in Gazebo simulator.

**Figure 14.** System profiling traces for IPD and ER benchmarks under the LIO framework (CPU …

**Figure 15.** System profiling traces for IPD and ER benchmarks with truncated long zero-GPU uti…

Original abstract

Recent research has demonstrated the potential of reinforcement learning in effective multi-robot collaboration, particularly in social dilemmas where robots face a trade-off between self-interest and collective benefits. However, environmental factors such as miscommunication and adversarial robots can impact cooperation, making it crucial to explore how multi-robot communication can be manipulated to achieve different outcomes. This paper presents PIMbot, a framework that manipulates outcomes via two complementary levers: (i) incentive manipulation of the reward channel and (ii) policy manipulation of an agent's own actions. An adaptive multi-objective controller balances these levers in an online manner. Our work introduces a novel approach to manipulation in recent multi-agent RL social dilemmas that utilize a unique reward function for incentivization. By utilizing our proposed PIMbot mechanisms, a robot is able to manipulate the social dilemma environment effectively. Comprehensive experimental results demonstrate the effectiveness of our proposed methods in the Gazebo-simulated multi-robot environment. Moreover, a real embedded device case study on NVIDIA Jetson Orin Nano quantifies system cost and validates PIMbot's effectiveness on realistic autonomous embedded systems scenarios beyond simulation. Together, these results position PIMbot as a rigorous stress-test tool exposing critical vulnerabilities in multi-robot cooperative tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

PIMbot combines reward and policy manipulation via an online controller for multi-robot social dilemmas, with sim and Jetson hardware tests, but the stealth assumption for reward changes is unaddressed.

PIMbot introduces a dual-lever attack on multi-robot RL cooperation in social dilemmas by pairing incentive changes through the reward channel with direct policy tweaks, balanced by an adaptive multi-objective controller. That framing is the main new element compared to single-lever adversarial RL work. The paper does a solid job moving from Gazebo simulation to a real embedded case study on the NVIDIA Jetson Orin Nano, which gives concrete numbers on system cost and shows the approach can run on actual hardware rather than staying theoretical. Those experiments are the clearest evidence of effort to demonstrate practicality.

The soft spot is the load-bearing assumption that reward manipulation stays undetected and that other agents do not adapt. The abstract states the environment uses a unique reward function that can be directly manipulated, yet provides no mechanism or test for stealth against observation or policy response from the other robots. If that assumption fails, the claimed effectiveness of the controller does not follow. No equations, error bars, or direct comparisons to prior techniques appear in the abstract, which makes it difficult to judge how much the combination actually advances beyond routine extensions.

This work is mainly for researchers focused on robustness and security in multi-agent RL systems who want concrete attack examples to test their own methods. It is not a foundational result but could serve as a stress-test tool if the details hold. The hardware validation is enough to justify sending it to peer review rather than desk rejection, though the authors would need to add explicit handling of the adaptation issue and more technical specifics on the controller.

Referee Report

2 major / 2 minor

Summary. The paper presents PIMbot, a framework for adversarial manipulation of multi-robot RL in social dilemmas. It manipulates outcomes via two levers—incentive manipulation of the reward channel and policy manipulation of an agent's actions—balanced by an adaptive multi-objective controller. The work claims effectiveness in a Gazebo-simulated multi-robot environment and validates it on an NVIDIA Jetson Orin Nano embedded device, positioning PIMbot as a stress-test tool for vulnerabilities in cooperative multi-robot tasks.

Significance. If the results hold with proper validation of the core assumptions, the work would be significant for exposing practical attack surfaces in multi-agent RL systems used for robot collaboration. The combination of simulation and real embedded hardware experiments, along with the self-adaptive controller, could provide a useful benchmark for robustness testing in social dilemma scenarios.

major comments (2)

[Abstract] Abstract and § on reward manipulation: The central effectiveness claim requires that reward-channel manipulation remains undetected and unadapted to by other agents, yet no mechanism, observation model, or experimental test (e.g., monitoring other agents' reward signals or policy shifts) is provided to establish stealth or robustness; this assumption is load-bearing for the 'effective manipulation' result.
[Experimental results] Experimental results section: Only high-level claims of effectiveness are stated without equations for the adaptive controller, quantitative metrics (success rates, reward deltas), error bars, or ablation on the two levers, making it impossible to verify whether the reported outcomes support the manipulation claims under the stated assumptions.

minor comments (2)

[Method] Notation for the multi-objective controller and the two levers should be defined explicitly with equations rather than prose descriptions.
[Experimental setup] The Gazebo environment description should include the exact reward functions and state observations available to non-attacker agents to allow reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight areas where the manuscript can be strengthened. We address each major comment below and commit to revisions that provide the requested details and clarifications without altering the core contributions.

Referee: [Abstract] Abstract and § on reward manipulation: The central effectiveness claim requires that reward-channel manipulation remains undetected and unadapted to by other agents, yet no mechanism, observation model, or experimental test (e.g., monitoring other agents' reward signals or policy shifts) is provided to establish stealth or robustness; this assumption is load-bearing for the 'effective manipulation' result.

Authors: We agree that the manuscript does not explicitly model or test the stealth of reward-channel manipulation against detection or adaptation by other agents. The current presentation focuses on the manipulation framework and its outcomes under the stated environmental assumptions. In revision, we will expand the abstract and add a dedicated subsection on assumptions, including discussion of potential observation models for other agents and preliminary analysis of policy shifts. We will also incorporate new experiments that monitor reward signals and quantify robustness where feasible.

revision: yes
Referee: [Experimental results] Experimental results section: Only high-level claims of effectiveness are stated without equations for the adaptive controller, quantitative metrics (success rates, reward deltas), error bars, or ablation on the two levers, making it impossible to verify whether the reported outcomes support the manipulation claims under the stated assumptions.

Authors: We acknowledge that the experimental results section presents high-level claims without the requested quantitative details. The manuscript reports effectiveness in Gazebo and on Jetson hardware but does not include the adaptive controller equations, specific metrics with error bars, or ablations. In the revised version, we will add the controller equations, report success rates, reward deltas, and other metrics with error bars from repeated trials, and include ablation studies isolating the reward and policy levers to allow verification of the results.

revision: yes
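
The reporting requested in this exchange (metrics with error bars from repeated trials, plus an ablation over the two levers) could be summarized along the following lines. This is a generic sketch, not the authors' evaluation code: it assumes each condition yields one scalar metric per independent trial, and it uses a normal-approximation 95% interval, which is rough when the number of trials is small. The function names and the condition labels mentioned afterwards are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Mapping, Sequence


def mean_with_ci(samples: Sequence[float], z: float = 1.96) -> tuple[float, float]:
    """Return (mean, half-width) of a normal-approximation confidence interval."""
    if len(samples) < 2:
        raise ValueError("need at least two trials to estimate spread")
    half_width = z * stdev(samples) / sqrt(len(samples))
    return mean(samples), half_width


def summarize_ablation(
    runs: Mapping[str, Sequence[float]],
) -> dict[str, tuple[float, float]]:
    """Compute mean and interval half-width for each named condition."""
    return {name: mean_with_ci(values) for name, values in runs.items()}
```

An ablation in this style would pass one entry per condition, for example hypothetical keys such as `"incentive_only"`, `"policy_only"`, and `"both"`, each mapped to per-trial success rates, so that the contribution of each lever can be compared with its uncertainty.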

Circularity Check

0 steps flagged

No derivation chain or equations; claims rest on experiments, not self-referential math

The provided abstract and description contain no equations, fitted parameters, predictions derived from inputs, or load-bearing self-citations. The framework is described at a high level (two manipulation levers plus adaptive controller) and evaluated via Gazebo simulation plus Jetson hardware runs. No step reduces by construction to its own inputs or prior author work; the central claim of effective manipulation is presented as an empirical outcome rather than a mathematical derivation. This is the normal case of a paper whose contribution is algorithmic and experimental rather than deductive.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so free parameters, axioms, and invented entities cannot be identified. The central claim rests on an unstated assumption that the reward channel is directly manipulable and that the adaptive controller can balance the two levers without additional constraints.

Reference graph

Works this paper leans on

84 extracted references · 84 canonical work pages · 2 internal anchors

[1]

Meeting-merging-mission: A multi- robot coordinate framework for large-scale communication-limited exploration

Aakriti Agrawal, Senthil Hariharan Arul, Amrit Singh Bedi, and Dinesh Manocha. DC- MRTA: decentralized multi-robot task allocation and navigation in complex environments. InIEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2022, Kyoto, Japan, October 23-27, 2022, pages 11711–11718. IEEE, 2022. doi: 10.1109/IROS47612.2022. 9981353. ...

work page doi:10.1109/iros47612.2022 2022
[2]

Design a 3-dof delta parallel robot by one degree redundancy along the conveyor axis, a novel automation approach

Sajjad Ahangar, Mehdi Valizadeh Mehrabani, Alireza Pouransari Shorijeh, and Mehdi Tale Masouleh. Design a 3-dof delta parallel robot by one degree redundancy along the conveyor axis, a novel automation approach. In2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI), pages 413–418. IEEE, 2019

work page 2019
[3]

Constrained black-box attacks against multi-agent reinforcement learning.arXiv preprint arXiv:2508.09275, 2025

Amine Andam, Jamal Bentahar, and Mustapha Hedabou. Constrained black-box attacks against multi-agent reinforcement learning.arXiv preprint arXiv:2508.09275, 2025

work page arXiv 2025
[4]

ASHRAE, 4th edition, 2015

ASHRAE TC 9.9.Thermal Guidelines for Data Processing Environments. ASHRAE, 4th edition, 2015

work page 2015
[5]

Multi-robot task planning under individual and collaborative temporal logic specifications

Ruofei Bai, Ronghao Zheng, Meiqin Liu, and Senlin Zhang. Multi-robot task planning under individual and collaborative temporal logic specifications. InIEEE/RSJ International Con- ference on Intelligent Robots and Systems, IROS 2021, Prague, Czech Republic, September 27 - Oct. 1, 2021, pages 6382–6389. IEEE, 2021. doi: 10.1109/IROS51168.2021.9636287. URL h...

work page doi:10.1109/iros51168.2021.9636287 2021
[6]

The case for energy-proportional computing.Computer, 40(12):33–37, 2007

Luiz André Barroso and Urs Hölzle. The case for energy-proportional computing.Computer, 40(12):33–37, 2007

work page 2007
[7]

A comprehensive survey of multia- gent reinforcement learning.IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2):156–172, 2008

Lucian Buşoniu, Robert Babuška, and Bart De Schutter. A comprehensive survey of multia- gent reinforcement learning.IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2):156–172, 2008. doi: 10.1109/TSMCC.2007.913919. 25

work page doi:10.1109/tsmcc.2007.913919 2008
[8]

Multi-agent reinforcement learning: An overview

Lucian Buşoniu, Robert Babuška, and Bart De Schutter. Multi-agent reinforcement learning: An overview. In Dipti Srinivasan and Lakhmi C. Jain, editors,Innovations in Multi-Agent Systems and Applications–1, volume 310 ofStudies in Computational Intelligence, pages 183–

work page
[9]

doi: 10.1007/978-3-642-14435-6_7

Springer, Berlin, Heidelberg, 2010. doi: 10.1007/978-3-642-14435-6_7

work page doi:10.1007/978-3-642-14435-6_7 2010
[10]

Multi-agent reinforcement learning: A review of challenges and applications.Applied Sciences, 11(11):4948, 2021

Lorenzo Canese, Gian Carlo Cardarilli, Luigi Di Nunzio, Roberto Fazzolari, Daniele Giardino, Marco Re, and Stefania Spanò. Multi-agent reinforcement learning: A review of challenges and applications.Applied Sciences, 11(11):4948, 2021. doi: 10.3390/app11114948

work page doi:10.3390/app11114948 2021
[11]

Integrated solar power harvesting and hibernation for a recurrent-mission vtol micro aerial vehicle

Samuel J Carlson, Tugrul Karakurt, Pallavi Arora, and Christos Papachristos. Integrated solar power harvesting and hibernation for a recurrent-mission vtol micro aerial vehicle. InIEEE International Conference on Unmanned Aircraft Systems (ICUAS), pages 237–244, 2022

work page 2022
[12]

Dycodeeval: Dynamic benchmarking of reasoning capabilities in code large language models under data contamination

Simin Chen, Pranav Pusarla, and Baishakhi Ray. Dycodeeval: Dynamic benchmarking of reasoning capabilities in code large language models under data contamination. InForty-second International Conference on Machine Learning

work page
[13]

Nmtsloth: understandingand testing efficiency degradation of neural machine translation systems

SiminChen, CongLiu, MirazulHaque, ZiheSong, andWeiYang. Nmtsloth: understandingand testing efficiency degradation of neural machine translation systems. InProceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1148–1160, 2022

work page 2022
[14]

Nicgslowdown: Evaluat- ing the efficiency robustness of neural image caption generation models

Simin Chen, Zihe Song, Mirazul Haque, Cong Liu, and Wei Yang. Nicgslowdown: Evaluat- ing the efficiency robustness of neural image caption generation models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15365–15374, 2022

work page 2022
[15]

The dark side of dy- namic routing neural networks: Towards efficiency backdoor injection

Simin Chen, Hanlin Chen, Mirazul Haque, Cong Liu, and Wei Yang. The dark side of dy- namic routing neural networks: Towards efficiency backdoor injection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24585–24594, 2023

work page 2023
[16]

Your compiler is back- dooring your model: Understanding and exploiting compilation inconsistency vulnerabilities in deep learning compilers.arXiv preprint arXiv:2509.11173, 2025

Simin Chen, Jinjun Peng, Yixin He, Junfeng Yang, and Baishakhi Ray. Your compiler is back- dooring your model: Understanding and exploiting compilation inconsistency vulnerabilities in deep learning compilers.arXiv preprint arXiv:2509.11173, 2025

work page arXiv 2025
[17]

Dy- namic transformers provide a false sense of efficiency

Yiming Chen, Simin Chen, Zexin Li, Wei Yang, Cong Liu, Robby Tan, and Haizhou Li. Dy- namic transformers provide a false sense of efficiency. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors,Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7164–7180, Toronto, Canada, July

work page
[18]

doi: 10.18653/v1/2023.acl-long.395

Association for Computational Linguistics. doi: 10.18653/v1/2023.acl-long.395. URL https://aclanthology.org/2023.acl-long.395/

work page doi:10.18653/v1/2023.acl-long.395 2023
[19]

Unveiling the achilles’ heel of NLG evaluators: A unified adversarial framework driven by large language models

Yiming Chen, Chen Zhang, Danqing Luo, Luis Fernando D’Haro, Robby Tan, and Haizhou Li. Unveiling the achilles’ heel of NLG evaluators: A unified adversarial framework driven by large language models. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors,Findings of the Association for Computational Linguistics: ACL 2024, pages 1359–1375, Bangkok, Thai...

work page doi:10.18653/v1/2024.findings-acl 2024
[20]

URLhttps://aclanthology.org/2024.findings-acl.80/. 26

work page 2024
[21]

Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch

Jakob Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. Learning with opponent-learning awareness. InProceedings of the 17th Interna- tional Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’18, page 122–130, Richland, SC, 2018. International Foundation for Autonomous Agents and Multiagent Systems

work page 2018
[22]

Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch

Jakob Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. Learning with opponent-learning awareness. InProceedings of the 17th Interna- tional Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 122–130, 2018

work page 2018
[23]

Foerster, Yannis M

Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, and Shimon Whiteson. Learning to communicate with deep multi-agent reinforcement learning. InAdvances in Neural Information Processing Systems (NIPS), volume 29, 2016

work page 2016
[24]

Safety alignment in nlp tasks: Weakly aligned summarization as an in-context attack

Yu Fu, Yufei Li, Wen Xiao, Cong Liu, and Yue Dong. Safety alignment in nlp tasks: Weakly aligned summarization as an in-context attack. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8483–8502, 2024

work page 2024
[25]

Transferable adversarial attacks against asr.IEEE Signal Processing Letters, 31:2200–2204, 2024

Xiaoxue Gao, Zexin Li, Yiming Chen, Cong Liu, and Haizhou Li. Transferable adversarial attacks against asr.IEEE Signal Processing Letters, 31:2200–2204, 2024. doi: 10.1109/LSP. 2024.3443711

work page doi:10.1109/lsp 2024
[26]

Xiaoxue Gao, Yiming Chen, Xianghu Yue, Yu Tsao, and Nancy F. Chen. Ttslow: Slow down text-to-speech with efficiency robustness evaluations.IEEE Transactions on Audio, Speech and Language Processing, 33:693–704, 2025. doi: 10.1109/TASLPRO.2025.3533357

work page doi:10.1109/taslpro.2025.3533357 2025
[27]

Meeting-merging-mission: A multi- robot coordinate framework for large-scale communication-limited exploration

Yuman Gao, Yingjian Wang, Xingguang Zhong, Tiankai Yang, Mingyang Wang, Zhixiong Xu, Yongchao Wang, Yi Lin, Chao Xu, and Fei Gao. Meeting-merging-mission: A multi- robot coordinate framework for large-scale communication-limited exploration. InIEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2022, Kyoto, Japan, October 23-27, 2022...

work page doi:10.1109/iros47612.2022.9981544 2022
[28]

Enabling low-cost secure com- puting on untrusted in-memory architectures.arXiv preprint arXiv:2501.17292, 2025

Sahar Ghoflsaz Ghinani, Jingyao Zhang, and Elaheh Sadredini. Enabling low-cost secure com- puting on untrusted in-memory architectures.arXiv preprint arXiv:2501.17292, 2025

work page arXiv 2025
[29]

Adversarial policies: Attacking deep reinforcement learning.arXiv preprint arXiv:1905.10615, 2019

Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, and Stuart Russell. Adversarial policies: Attacking deep reinforcement learning.arXiv preprint arXiv:1905.10615, 2019

work page arXiv 1905
[30]

Backdoor detection and mitigation in competitive rein- forcement learning, 2023

Junfeng Guo, Ang Li, and Cong Liu. Backdoor detection and mitigation in competitive rein- forcement learning, 2023

work page 2023
[31]

What is the solution for state adversarial multi-agent reinforcement learning?arXiv preprint arXiv:2212.02705, 2022

Songyang Han, Sanbao Su, Sihong He, Shuo Han, Haizhao Yang, and Fei Miao. What is the solution for state adversarial multi-agent reinforcement learning?arXiv preprint arXiv:2212.02705, 2022

work page arXiv 2022
[32]

A robust and constrained multi-agent reinforcement learning framework for electric vehicle amod systems.arXiv preprint arXiv:2209.08230, 2022

Sihong He, Yue Wang, Shuo Han, Shaofeng Zou, and Fei Miao. A robust and constrained multi-agent reinforcement learning framework for electric vehicle amod systems.arXiv preprint arXiv:2209.08230, 2022. 27

work page arXiv 2022
[33]

Traversing supervisor problem: An approximately optimal approach to multi-robot assistance

Tianchen Ji, Roy Dong, and Katherine Driggs-Campbell. Traversing supervisor problem: An approximately optimal approach to multi-robot assistance. InProceedings of Robotics: Science and Systems (RSS), 2022

work page 2022
[34]

Pccl: Energy-efficient llm training with power- aware collective communication

Ziyang Jia, Laxmi N Bhuyan, and Daniel Wong. Pccl: Energy-efficient llm training with power- aware collective communication. In2024 IEEE 42nd International Conference on Computer Design (ICCD), pages 84–91. IEEE, 2024

work page 2024
[35]

Fine-grained warm water cooling for improving datacenter economy

Weixiang Jiang, Ziyang Jia, Sirui Feng, Fangming Liu, and Hai Jin. Fine-grained warm water cooling for improving datacenter economy. InProceedings of the 46th International Symposium on Computer Architecture, pages 474–486, 2019

work page 2019
[36]

Autonomous teamed exploration of sub- terranean environments using legged and aerial robots

Maitreyi Kulkarni, Mihir Dharmadhikari, Marco Tranzatto, Samuel Zimmermann, Vincent Reijgwart, Pietro De Petris, Huy Nguyen, Ninad Khedekar, Christos Papachristos, Lionel Ott, Roland Siegwart, Marco Hutter, and Kostas Alexis. Autonomous teamed exploration of sub- terranean environments using legged and aerial robots. InIEEE International Conference on Rob...

work page 2022
[37]

Attacking cooperative multi-agent reinforcement learning by adversarial minority influence.Neural Networks, 191:107747, 2025

Simin Li, Jun Guo, Jingqiao Xiu, Yuwei Zheng, Pu Feng, Xin Yu, Jiakai Wang, Aishan Liu, Yaodong Yang, Bo An, et al. Attacking cooperative multi-agent reinforcement learning by adversarial minority influence.Neural Networks, 191:107747, 2025

work page 2025
[38]

White-box multi-objective adversarial attack on dialogue generation

Yufei Li, Zexin Li, Yingfan Gao, and Cong Liu. White-box multi-objective adversarial attack on dialogue generation. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1778–1792, 2023

work page 2023
[39]

Rt-lm: Uncertainty-aware resource management for real-time inference of language models

Yufei Li, Zexin Li, Wei Yang, and Cong Liu. Rt-lm: Uncertainty-aware resource management for real-time inference of language models. In2023 IEEE Real-Time Systems Symposium (RTSS), pages 158–171. IEEE, 2023

work page 2023
[40]

Mace: A hybrid llm serving system with colocated slo-aware continuous retraining alignment.arXiv preprint arXiv:2510.03283, 2025

Yufei Li, Yu Fu, Yue Dong, and Cong Liu. Mace: A hybrid llm serving system with colocated slo-aware continuous retraining alignment.arXiv preprint arXiv:2510.03283, 2025

work page arXiv 2025
[41]

Lemix: Unified scheduling for llm training and inference on multi-gpu systems.arXiv preprint arXiv:2507.21276, 2025

Yufei Li, Zexin Li, Yinglun Zhu, and Cong Liu. Lemix: Unified scheduling for llm training and inference on multi-gpu systems.arXiv preprint arXiv:2507.21276, 2025

work page arXiv 2025
[42]

Dr genre: Reinforcement learning from decoupled llm feedback for generic text rewriting.arXiv preprint arXiv:2503.06781, 2025

Yufei Li, John Nham, Ganesh Jawahar, Lei Shu, David Uthus, Yun-Hsuan Sung, Chengrun Yang, Itai Rolnick, Yi Qiao, and Cong Liu. Dr genre: Reinforcement learning from decoupled llm feedback for generic text rewriting.arXiv preprint arXiv:2503.06781, 2025

work page arXiv 2025
[43]

A pareto-efficient algorithm for multiple objective optimization in e-commerce recommendation

Xiao Lin, Hongjie Chen, Changhua Pei, Fei Sun, Xuanji Xiao, Hanxiao Sun, Yongfeng Zhang, Wenwu Ou, and Peng Jiang. A pareto-efficient algorithm for multiple objective optimization in e-commerce recommendation. In Toine Bogers, Alan Said, Peter Brusilovsky, and Domonkos Tikk, editors,Proceedings of the 13th ACM Conference on Recommender Systems, RecSys 201...

work page doi:10.1145/3298689.3346998 2019
[44]

Efficient adversarial attacks on online multi-agent reinforcement learning

Guanlin Liu and Lifeng LAI. Efficient adversarial attacks on online multi-agent reinforcement learning. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 24401–24433. Curran Associates, Inc., 2023. URLhttps://proceedings.neurips.cc/paper_files/paper/2023/ fi...

work page 2023
[45]

Maven: Multi-agent variational exploration

Anuj Mahajan, Mikayel Samvelyan, Christian Schroeder de Witt, Bohdan Sun, Tabish Rashid, Shimon Whiteson, and Jakob Foerster. Maven: Multi-agent variational exploration. InAd- vances in Neural Information Processing Systems (NeurIPS), volume 32, 2019

work page 2019
[46]

A sensorized toy car for autism screening using multi-modal features.Sustainability, 15(10),

Bijan Mehralizadeh, Bahar Baradaran, Shahab Nikkhoo, Pegah Soleiman, and Hadi Moradi. A sensorized toy car for autism screening using multi-modal features.Sustainability, 15(10),

[47] ISSN 2071-1050. doi: 10.3390/su15107790. URL https://www.mdpi.com/2071-1050/15/10/7790

[48] Andrew Y. Ng, Daishi Harada, and Stuart J. Russell. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning (ICML), pages 278–287, 1999.

[49] Shahab Nikkhoo, Zexin Li, Aritra Samanta, Yufei Li, and Cong Liu. Pimbot: Policy and incentive manipulation for multi-robot reinforcement learning in social dilemmas. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5630–

[50] NVIDIA Corporation. Nvidia jetson orin nano developer kit: Technical overview. https://developer.nvidia.com/embedded/jetson-orin-nano, 2023. Accessed: 2025-09-29.

[51] Georgios Papoudakis, Filippos Christianos, Arrasy Rahman, and Stefano V. Albrecht. Dealing with non-stationarity in multi-agent deep reinforcement learning. arXiv preprint arXiv:1906.04737, 2019. URL https://arxiv.org/abs/1906.04737

[52] Qiangyu Pei, Shutong Chen, Qixia Zhang, Xinhui Zhu, Fangming Liu, Ziyang Jia, Yishuo Wang, and Yongjie Yuan. Cooledge: hotspot-relievable warm water cooling for energy-efficient edge datacenters. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pages 814–829, 2022.

[53] Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, and Shimon Whiteson. Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning (ICML), volume 80 of Proceedings of Machine Learning Research, pages 4295–4304, 2018.

[54] Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, and Shimon Whiteson. The starcraft multi-agent challenge. arXiv preprint arXiv:1902.04043, 2019. URL https://arxiv.org/abs/1902.04043

[55] Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, Cambridge, UK, 2009. ISBN 978-0521899437.

[56] Philipp Dominic Siedler. Dynamic collaborative multi-agent reinforcement learning communication for autonomous drone reforestation. arXiv preprint arXiv:2211.15414, 2022.

[57] Amanpreet Singh, Tushar Jain, and Sainbayar Sukhbaatar. Learning when to communicate at scale in multiagent cooperative and competitive tasks. In International Conference on Learning Representations (ICLR), 2019. URL https://arxiv.org/abs/1812.09755.

[58] Kyunghwan Son, Daewoo Kim, Wan Ju Kang, Debbie G. Hostallero, and Yung Yi. Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, pages 5887–5896, 2019.

[59] Jeff L Stimpson and Michael A Goodrich. Learning to cooperate in a social dilemma: A satisficing approach to bargaining. In ICML, pages 728–735. Citeseer, 2003.

[60] Sainbayar Sukhbaatar, Rob Fergus, Arthur Szlam, and Jason Weston. Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems (NIPS), volume 29, 2016.

[61] Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech M. Czarnecki, Vinícius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, and Thore Graepel. Value-decomposition networks for cooperative multi-agent learning. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 2...

[62] Veniamin Tereshchuk, John Stewart, Nikolay Bykov, Samuel Pedigo, Santosh Devasia, and Ashis G. Banerjee. An efficient scheduling algorithm for multi-robot task allocation in assembling aircraft structures. IEEE Robotics Autom. Lett., 4(4):3844–3851, 2019. doi: 10.1109/LRA.2019.2929983. URL https://doi.org/10.1109/LRA.2019.2929983

[63] Justin K. Terry, Benjamin Black, Mario Jayakumar, Akshara Hari, Chace Sullivan, Ritchie Lee Santos, Clayton Dieffenderfer, Colin Horsch, Keon Perez, Akilesh Ravi, Alexander Williams, Yashas Lokesh, Morgan Dickens, Lilian Weng, Andreas Kallinteris, Shumeet Baluja, Wojciech M. Czarnecki, and Marc Lanctot. Pettingzoo: Gym for multi-agent reinforcement lear...

[64] The Green Grid. Pue: A comprehensive examination of the metric. Technical report, The Green Grid, 2012. URL https://www.thegreengrid.org. White paper TGG-2012.

[65] James Tu, Tsunhsuan Wang, Jingkang Wang, Sivabalan Manivasagam, Mengye Ren, and Raquel Urtasun. Adversarial attacks on multi-agent communication. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7768–7777, 2021.

[66] Nikos Vlassis. A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool, San Rafael, CA, 2007. doi: 10.2200/S00090ED1V01Y200705AIM002

[67] Jiechuan Wang, Zhanghen Ren, Wenbo Liu, Yong Yu, and Weinan Zhang. Qplex: Duplex dueling multi-agent q-learning. In International Conference on Learning Representations (ICLR),

[68] URL https://openreview.net/forum?id=Rcmk0xxIQV

[69] Gerhard Weiss, editor. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. MIT Press, Cambridge, MA, 1999.

[70] Jiachen Yang, Ang Li, Mehrdad Farajtabar, Peter Sunehag, Edward Hughes, and Hongyuan Zha. Learning to incentivize other learning agents. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20, Red Hook, NY, USA, 2020. Curran Associates Inc. ISBN 9781713829546.

[71] Jiachen Yang, Ethan Wang, Rakshit Trivedi, Tuo Zhao, and Hongyuan Zha. Adaptive incentive design with multi-agent meta-gradient reinforcement learning. arXiv preprint arXiv:2112.10859, 2021.

[72] Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, and Jun Wang. Mean field multi-agent reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning (ICML), volume 80 of Proceedings of Machine Learning Research, pages 5571–5580, 2018.

[73] Chao Yu, Yinzhao Dong, Yangning Li, and Yatong Chen. Distributed multi-agent deep reinforcement learning for cooperative multi-robot pursuit. The Journal of Engineering, 2020(13):499–504, 2020.

[74] Lebin Yu, Yunbo Qiu, Quanming Yao, Yuan Shen, Xudong Zhang, and Jian Wang. Robust communicative multi-agent reinforcement learning with active defense. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 17575–17582, 2024.

[75] Jingyao Zhang and Elaheh Sadredini. A near-cache architectural framework for cryptographic computing. arXiv preprint arXiv:2509.23179, 2025.

[76] Jingyao Zhang, Jaewoo Park, Jongeun Lee, and Elaheh Sadredini. SAIL: SRAM-accelerated LLM inference system with lookup-table-based GEMV. arXiv preprint arXiv:2509.25853, 2025.

[77] Kaiqing Zhang, Zhuoran Yang, and Tamer Başar. Multi-agent reinforcement learning: A selective overview of theories and algorithms. Proceedings of the IEEE, 109(12):2278–2314,

[78] doi: 10.1109/JPROC.2021.3076600

[79] Lin Zhang, Yufeng Sun, Andrew Barth, and Ou Ma. Decentralized control of multi-robot system in cooperative object transportation using deep reinforcement learning. IEEE Access, 8:184109–184119, 2020.

[80] Qian Zhang, Ruiyang Quan, Siqin Qimuge, Peimin Xia, Jiaheng Wang, Xin Zan, Fangshi Wang, Changchuan Chen, Qi Wei, Huichan Zhao, Xinjun Liu, and Fei Qiao. OCTOANTS: A heterogeneous lightweight intelligent multi-robot collaboration system with resource-constrained iot devices. In IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 202...


[1] Aakriti Agrawal, Senthil Hariharan Arul, Amrit Singh Bedi, and Dinesh Manocha. DC-MRTA: decentralized multi-robot task allocation and navigation in complex environments. In IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2022, Kyoto, Japan, October 23-27, 2022, pages 11711–11718. IEEE, 2022. doi: 10.1109/IROS47612.2022.9981353. ...

[2] Sajjad Ahangar, Mehdi Valizadeh Mehrabani, Alireza Pouransari Shorijeh, and Mehdi Tale Masouleh. Design a 3-dof delta parallel robot by one degree redundancy along the conveyor axis, a novel automation approach. In 2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI), pages 413–418. IEEE, 2019.

[3] Amine Andam, Jamal Bentahar, and Mustapha Hedabou. Constrained black-box attacks against multi-agent reinforcement learning. arXiv preprint arXiv:2508.09275, 2025.

[4] ASHRAE TC 9.9. Thermal Guidelines for Data Processing Environments. ASHRAE, 4th edition, 2015.

[5] Ruofei Bai, Ronghao Zheng, Meiqin Liu, and Senlin Zhang. Multi-robot task planning under individual and collaborative temporal logic specifications. In IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2021, Prague, Czech Republic, September 27 - Oct. 1, 2021, pages 6382–6389. IEEE, 2021. doi: 10.1109/IROS51168.2021.9636287. URL h...

[6] Luiz André Barroso and Urs Hölzle. The case for energy-proportional computing. Computer, 40(12):33–37, 2007.

[7] Lucian Buşoniu, Robert Babuška, and Bart De Schutter. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2):156–172, 2008. doi: 10.1109/TSMCC.2007.913919.

[8] Lucian Buşoniu, Robert Babuška, and Bart De Schutter. Multi-agent reinforcement learning: An overview. In Dipti Srinivasan and Lakhmi C. Jain, editors, Innovations in Multi-Agent Systems and Applications–1, volume 310 of Studies in Computational Intelligence, pages 183–

[9] Springer, Berlin, Heidelberg, 2010. doi: 10.1007/978-3-642-14435-6_7

[10] Lorenzo Canese, Gian Carlo Cardarilli, Luigi Di Nunzio, Roberto Fazzolari, Daniele Giardino, Marco Re, and Stefania Spanò. Multi-agent reinforcement learning: A review of challenges and applications. Applied Sciences, 11(11):4948, 2021. doi: 10.3390/app11114948

[11] Samuel J Carlson, Tugrul Karakurt, Pallavi Arora, and Christos Papachristos. Integrated solar power harvesting and hibernation for a recurrent-mission vtol micro aerial vehicle. In IEEE International Conference on Unmanned Aircraft Systems (ICUAS), pages 237–244, 2022.

[12] Simin Chen, Pranav Pusarla, and Baishakhi Ray. Dycodeeval: Dynamic benchmarking of reasoning capabilities in code large language models under data contamination. In Forty-second International Conference on Machine Learning

[13] Simin Chen, Cong Liu, Mirazul Haque, Zihe Song, and Wei Yang. Nmtsloth: understanding and testing efficiency degradation of neural machine translation systems. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1148–1160, 2022.

[14] Simin Chen, Zihe Song, Mirazul Haque, Cong Liu, and Wei Yang. Nicgslowdown: Evaluating the efficiency robustness of neural image caption generation models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15365–15374, 2022.

[15] Simin Chen, Hanlin Chen, Mirazul Haque, Cong Liu, and Wei Yang. The dark side of dynamic routing neural networks: Towards efficiency backdoor injection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24585–24594, 2023.

[16] Simin Chen, Jinjun Peng, Yixin He, Junfeng Yang, and Baishakhi Ray. Your compiler is backdooring your model: Understanding and exploiting compilation inconsistency vulnerabilities in deep learning compilers. arXiv preprint arXiv:2509.11173, 2025.

[17] Yiming Chen, Simin Chen, Zexin Li, Wei Yang, Cong Liu, Robby Tan, and Haizhou Li. Dynamic transformers provide a false sense of efficiency. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7164–7180, Toronto, Canada, July

[18] Association for Computational Linguistics. doi: 10.18653/v1/2023.acl-long.395. URL https://aclanthology.org/2023.acl-long.395/

[19] Yiming Chen, Chen Zhang, Danqing Luo, Luis Fernando D’Haro, Robby Tan, and Haizhou Li. Unveiling the achilles’ heel of NLG evaluators: A unified adversarial framework driven by large language models. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Findings of the Association for Computational Linguistics: ACL 2024, pages 1359–1375, Bangkok, Thai...

[20] URL https://aclanthology.org/2024.findings-acl.80/.

[21] Jakob Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. Learning with opponent-learning awareness. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’18, page 122–130, Richland, SC, 2018. International Foundation for Autonomous Agents and Multiagent Systems.

[22] Jakob Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. Learning with opponent-learning awareness. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 122–130, 2018.

[23] Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, and Shimon Whiteson. Learning to communicate with deep multi-agent reinforcement learning. In Advances in Neural Information Processing Systems (NIPS), volume 29, 2016.

[24] Yu Fu, Yufei Li, Wen Xiao, Cong Liu, and Yue Dong. Safety alignment in nlp tasks: Weakly aligned summarization as an in-context attack. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8483–8502, 2024.

[25] Xiaoxue Gao, Zexin Li, Yiming Chen, Cong Liu, and Haizhou Li. Transferable adversarial attacks against asr. IEEE Signal Processing Letters, 31:2200–2204, 2024. doi: 10.1109/LSP.2024.3443711

[26] Xiaoxue Gao, Yiming Chen, Xianghu Yue, Yu Tsao, and Nancy F. Chen. Ttslow: Slow down text-to-speech with efficiency robustness evaluations. IEEE Transactions on Audio, Speech and Language Processing, 33:693–704, 2025. doi: 10.1109/TASLPRO.2025.3533357

[27] Yuman Gao, Yingjian Wang, Xingguang Zhong, Tiankai Yang, Mingyang Wang, Zhixiong Xu, Yongchao Wang, Yi Lin, Chao Xu, and Fei Gao. Meeting-merging-mission: A multi-robot coordinate framework for large-scale communication-limited exploration. In IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2022, Kyoto, Japan, October 23-27, 2022...

[28] Sahar Ghoflsaz Ghinani, Jingyao Zhang, and Elaheh Sadredini. Enabling low-cost secure computing on untrusted in-memory architectures. arXiv preprint arXiv:2501.17292, 2025.

[29] Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, and Stuart Russell. Adversarial policies: Attacking deep reinforcement learning. arXiv preprint arXiv:1905.10615, 2019.

[30] Junfeng Guo, Ang Li, and Cong Liu. Backdoor detection and mitigation in competitive reinforcement learning, 2023.

[31] Songyang Han, Sanbao Su, Sihong He, Shuo Han, Haizhao Yang, and Fei Miao. What is the solution for state adversarial multi-agent reinforcement learning? arXiv preprint arXiv:2212.02705, 2022.

[32] Sihong He, Yue Wang, Shuo Han, Shaofeng Zou, and Fei Miao. A robust and constrained multi-agent reinforcement learning framework for electric vehicle amod systems. arXiv preprint arXiv:2209.08230, 2022.

[33] Tianchen Ji, Roy Dong, and Katherine Driggs-Campbell. Traversing supervisor problem: An approximately optimal approach to multi-robot assistance. In Proceedings of Robotics: Science and Systems (RSS), 2022.

[34] Ziyang Jia, Laxmi N Bhuyan, and Daniel Wong. Pccl: Energy-efficient llm training with power-aware collective communication. In 2024 IEEE 42nd International Conference on Computer Design (ICCD), pages 84–91. IEEE, 2024.

[35] Weixiang Jiang, Ziyang Jia, Sirui Feng, Fangming Liu, and Hai Jin. Fine-grained warm water cooling for improving datacenter economy. In Proceedings of the 46th International Symposium on Computer Architecture, pages 474–486, 2019.

[36] Maitreyi Kulkarni, Mihir Dharmadhikari, Marco Tranzatto, Samuel Zimmermann, Vincent Reijgwart, Pietro De Petris, Huy Nguyen, Ninad Khedekar, Christos Papachristos, Lionel Ott, Roland Siegwart, Marco Hutter, and Kostas Alexis. Autonomous teamed exploration of subterranean environments using legged and aerial robots. In IEEE International Conference on Rob...

[37] Simin Li, Jun Guo, Jingqiao Xiu, Yuwei Zheng, Pu Feng, Xin Yu, Jiakai Wang, Aishan Liu, Yaodong Yang, Bo An, et al. Attacking cooperative multi-agent reinforcement learning by adversarial minority influence. Neural Networks, 191:107747, 2025.

[38] Yufei Li, Zexin Li, Yingfan Gao, and Cong Liu. White-box multi-objective adversarial attack on dialogue generation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1778–1792, 2023.

[39] Yufei Li, Zexin Li, Wei Yang, and Cong Liu. Rt-lm: Uncertainty-aware resource management for real-time inference of language models. In 2023 IEEE Real-Time Systems Symposium (RTSS), pages 158–171. IEEE, 2023.

[40] Yufei Li, Yu Fu, Yue Dong, and Cong Liu. Mace: A hybrid llm serving system with colocated slo-aware continuous retraining alignment. arXiv preprint arXiv:2510.03283, 2025.

[41] [41]

Lemix: Unified scheduling for llm training and inference on multi-gpu systems.arXiv preprint arXiv:2507.21276, 2025

Yufei Li, Zexin Li, Yinglun Zhu, and Cong Liu. Lemix: Unified scheduling for llm training and inference on multi-gpu systems.arXiv preprint arXiv:2507.21276, 2025

work page arXiv 2025

[42] [42]

Dr genre: Reinforcement learning from decoupled llm feedback for generic text rewriting.arXiv preprint arXiv:2503.06781, 2025

Yufei Li, John Nham, Ganesh Jawahar, Lei Shu, David Uthus, Yun-Hsuan Sung, Chengrun Yang, Itai Rolnick, Yi Qiao, and Cong Liu. Dr genre: Reinforcement learning from decoupled llm feedback for generic text rewriting.arXiv preprint arXiv:2503.06781, 2025

work page arXiv 2025

[43] [43]

A pareto-efficient algorithm for multiple objective optimization in e-commerce recommendation

Xiao Lin, Hongjie Chen, Changhua Pei, Fei Sun, Xuanji Xiao, Hanxiao Sun, Yongfeng Zhang, Wenwu Ou, and Peng Jiang. A pareto-efficient algorithm for multiple objective optimization in e-commerce recommendation. In Toine Bogers, Alan Said, Peter Brusilovsky, and Domonkos Tikk, editors,Proceedings of the 13th ACM Conference on Recommender Systems, RecSys 201...

work page doi:10.1145/3298689.3346998 2019

[44] [44]

Efficient adversarial attacks on online multi-agent reinforcement learning

Guanlin Liu and Lifeng LAI. Efficient adversarial attacks on online multi-agent reinforcement learning. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 24401–24433. Curran Associates, Inc., 2023. URLhttps://proceedings.neurips.cc/paper_files/paper/2023/ fi...

work page 2023

[45] [45]

Maven: Multi-agent variational exploration

Anuj Mahajan, Mikayel Samvelyan, Christian Schroeder de Witt, Bohdan Sun, Tabish Rashid, Shimon Whiteson, and Jakob Foerster. Maven: Multi-agent variational exploration. InAd- vances in Neural Information Processing Systems (NeurIPS), volume 32, 2019

work page 2019

[46] [46]

A sensorized toy car for autism screening using multi-modal features.Sustainability, 15(10),

Bijan Mehralizadeh, Bahar Baradaran, Shahab Nikkhoo, Pegah Soleiman, and Hadi Moradi. A sensorized toy car for autism screening using multi-modal features.Sustainability, 15(10),

work page

[47] [47]

doi: 10.3390/su15107790

ISSN 2071-1050. doi: 10.3390/su15107790. URLhttps://www.mdpi.com/2071-1050/ 15/10/7790

work page doi:10.3390/su15107790 2071

[48] [48]

Ng, Daishi Harada, and Stuart J

Andrew Y. Ng, Daishi Harada, and Stuart J. Russell. Policy invariance under reward transfor- mations: Theory and application to reward shaping. InProceedings of the 16th International Conference on Machine Learning (ICML), pages 278–287, 1999

work page 1999

[49] [49]

Pimbot: Policy and incentive manipulation for multi-robot reinforcement learning in social dilemmas

Shahab Nikkhoo, Zexin Li, Aritra Samanta, Yufei Li, and Cong Liu. Pimbot: Policy and incentive manipulation for multi-robot reinforcement learning in social dilemmas. In2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5630–

work page

[50] [50]

Nvidia jetson orin nano developer kit: Technical overview.https: //developer.nvidia.com/embedded/jetson-orin-nano, 2023

NVIDIA Corporation. Nvidia jetson orin nano developer kit: Technical overview.https: //developer.nvidia.com/embedded/jetson-orin-nano, 2023. Accessed: 2025-09-29

work page 2023

[51] [51]

Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning

Georgios Papoudakis, Filippos Christianos, Arrasy Rahman, and Stefano V. Albrecht. Dealing with non-stationarity in multi-agent deep reinforcement learning.arXiv preprint arXiv:1906.04737, 2019. URLhttps://arxiv.org/abs/1906.04737

work page internal anchor Pith review Pith/arXiv arXiv 1906

[52] [52]

Cooledge: hotspot-relievable warm water cooling for energy-efficient edge datacenters

Qiangyu Pei, Shutong Chen, Qixia Zhang, Xinhui Zhu, Fangming Liu, Ziyang Jia, Yishuo Wang, and Yongjie Yuan. Cooledge: hotspot-relievable warm water cooling for energy-efficient edge datacenters. InProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pages 814–829, 2022

work page 2022

[53] [53]

Qmix: Monotonic value function factorisation for deep multi- agent reinforcement learning

Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, and Shimon Whiteson. Qmix: Monotonic value function factorisation for deep multi- agent reinforcement learning. InProceedings of the 35th International Conference on Machine Learning (ICML), volume 80 ofProceedings of Machine Learning Research, pages 4295–4304, 2018

work page 2018

[54] [54]

Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, and Shimon Whiteson. The starcraft multi-agent challenge.arXiv preprint arXiv:1902.04043, 2019. URL https://arxiv.org/abs/1902.04043

work page arXiv 1902

[55] [55]

Cambridge University Press, Cambridge, UK, 2009

Yoav Shoham and Kevin Leyton-Brown.Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, Cambridge, UK, 2009. ISBN 978- 0521899437

work page 2009

[56] [56]

Dynamic collaborative multi-agent reinforcement learning communi- cation for autonomous drone reforestation.arXiv preprint arXiv:2211.15414, 2022

Philipp Dominic Siedler. Dynamic collaborative multi-agent reinforcement learning communi- cation for autonomous drone reforestation.arXiv preprint arXiv:2211.15414, 2022

work page arXiv 2022

[57] [57]

Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks

Amanpreet Singh, Tushar Jain, and Sainbayar Sukhbaatar. Learning when to communicate at scale in multiagent cooperative and competitive tasks. InInternational Conference on Learning Representations (ICLR), 2019. URLhttps://arxiv.org/abs/1812.09755. 29

work page internal anchor Pith review Pith/arXiv arXiv 2019

[58] [58]

Hostallero, and Yung Yi

Kyunghwan Son, Daewoo Kim, Wan Ju Kang, Debbie G. Hostallero, and Yung Yi. Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. InProceedings of the 36th International Conference on Machine Learning (ICML), volume 97 ofProceedings of Machine Learning Research, pages 5887–5896, 2019

work page 2019

[59]

Jeff L Stimpson and Michael A Goodrich. Learning to cooperate in a social dilemma: A satisficing approach to bargaining. In ICML, pages 728–735. Citeseer, 2003

[60]

Sainbayar Sukhbaatar, Rob Fergus, Arthur Szlam, and Jason Weston. Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems (NIPS), volume 29, 2016

[61]

Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech M. Czarnecki, Vinícius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, and Thore Graepel. Value-decomposition networks for cooperative multi-agent learning. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 2...

[62]

Veniamin Tereshchuk, John Stewart, Nikolay Bykov, Samuel Pedigo, Santosh Devasia, and Ashis G. Banerjee. An efficient scheduling algorithm for multi-robot task allocation in assembling aircraft structures. IEEE Robotics Autom. Lett., 4(4):3844–3851, 2019. doi: 10.1109/LRA.2019.2929983. URL https://doi.org/10.1109/LRA.2019.2929983

[63]

Justin K. Terry, Benjamin Black, Mario Jayakumar, Akshara Hari, Chace Sullivan, Ritchie Lee Santos, Clayton Dieffenderfer, Colin Horsch, Keon Perez, Akilesh Ravi, Alexander Williams, Yashas Lokesh, Morgan Dickens, Lilian Weng, Andreas Kallinteris, Shumeet Baluja, Wojciech M. Czarnecki, and Marc Lanctot. Pettingzoo: Gym for multi-agent reinforcement lear...

[64]

The Green Grid. Pue: A comprehensive examination of the metric. Technical report, The Green Grid, 2012. URL https://www.thegreengrid.org. White paper TGG-2012

[65]

James Tu, Tsunhsuan Wang, Jingkang Wang, Sivabalan Manivasagam, Mengye Ren, and Raquel Urtasun. Adversarial attacks on multi-agent communication. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7768–7777, 2021

[66]

Nikos Vlassis. A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool, San Rafael, CA, 2007. doi: 10.2200/S00090ED1V01Y200705AIM002

[67]

Jiechuan Wang, Zhanghen Ren, Wenbo Liu, Yong Yu, and Weinan Zhang. Qplex: Duplex dueling multi-agent q-learning. In International Conference on Learning Representations (ICLR). URL https://openreview.net/forum?id=Rcmk0xxIQV

[69]

Gerhard Weiss, editor. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. MIT Press, Cambridge, MA, 1999

[70]

Jiachen Yang, Ang Li, Mehrdad Farajtabar, Peter Sunehag, Edward Hughes, and Hongyuan Zha. Learning to incentivize other learning agents. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20, Red Hook, NY, USA, 2020. Curran Associates Inc. ISBN 9781713829546

[71]

Jiachen Yang, Ethan Wang, Rakshit Trivedi, Tuo Zhao, and Hongyuan Zha. Adaptive incentive design with multi-agent meta-gradient reinforcement learning. arXiv preprint arXiv:2112.10859, 2021

[72]

Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, and Jun Wang. Mean field multi-agent reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning (ICML), volume 80 of Proceedings of Machine Learning Research, pages 5571–5580, 2018

[73]

Chao Yu, Yinzhao Dong, Yangning Li, and Yatong Chen. Distributed multi-agent deep reinforcement learning for cooperative multi-robot pursuit. The Journal of Engineering, 2020(13):499–504, 2020

[74]

Lebin Yu, Yunbo Qiu, Quanming Yao, Yuan Shen, Xudong Zhang, and Jian Wang. Robust communicative multi-agent reinforcement learning with active defense. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 17575–17582, 2024

[75]

Jingyao Zhang and Elaheh Sadredini. A near-cache architectural framework for cryptographic computing. arXiv preprint arXiv:2509.23179, 2025

[76]

Jingyao Zhang, Jaewoo Park, Jongeun Lee, and Elaheh Sadredini. SAIL: SRAM-accelerated LLM inference system with lookup-table-based GEMV. arXiv preprint arXiv:2509.25853, 2025

[77]

Kaiqing Zhang, Zhuoran Yang, and Tamer Başar. Multi-agent reinforcement learning: A selective overview of theories and algorithms. Proceedings of the IEEE, 109(12):2278–2314, 2021. doi: 10.1109/JPROC.2021.3076600

[79]

Lin Zhang, Yufeng Sun, Andrew Barth, and Ou Ma. Decentralized control of multi-robot system in cooperative object transportation using deep reinforcement learning. IEEE Access, 8:184109–184119, 2020

[80]

Qian Zhang, Ruiyang Quan, Siqin Qimuge, Peimin Xia, Jiaheng Wang, Xin Zan, Fangshi Wang, Changchuan Chen, Qi Wei, Huichan Zhao, Xinjun Liu, and Fei Qiao. OCTOANTS: A heterogeneous lightweight intelligent multi-robot collaboration system with resource-constrained iot devices. In IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 202...