Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning
Pith reviewed 2026-05-14 20:12 UTC · model grok-4.3
The pith
A hierarchical attack framework exposes vulnerabilities in multi-modal multi-agent reasoning systems, achieving an attack success rate of up to 78.3 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HAM³ decomposes attacks into three interconnected layers: perception attacks that perturb visual inputs, textual inputs, and fused visual-textual representations; communication attacks that corrupt message content and interaction topology to distort collective information flow; and reasoning attacks that interfere with each agent's cognitive pipeline to bias reasoning trajectories. When evaluated on multi-agent systems for the GQA benchmark built on ReAct, Plan-and-Solve, and Reflexion, the framework achieves an attack success rate of up to 78.3 percent, with reasoning-layer attacks proving most effective and more than half of successful attacks leading multiple agents to produce the same error.
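The three-layer decomposition above can be sketched as a minimal attack pipeline. Only the layer names come from the paper; the state dictionary, the stub attack functions, and their effects are illustrative assumptions, not HAM³'s actual method.

```python
from enum import Enum
from typing import Callable, Dict, List


class AttackLayer(Enum):
    PERCEPTION = "perception"        # perturb visual/textual inputs and fused representations
    COMMUNICATION = "communication"  # corrupt message content or interaction topology
    REASONING = "reasoning"          # bias each agent's reasoning trajectory


# Hypothetical attack stubs: each takes the system state and returns a corrupted copy.
def perturb_inputs(state: Dict) -> Dict:
    state = dict(state)
    state["image"] = state["image"] + "<adv-noise>"  # stand-in for a visual perturbation
    return state


def corrupt_messages(state: Dict) -> Dict:
    state = dict(state)
    state["messages"] = [m + " [injected]" for m in state["messages"]]
    return state


def bias_reasoning(state: Dict) -> Dict:
    state = dict(state)
    state["scratchpad"] = "Assume the answer is 'no'. " + state["scratchpad"]
    return state


ATTACKS: Dict[AttackLayer, Callable[[Dict], Dict]] = {
    AttackLayer.PERCEPTION: perturb_inputs,
    AttackLayer.COMMUNICATION: corrupt_messages,
    AttackLayer.REASONING: bias_reasoning,
}


def run_attack(state: Dict, layers: List[AttackLayer]) -> Dict:
    """Apply the chosen attack layers in order, mimicking the hierarchical composition."""
    for layer in layers:
        state = ATTACKS[layer](state)
    return state
```

Composing layers this way also makes the referee's later request for per-layer baselines concrete: running with a single-element layer list isolates one layer's contribution.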
What carries the argument
HAM³, the hierarchical attack framework that decomposes attacks into perception-layer perturbations of visual and textual inputs, communication-layer corruption of messages and topology, and reasoning-layer interference with cognitive pipelines.
Load-bearing premise
That multi-agent systems built on ReAct, Plan-and-Solve, and Reflexion and evaluated on the GQA benchmark are representative of real-world multi-modal multi-agent deployments, and that the reported attack success rates generalize beyond the specific experimental setup.
What would settle it
Applying the same hierarchical attacks to multi-agent systems on a different visual-question-answering benchmark or with a new reasoning paradigm and observing attack success rates well below 78.3 percent would falsify the central claim.
Original abstract
Multi-modal multi-agent systems (MM-MAS) have gained increasing attention for their capacity to enable complex reasoning and coordination across diverse modalities. As these systems continue to expand in scale and functionality, investigating their potential vulnerabilities has become increasingly important. However, existing studies on adversarial attacks in multi-agent systems primarily focus on isolated agents or unimodal settings, leaving the vulnerabilities of MM-MAS largely underexplored. To bridge this gap, we introduce HAM$^{3}$, a Hierarchical Attack framework for multi-modal multi-agent systems that decomposes attacks into three interconnected layers. Specifically, at the perception layer, HAM$^{3}$ mounts attacks by perturbing visual inputs, textual inputs, and their fused visual-textual representations. At the communication layer, it performs communication-level attacks that corrupt message content and interaction topology, such as manipulating shared context or communication links to distort collective information flow. At the reasoning layer, it conducts reasoning-level attacks that interfere with each agent's cognitive pipeline, biasing reasoning trajectories and ultimately compromising final decisions. We evaluate HAM$^{3}$ on the GQA benchmark through multi-agent systems built on distinct reasoning paradigms including ReAct, Plan-and-Solve, and Reflexion. Experiments demonstrate that our framework achieves an Attack Success Rate of up to 78.3%, with reasoning-layer attacks being the most effective. More than half of the successful attacks lead multiple agents to produce consistent errors. These findings offer valuable insights for building more robust and interpretable multi-agent intelligence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces HAM³, a hierarchical attack framework for multi-modal multi-agent systems (MM-MAS) that decomposes attacks into perception-layer perturbations of visual/textual inputs, communication-layer corruption of messages and topology, and reasoning-layer interference with agent cognitive pipelines. It evaluates the framework on the GQA benchmark using multi-agent systems built on ReAct, Plan-and-Solve, and Reflexion, reporting an attack success rate of up to 78.3% (with reasoning-layer attacks most effective) and noting that more than half of successful attacks produce consistent errors across multiple agents.
Significance. If the empirical results hold with proper documentation, the work would be significant for highlighting structured vulnerabilities in emerging MM-MAS and for providing a layered taxonomy that could inform robustness research. The emphasis on consistent multi-agent errors and the comparison across reasoning paradigms are potentially useful contributions to adversarial evaluation in multi-agent settings.
major comments (2)
- [Experimental Evaluation] The central claim of up to 78.3% ASR (and the ranking of reasoning-layer attacks) is load-bearing yet unsupported: the manuscript provides no definition of attack success, no description of how perturbations are generated at each layer, no baseline comparisons, and no statistical tests or error analysis (see Experimental Evaluation section and associated tables/figures).
- [§5] The generalization claim that results on GQA with ReAct/Plan-and-Solve/Reflexion are representative of real-world MM-MAS is not substantiated; no ablation on attack parameters, no additional benchmarks, and no analysis of prompt sensitivity or agent configuration are reported, undermining the broader implications.
minor comments (2)
- Add pseudocode or precise algorithmic descriptions for the three attack layers to allow reproducibility.
- [§3] Clarify notation for fused visual-textual representations and communication topology in the framework definition.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen the experimental documentation and discussion of scope.
Point-by-point responses
-
Referee: [Experimental Evaluation] The central claim of up to 78.3% ASR (and the ranking of reasoning-layer attacks) is load-bearing yet unsupported: the manuscript provides no definition of attack success, no description of how perturbations are generated at each layer, no baseline comparisons, and no statistical tests or error analysis (see Experimental Evaluation section and associated tables/figures).
Authors: We agree that the Experimental Evaluation section needs expanded detail. In the revision we will add: (1) a formal definition of attack success rate, (2) explicit descriptions and pseudocode for perturbation generation at the perception, communication, and reasoning layers, (3) baseline comparisons against single-layer and non-hierarchical attacks, and (4) statistical significance tests together with error analysis. These changes will directly support the reported ASR figures and layer ranking. revision: yes
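The first exchange turns on the missing ASR definition. One conventional definition the revision might formalize is the fraction of attack attempts that flip a correct answer to an incorrect one; this is an assumed definition, not one taken from the manuscript. Under it, the reported 78.3% would correspond to, for instance, 47 successful flips in 60 attempts.

```python
def attack_success_rate(outcomes: list) -> float:
    """ASR = (# attempts that flip a correct answer to an incorrect one) / (# attempts).

    `outcomes[i]` is True when attempt i succeeded. This is one conventional
    definition; the paper may use a different success criterion.
    """
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


# 47 successes out of 60 attempts gives 47/60 ≈ 0.783, i.e. 78.3%.
example_asr = attack_success_rate([True] * 47 + [False] * 13)
```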
-
Referee: [§5] The generalization claim that results on GQA with ReAct/Plan-and-Solve/Reflexion are representative of real-world MM-MAS is not substantiated; no ablation on attack parameters, no additional benchmarks, and no analysis of prompt sensitivity or agent configuration are reported, undermining the broader implications.
Authors: We acknowledge the current evaluation is confined to GQA and the three reasoning paradigms. The revised manuscript will include ablations on attack parameters (e.g., perturbation magnitude and communication corruption rate), sensitivity analysis for prompts and agent configurations, and an expanded limitations discussion in §5 that more carefully qualifies the scope. Adding entirely new benchmarks is not feasible within the revision timeline, but we will strengthen the existing analysis and contextualize the results more conservatively. revision: partial
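The promised parameter ablation reduces to a grid sweep over the two example parameters named in the response. The function `evaluate_asr`, the grids, and the parameter names below are placeholders the manuscript does not specify.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def sweep(
    evaluate_asr: Callable[[float, float], float],
    magnitudes: List[float],
    corruption_rates: List[float],
) -> Dict[Tuple[float, float], float]:
    """Evaluate ASR on every (perturbation magnitude, corruption rate) pair.

    `evaluate_asr` stands in for a full run of the attacked multi-agent system;
    only the two swept parameters come from the rebuttal's own examples.
    """
    return {
        (m, r): evaluate_asr(m, r)
        for m, r in product(magnitudes, corruption_rates)
    }
```

A 3×3 grid (e.g. magnitudes `[0.01, 0.03, 0.1]` against corruption rates `[0.0, 0.25, 0.5]`) would already show whether the 78.3% peak is an outlier of one setting or a plateau.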
Circularity Check
No circularity: empirical framework with no derivations or fitted predictions
Full rationale
The manuscript introduces HAM³ as a novel hierarchical attack decomposition (perception, communication, reasoning layers) and reports empirical attack success rates on the public GQA benchmark for ReAct/Plan-and-Solve/Reflexion agents. No equations, parameter fitting, uniqueness theorems, or self-citation chains appear in the provided text. The central results are presented as direct experimental measurements rather than reductions of prior quantities by construction, satisfying the self-contained criterion for a score of 0.
Reference graph
Works this paper leans on
-
[1]
Vqa: Visual question answering
Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C Lawrence Zitnick, and Devi Parikh. Vqa: Visual question answering. In Proceedings of the IEEE International Conference on Computer Vision, pages 2425–2433, 2015. 5
2015
-
[2]
Qwen2.5-vl technical report, 2025
Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. Qwen2.5-vl technical report, 2025. 5
2025
-
[3]
Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases
Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases. Advances in Neural Information Processing Systems, 37:130185–130213, 2024. 1
2024
-
[4]
Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents
Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents. Advances in Neural Information Processing Systems, 37:82895–82920, 2024. 1
2024
-
[5]
Agentscope: A flexible yet robust multi-agent platform
Dawei Gao, Zitao Li, Xuchen Pan, Weirui Kuang, Zhijian Ma, Bingchen Qian, Fei Wei, Wenhao Zhang, Yuexiang Xie, Daoyuan Chen, et al. Agentscope: A flexible yet robust multi-agent platform. arXiv preprint arXiv:2402.14034, 2024.
-
[6]
Figstep: Jailbreaking large vision-language models via typographic visual prompts
Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, and Xiaoyun Wang. Figstep: Jailbreaking large vision-language models via typographic visual prompts. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 23951–23959, 2025. 2
2025
-
[7]
Making the v in vqa matter: Elevating the role of image understanding in visual question answering
Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, and Devi Parikh. Making the v in vqa matter: Elevating the role of image understanding in visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6904–6913, 2017. 5
2017
-
[8]
Siwei Han, Peng Xia, Ruiyi Zhang, Tong Sun, Yun Li, Hongtu Zhu, and Huaxiu Yao. Mdocagent: A multi-modal multi-agent framework for document understanding. arXiv preprint arXiv:2503.13964, 2025. 2
-
[9]
Red-teaming llm multi-agent systems via communication attacks
Pengfei He, Yuping Lin, Shen Dong, Han Xu, Yue Xing, and Hui Liu. Red-teaming llm multi-agent systems via communication attacks. In Findings of the Association for Computational Linguistics: ACL 2025, pages 6726–6747, 2025. 2
2025
-
[10]
Jen-tse Huang, Jiaxu Zhou, Tailin Jin, Xuhui Zhou, Zixi Chen, Wenxuan Wang, Youliang Yuan, Michael R Lyu, and Maarten Sap. On the resilience of llm-based multi-agent collaboration with faulty agents. arXiv preprint arXiv:2408.00989, 2024. 1, 2
-
[11]
Evochart: A benchmark and a self-training approach towards real-world chart understanding
Muye Huang, Han Lai, Xinyu Zhang, Wenjun Wu, Jie Ma, Lingling Zhang, and Jun Liu. Evochart: A benchmark and a self-training approach towards real-world chart understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3680–3688, 2025. 6
2025
-
[12]
Gqa: A new dataset for real-world visual reasoning and compositional question answering
Drew A Hudson and Christopher D Manning. Gqa: A new dataset for real-world visual reasoning and compositional question answering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6700–6709, 2019. 5
2019
-
[13]
Cowpilot: A framework for autonomous and human-agent collaborative web navigation
Faria Huq, Zora Zhiruo Wang, Frank F Xu, Tianyue Ou, Shuyan Zhou, Jeffrey P Bigham, and Graham Neubig. Cowpilot: A framework for autonomous and human-agent collaborative web navigation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System...
2025
-
[14]
Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. Gpt-4o system card. arXiv preprint arXiv:2410.21276, 2024. 5
2024
-
[15]
Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, et al. Openai o1 system card. arXiv preprint arXiv:2412.16720, 2024. 5
2024
-
[16]
Multi-modal and multi-agent systems meet rationality: A survey
Bowen Jiang, Yangxinyu Xie, Xiaomeng Wang, Weijie J Su, Camillo Jose Taylor, and Tanwi Mallick. Multi-modal and multi-agent systems meet rationality: A survey. In ICML 2024 Workshop on LLMs and Cognition, 2024. 1
2024
-
[17]
M4sc: An mllm-based multi-modal, multi-task and multi-user semantic communication system
Feibo Jiang, Siwei Tu, Jin Zhang, Li Dong, Kezhi Wang, Kun Yang, and Cunhua Pan. M4sc: An mllm-based multi-modal, multi-task and multi-user semantic communication system. IEEE Wireless Communications, 32(5):40–47, 2025. 2
2025
-
[18]
Camel: Communicative agents for "mind" exploration of large language model society
Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. Camel: Communicative agents for "mind" exploration of large language model society. Advances in Neural Information Processing Systems, 36:51991–52008, 2023. 2
2023
-
[19]
V2x-sim: Multi-agent collaborative perception dataset and benchmark for autonomous driving
Yiming Li, Dekun Ma, Ziyan An, Zixun Wang, Yiqi Zhong, Siheng Chen, and Chen Feng. V2x-sim: Multi-agent collaborative perception dataset and benchmark for autonomous driving. IEEE Robotics and Automation Letters, 7(4):10914–10921, 2022. 1
2022
-
[20]
Huawei Lin, Yunzhi Shi, Tong Geng, Weijie Zhao, Wei Wang, and Ravender Pal Singh. Agent-omni: Test-time multimodal reasoning via model coordination for understanding anything. arXiv preprint arXiv:2511.02834, 2025. 2
-
[21]
Rui Liu, Yu Shen, Peng Gao, Pratap Tokekar, and Ming Lin. Caml: Collaborative auxiliary modality learning for multi-agent systems. arXiv preprint arXiv:2502.17821, 2025. 1
-
[22]
Mm-safetybench: A benchmark for safety evaluation of multimodal large language models
Xin Liu, Yichen Zhu, Jindong Gu, Yunshi Lan, Chao Yang, and Yu Qiao. Mm-safetybench: A benchmark for safety evaluation of multimodal large language models. In European Conference on Computer Vision, pages 386–403. Springer, 2024.
-
[23]
Yuzhen Long and Songze Li. Funcpoison: Poisoning function library to hijack multi-agent autonomous driving systems. arXiv preprint arXiv:2509.24408, 2025. 2
-
[24]
Xinheng Lyu, Yuci Liang, Wenting Chen, Meidan Ding, Jiaqi Yang, Guolin Huang, Daokun Zhang, Xiangjian He, and Linlin Shen. Wsi-agents: A collaborative multi-agent system for multi-modal whole slide image analysis. arXiv preprint arXiv:2507.14680, 2025. 2
-
[25]
Agents that reduce work and information overload
Pattie Maes. Agents that reduce work and information overload. In Readings in Human–Computer Interaction, pages 811–821. Elsevier, 1995. 2
1995
-
[26]
Image-based prompt injection: Hijacking multimodal llms through visually embedded adversarial instructions
Neha Nagaraja, Lan Zhang, Zhilong Wang, Bo Zhang, and Pawan Patil. Image-based prompt injection: Hijacking multimodal llms through visually embedded adversarial instructions. In 2025 3rd International Conference on Foundation and Large Language Models (FLLM), pages 916–922. IEEE, 2025.
-
[27]
Scaling large-language-model-based multi-agent collaboration
Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Kunlun Zhu, Hanchen Xia, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, et al. Scaling large-language-model-based multi-agent collaboration. arXiv preprint arXiv:2406.07155, 2024.
-
[28]
Artificial Intelligence: A Modern Approach
Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, 1995. 2
1995
-
[29]
Toolformer: Language models can teach themselves to use tools
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 36:68539–68551, 2023.
-
[30]
Erfan Shayegani, Yue Dong, and Nael Abu-Ghazaleh. Jailbreak in pieces: Compositional adversarial attacks on multi-modal language models. arXiv preprint arXiv:2307.14539, 2023.
-
[31]
Muma-tom: Multi-modal multi-agent theory of mind
Haojun Shi, Suyu Ye, Xinyu Fang, Chuanyang Jin, Leyla Isik, Yen-Ling Kuo, and Tianmin Shu. Muma-tom: Multi-modal multi-agent theory of mind. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1510–1519, 2025. 1, 2
2025
-
[32]
Prompt injection attack to tool selection in llm agents
Jiawen Shi, Zenghui Yuan, Guiyao Tie, Pan Zhou, Neil Zhenqiang Gong, and Lichao Sun. Prompt injection attack to tool selection in llm agents. arXiv preprint arXiv:2504.19793, 2025. 6
-
[33]
Reflexion: Language agents with verbal reinforcement learning
Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36:8634–8652, 2023. 5
2023
-
[34]
Multi-Agent Collaboration Mechanisms: A Survey of LLMs
Khanh-Tung Tran, Dung Dao, Minh-Duong Nguyen, Quoc-Viet Pham, Barry O'Sullivan, and Hoang D Nguyen. Multi-agent collaboration mechanisms: A survey of llms. arXiv preprint arXiv:2501.06322, 2025. 2
2025
-
[35]
Plan-and-solve prompting: Improving zero-shot chain-of-thought reasoning by large language models
Lei Wang, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, Roy Ka-Wei Lee, and Ee-Peng Lim. Plan-and-solve prompting: Improving zero-shot chain-of-thought reasoning by large language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2609–2634, 2023. 5
2023
-
[36]
Incharacter: Evaluating personality fidelity in role-playing agents through psychological interviews
Xintao Wang, Yunze Xiao, Jen-tse Huang, Siyu Yuan, Rui Xu, Haoran Guo, Quan Tu, Yaying Fei, Ziang Leng, Wei Wang, et al. Incharacter: Evaluating personality fidelity in role-playing agents through psychological interviews. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1840–1873, 2024. 2
2024
-
[37]
Dissecting adversarial robustness of multimodal lm agents
Chen Henry Wu, Rishi Shah, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, and Aditi Raghunathan. Dissecting adversarial robustness of multimodal lm agents. arXiv preprint arXiv:2406.12814, 2024. 2
-
[38]
Autogen: Enabling next-gen llm applications via multi-agent conversations
Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. Autogen: Enabling next-gen llm applications via multi-agent conversations. In First Conference on Language Modeling, 2024. 2
2024
-
[39]
Embodied instruction following in unknown environments
Zhenyu Wu, Ziwei Wang, Xiuwei Xu, Hang Yin, Yinan Liang, Angyuan Ma, Jiwen Lu, and Haibin Yan. Embodied instruction following in unknown environments. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 21825–21832. IEEE, 2025. 1
2025
-
[40]
Large multimodal agents: A survey
Junlin Xie, Zhihong Chen, Ruifei Zhang, Xiang Wan, and Guanbin Li. Large multimodal agents: A survey. arXiv preprint arXiv:2402.15116, 2024. 1
-
[41]
React: Synergizing reasoning and acting in language models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations, 2022. 2, 5
2022
-
[42]
A survey on trustworthy llm agents: Threats and countermeasures
Miao Yu, Fanci Meng, Xinyun Zhou, Shilong Wang, Junyuan Mao, Linsey Pan, Tianlong Chen, Kun Wang, Xinfeng Li, Yongfeng Zhang, et al. A survey on trustworthy llm agents: Threats and countermeasures. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, pages 6216–6226, 2025. 2
2025
-
[43]
Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents
Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents. In Findings of the Association for Computational Linguistics: ACL 2024, pages 10471–10506, 2024. 2
2024
-
[44]
Multi-agent architecture search via agentic supernet
Guibin Zhang, Luyang Niu, Junfeng Fang, Kun Wang, Lei Bai, and Xiang Wang. Multi-agent architecture search via agentic supernet. arXiv preprint arXiv:2502.04180, 2025. 2
-
[45]
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Agent security bench (asb): Formalizing and benchmarking attacks and defenses in llm-based agents. arXiv preprint arXiv:2410.02644, 2024. 2, 6
2024
-
[46]
Demonstrations of integrity attacks in multi-agent systems
Can Zheng, Yuhan Cao, Xiaoning Dong, and Tianxing He. Demonstrations of integrity attacks in multi-agent systems. arXiv preprint arXiv:2506.04572, 2025. 2, 6
-
[47]
Wanqi Zhou, Shuanghao Bai, Danilo P Mandic, Qibin Zhao, and Badong Chen. Revisiting the adversarial robustness of vision language models: a multimodal perspective. arXiv preprint arXiv:2404.19287, 2024. 2
-
[48]
Zhenhong Zhou, Zherui Li, Jie Zhang, Yuanhe Zhang, Kun Wang, Yang Liu, and Qing Guo. Corba: Contagious recursive blocking attacks on multi-agent systems based on large language models. arXiv preprint arXiv:2502.14529, 2025. 2
-
[49]
Image-to-text logic jailbreak: Your imagination can help you do anything
Xiaotian Zou, Ke Li, and Yongkang Chen. Image-to-text logic jailbreak: Your imagination can help you do anything. arXiv preprint arXiv:2407.02534, 2024. 2