Don't Trust Your Upstream: Exploiting LLM Multi-Agent System via Topology-Guided Adversarial Propagation
Pith reviewed 2026-05-17 02:59 UTC · model grok-4.3
The pith
Adversarial inputs can propagate through LLM multi-agent systems by following their communication topology from exposed edge agents to high-privilege ones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a topology-guided attack scheme, built from reconnaissance of inter-agent dependencies, contamination propagation modeling, and hierarchical payload encapsulation, enables reliable multi-hop compromise of high-privilege agents starting from exposed edge agents. Experiments demonstrate success rates of 40%–78% across three common MAS frameworks and five topologies, rising to 85% on two real-world applications in twenty scenarios. The work further shows that a topology-trust mitigation strategy can block 94.8% of the composite attacks.
What carries the argument
Topology-guided adversarial propagation, which maps agent interaction structure, models how contamination travels along dependencies, and encapsulates payloads to survive reinterpretation by downstream agents.
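To make the role of topology concrete, the following is a minimal sketch of how one might audit which multi-hop dependency paths lead from an exposed edge agent to a high-privilege agent. The agent names, the `TOPOLOGY` edge list, and the `shortest_path` helper are all hypothetical and chosen only for illustration. This summary does not describe the paper's reconnaissance procedure, and the code does not implement it; it is plain breadth-first search over an assumed directed graph.

```python
from collections import deque

# Hypothetical topology: an edge u -> v means agent v consumes agent u's output.
TOPOLOGY = {
    "web_reader": ["summarizer"],
    "summarizer": ["planner"],
    "calendar": ["planner"],
    "planner": ["executor"],
    "executor": [],
}


def shortest_path(topology, source, target):
    """Breadth-first search; returns the shortest path as a list, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in topology.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Path from an exposed edge agent to the (assumed) high-privilege agent.
print(shortest_path(TOPOLOGY, "web_reader", "executor"))
```

In a defensive review, each hop on such a path is a point where upstream content is reinterpreted, and therefore a candidate location for validation.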
If this is right
- The attack remains effective across multiple standard MAS frameworks and common communication topologies.
- Real-world MAS applications exhibit the same propagation vulnerability in diverse task scenarios.
- A mitigation that assigns trust according to agent position in the topology blocks nearly all instances of the attack.
Where Pith is reading between the lines
- MAS designers may need to add explicit checks on information provenance rather than assuming cooperative reinterpretation is safe.
- Similar dependency-based propagation risks could appear in other multi-component LLM pipelines that chain outputs without strong isolation.
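The provenance point above can be sketched in a few lines. This is an illustrative assumption about what an explicit provenance check might look like, not the paper's topology-trust mitigation, whose details are not given in this summary. The `Message` type, the `forward` helper, the `may_trigger_privileged_action` policy, and the agent names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    content: str
    # Agents the content has passed through, oldest first.
    provenance: tuple = ()


def forward(msg, agent, new_content):
    """Return a new message whose provenance records that `agent` handled it."""
    return Message(new_content, msg.provenance + (agent,))


def may_trigger_privileged_action(msg, untrusted_sources):
    """Deny privileged actions if any upstream hop is an untrusted source."""
    return not any(hop in untrusted_sources for hop in msg.provenance)


UNTRUSTED = {"web_reader"}  # hypothetical exposed edge agent

tainted = forward(forward(Message("page text"), "web_reader", "extract"),
                  "summarizer", "summary")
clean = forward(Message("user request"), "planner", "plan")

print(may_trigger_privileged_action(tainted, UNTRUSTED))
print(may_trigger_privileged_action(clean, UNTRUSTED))
```

The design choice here is that provenance travels with the message rather than being inferred by the receiving agent, so a rewritten summary still carries the record of its untrusted origin.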
Load-bearing premise
Downstream agents will reliably reinterpret and act on upstream outputs in ways that let adversarial contamination propagate to influence their behavior.
What would settle it
A controlled test in which agents in a representative MAS consistently sanitize or ignore adversarial content from upstream outputs without altering their downstream actions would show that propagation does not occur as modeled.
Original abstract
The digital world is witnessing the rapid rise of LLM-based multi-agent systems (MASs) and their powerful applications. However, their security remains insufficiently understood, as existing evaluations are largely limited to narrow attack settings and may substantially underestimate the real risks of MAS deployments. Inspired by the MAS inter-agent dependencies, where upstream outputs are reinterpreted and executed by downstream agents, we propose a topology-aware attack scheme that propagates adversarial contamination from exposed edge agents to high-privilege agents to induce malicious behaviors. By combining topology reconnaissance, contamination propagation modeling, and hierarchical payload encapsulation, our approach overcomes the key challenges of black-box attacks and makes such multi-hop compromise practical. Experiments show that our approach achieves success rates of 40%–78% on three widely-used MAS frameworks under five topologies, and 85% on two real-world MAS applications across 20 representative scenarios. The results reveal fundamental vulnerabilities in MASs that have been overlooked by prior studies. Based on these findings, we propose a topology-trust mitigation that blocks 94.8% of such composite attacks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a topology-guided adversarial attack on LLM-based multi-agent systems that performs black-box reconnaissance to map inter-agent dependencies, then propagates contamination from exposed edge agents to high-privilege agents via hierarchical payload encapsulation and modeled propagation. Experiments report success rates of 40–78% across three standard MAS frameworks (AutoGen, CrewAI, LangGraph) under five topologies and 85% success on two real-world MAS applications over 20 scenarios; a topology-trust mitigation is introduced that blocks 94.8% of the composite attacks.
Significance. If the experimental results prove robust, the work identifies a previously under-examined attack surface arising from the default reinterpretation of upstream outputs by downstream agents in MAS topologies. The evaluation on unmodified, widely deployed frameworks and real applications provides concrete evidence of practical risk, and the proposed mitigation could inform both defensive design and future security evaluations of agentic systems.
major comments (2)
- [Attack Model and Experimental Evaluation] The central success-rate claims (40–78% and 85%) rest on the assumption that downstream agents will reinterpret and execute upstream outputs without filtering or summarization. No ablation is reported that inserts even lightweight validation, safety alignment, or output sanitization at intermediate agents; such a test is required to establish whether the reported rates survive realistic MAS deployments.
- [Experimental Evaluation] The experimental section supplies no description of trial counts, statistical tests, success criteria, or data-exclusion rules. Without these elements the numerical results cannot be evaluated for reproducibility or for support of the claim that the attack overcomes black-box challenges.
minor comments (1)
- [Abstract and §4] The abstract and evaluation sections refer to “five topologies” and “20 representative scenarios” without enumerating them or justifying their selection; an explicit list or appendix would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment below and indicate the changes planned for the revised manuscript.
- Referee: [Attack Model and Experimental Evaluation] The central success-rate claims (40–78% and 85%) rest on the assumption that downstream agents will reinterpret and execute upstream outputs without filtering or summarization. No ablation is reported that inserts even lightweight validation, safety alignment, or output sanitization at intermediate agents; such a test is required to establish whether the reported rates survive realistic MAS deployments.
  Authors: We agree that the reported success rates assume direct reinterpretation of upstream outputs by downstream agents, which reflects the default behavior in the evaluated MAS frameworks but may not hold in deployments with added safeguards. In the revised manuscript we will add a new ablation study that inserts lightweight validation, safety alignment, and output sanitization at intermediate agents. This will quantify how the attack success rates change under more realistic filtering conditions and clarify the scope of the vulnerability.
  Revision: yes
- Referee: [Experimental Evaluation] The experimental section supplies no description of trial counts, statistical tests, success criteria, or data-exclusion rules. Without these elements the numerical results cannot be evaluated for reproducibility or for support of the claim that the attack overcomes black-box challenges.
  Authors: We acknowledge that the experimental methodology section lacked sufficient detail on reproducibility. In the revised manuscript we will expand the relevant sections to report the exact number of trials conducted for each configuration, the statistical tests and confidence intervals used, precise definitions of success criteria for each task and topology, and any data-exclusion rules applied. These additions will allow independent evaluation of the results.
  Revision: yes
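The second exchange turns on trial counts and confidence intervals. As a hedged illustration of why the counts matter, the sketch below computes a Wilson score interval for a success rate. The trial counts are hypothetical, since the paper's actual counts are not reported in this summary, and `wilson_interval` is an illustrative helper rather than anything taken from the paper.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


# Hypothetical counts: the same 78% rate observed over 50 and over 500 trials.
for successes, trials in [(39, 50), (390, 500)]:
    low, high = wilson_interval(successes, trials)
    print(f"{successes}/{trials}: [{low:.3f}, {high:.3f}]")
```

The same headline percentage yields a much wider interval at the smaller trial count, which is why the rate alone is not enough to judge the result.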
Circularity Check
No significant circularity; claims rest on direct experiments
The paper presents a topology-aware attack on LLM multi-agent systems via reconnaissance, propagation modeling, and payload encapsulation, validated through empirical success rates (40–78% on frameworks, 85% on applications) rather than any mathematical derivation or fitted parameters. No equations, self-definitional loops, or load-bearing self-citations appear in the provided text; the central claims are tested on unmodified standard MAS frameworks (AutoGen, CrewAI, LangGraph) and real-world scenarios, making the work self-contained against external benchmarks. This matches the reader's assessment of minimal circularity risk.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: upstream outputs are reinterpreted and executed by downstream agents in MAS.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: We introduce the adversarial contamination propagation model (ACPM), which captures the dynamic diffusion of contamination across the MAS task network... $T_i(t) = \min\left[1,\; T_i(t-1) + \bigl(1 - T_i(t-1)\bigr) \cdot I_i(t)\right]$
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: hierarchical payload encapsulation scheme (HPES)... recursive construction $I_n = F_{\mathrm{adv}}^{(n)}(\psi)$
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
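The ACPM update rule quoted in the first passage of this section, T_i(t) = min[1, T_i(t−1) + (1 − T_i(t−1))·I_i(t)], can be traced numerically. The sketch below assumes that T_i is the contamination level of agent i in [0, 1] and that I_i(t) is the incoming influence at step t; the summary does not define either symbol, so this reading is an assumption. How I_i(t) is derived from neighboring agents is also not given here, so the influence values are supplied by hand, and the function names are hypothetical.

```python
def acpm_step(t_prev, influence):
    """One update: T(t) = min(1, T(t-1) + (1 - T(t-1)) * I(t))."""
    return min(1.0, t_prev + (1.0 - t_prev) * influence)


def acpm_trajectory(t0, influences):
    """Apply the update once per influence value, starting from t0."""
    values = [t0]
    for influence in influences:
        values.append(acpm_step(values[-1], influence))
    return values


# Assumed inputs: an initially clean agent under constant influence 0.5.
print(acpm_trajectory(0.0, [0.5, 0.5, 0.5]))  # [0.0, 0.5, 0.75, 0.875]
```

Under this reading, each step closes a fraction I_i(t) of the remaining gap to full contamination, so for influences in [0, 1] the level is non-decreasing and the min only acts as a safeguard against influence values above 1.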
Forward citations
Cited by 1 Pith paper
- Conjunctive Prompt Attacks in Multi-Agent LLM Systems
  Conjunctive prompt attacks split adversarial elements across agents and routing paths in multi-agent LLM systems, evading isolated defenses and succeeding through topology-aware optimization.