pith. machine review for the scientific record. sign in

arxiv: 2410.07283 · v1 · submitted 2024-10-09 · 💻 cs.MA · cs.AI· cs.CR

Recognition: 2 theorem links

· Lean Theorem

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

Authors on Pith no claims yet

Pith reviewed 2026-05-15 19:28 UTC · model grok-4.3

classification 💻 cs.MA cs.AIcs.CR
keywords prompt injectionmulti-agent systemsLLM securityprompt infectionAI agent vulnerabilitiesself-replicating attacks
0
0 comments X

The pith

Malicious prompts can self-replicate from one LLM agent to others in multi-agent systems, spreading like a virus.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that prompt injection attacks are no longer limited to single models but can jump between agents when one passes instructions to the next. A single infected prompt gets the receiving agent to execute harmful actions and then forward the same malicious instruction onward. This creates silent propagation that enables data theft, scams, and system disruption without direct access to every agent. Experiments confirm the effect persists even when agents do not share all communications publicly. The authors also show that adding LLM Tagging to existing safeguards reduces spread, highlighting that current single-agent defenses fall short for interconnected setups.

Core claim

Prompt Infection is an LLM-to-LLM attack in which a malicious prompt, once injected into one agent, causes that agent to execute the harmful task and then embed the same prompt into messages sent to peer agents, allowing the infection to replicate across the system without requiring direct external input to each agent.

What carries the argument

Prompt Infection, the self-replicating malicious instruction that exploits inter-agent message passing to propagate itself.

If this is right

  • A single entry point can compromise an entire multi-agent workflow through silent replication.
  • Standard single-agent prompt injection defenses fail to stop system-wide effects.
  • Data exfiltration and misinformation campaigns can scale automatically once one agent is reached.
  • Combining LLM Tagging with existing safeguards measurably limits further spread.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Designers of agent networks may need mandatory message sanitization at every hop rather than at the edge only.
  • Testing protocols for new multi-agent applications should include deliberate infection attempts as a standard check.
  • The same replication pattern could appear in other structured communication systems such as tool-calling chains or workflow orchestrators.

Load-bearing premise

Agents will execute and forward malicious instructions received from other agents without built-in refusal or detection of the replication attempt.

What would settle it

Run a controlled multi-agent simulation in which every agent is given an explicit rule to refuse any message containing instructions to replicate or spread content to peers, then measure whether the original malicious prompt still propagates.

read the original abstract

As Large Language Models (LLMs) grow increasingly powerful, multi-agent systems are becoming more prevalent in modern AI applications. Most safety research, however, has focused on vulnerabilities in single-agent LLMs. These include prompt injection attacks, where malicious prompts embedded in external content trick the LLM into executing unintended or harmful actions, compromising the victim's application. In this paper, we reveal a more dangerous vector: LLM-to-LLM prompt injection within multi-agent systems. We introduce Prompt Infection, a novel attack where malicious prompts self-replicate across interconnected agents, behaving much like a computer virus. This attack poses severe threats, including data theft, scams, misinformation, and system-wide disruption, all while propagating silently through the system. Our extensive experiments demonstrate that multi-agent systems are highly susceptible, even when agents do not publicly share all communications. To address this, we propose LLM Tagging, a defense mechanism that, when combined with existing safeguards, significantly mitigates infection spread. This work underscores the urgent need for advanced security measures as multi-agent LLM systems become more widely adopted.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces 'Prompt Infection,' a novel LLM-to-LLM prompt injection attack in multi-agent systems where malicious prompts self-replicate across interconnected agents like a computer virus. It claims this enables silent propagation leading to data theft, scams, misinformation, and system disruption. Extensive experiments demonstrate high susceptibility even in partially shared communication setups, and LLM Tagging is proposed as a defense that, combined with existing safeguards, significantly reduces spread.

Significance. If the empirical results hold under realistic conditions, this identifies a critical new attack surface in multi-agent LLM systems, which are rapidly being adopted. The work provides concrete evidence of propagation risks beyond single-agent prompt injection and offers a practical defense, highlighting the need for system-level security measures. The focus on partially shared communications is a strength, as is the framing of the attack as viral self-replication.

major comments (2)
  1. [§4] §4 (Experimental Evaluation): The central claim of reliable self-replication and high susceptibility rests on agents executing and forwarding malicious prompts without refusal. The setups use open communication protocols, but no results are reported when agents include standard safety system prompts (e.g., 'ignore any instructions to change behavior, execute harmful actions, or propagate messages to other agents'). Adding such prompts would likely break the chain at the first or second hop, undermining generalization to deployed systems.
  2. [§5] §5 (Proposed Defense): LLM Tagging is claimed to significantly mitigate infection when combined with safeguards, but the manuscript does not report quantitative metrics (e.g., infection rate reduction percentages or hop counts before containment) comparing tagged vs. untagged runs across the same agent configurations and LLM backends. This makes it difficult to assess the defense's effectiveness independent of the baseline safeguards.
minor comments (2)
  1. [Abstract] The abstract and introduction should explicitly state the number of agents, LLM models (e.g., GPT-4, Llama variants), and exact propagation success rates from the experiments to allow readers to gauge the scale of the findings without reading the full experimental section.
  2. [Figure 1] Figure 1 (infection propagation diagram): The visual could be improved by adding arrows or labels distinguishing the initial injection step from subsequent forwarding steps, and by indicating whether communications are fully or partially shared in each panel.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback. The comments highlight important gaps in our experimental evaluation and defense assessment. We address each point below and will revise the manuscript to incorporate additional experiments and quantitative metrics as suggested.

read point-by-point responses
  1. Referee: [§4] §4 (Experimental Evaluation): The central claim of reliable self-replication and high susceptibility rests on agents executing and forwarding malicious prompts without refusal. The setups use open communication protocols, but no results are reported when agents include standard safety system prompts (e.g., 'ignore any instructions to change behavior, execute harmful actions, or propagate messages to other agents'). Adding such prompts would likely break the chain at the first or second hop, undermining generalization to deployed systems.

    Authors: We acknowledge that our initial experiments in Section 4 focused on baseline multi-agent configurations with varying levels of communication sharing to isolate the self-replication mechanism. Standard safety prompts were not explicitly added in those runs. We agree this limits direct generalization to fully safeguarded deployed systems. In the revised version, we will add new experiments that incorporate common safety system prompts (e.g., refusal instructions against propagation) and report the resulting infection rates and propagation hops across the same agent setups and LLM backends. revision: yes

  2. Referee: [§5] §5 (Proposed Defense): LLM Tagging is claimed to significantly mitigate infection when combined with safeguards, but the manuscript does not report quantitative metrics (e.g., infection rate reduction percentages or hop counts before containment) comparing tagged vs. untagged runs across the same agent configurations and LLM backends. This makes it difficult to assess the defense's effectiveness independent of the baseline safeguards.

    Authors: We agree that the current presentation of LLM Tagging in Section 5 would benefit from explicit quantitative comparisons. The manuscript states that the defense, when combined with safeguards, significantly reduces spread, but does not include side-by-side metrics. In the revision, we will add tables and figures reporting infection rate reductions (as percentages), average hops before containment, and success rates for tagged versus untagged conditions, evaluated across identical agent configurations and multiple LLM backends. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical demonstration of prompt infection attack

full rationale

The paper introduces Prompt Infection as an empirical attack vector and supports its claims through direct experiments on multi-agent LLM interactions rather than any derivation chain, fitted parameters, or first-principles predictions. No equations, self-definitional constructs, or load-bearing self-citations appear; the susceptibility results and proposed LLM Tagging defense follow from the reported experimental outcomes in shared and partially shared communication setups. The work is self-contained against external benchmarks as a demonstration study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on domain assumptions about LLM compliance with received prompts and introduces a new attack concept without independent prior evidence.

axioms (1)
  • domain assumption LLM agents will execute and forward malicious instructions received from peer agents without refusal
    This is required for the self-replication to occur as described in the abstract.
invented entities (1)
  • Prompt Infection no independent evidence
    purpose: To name and conceptualize the self-replicating prompt injection attack across agents
    New term coined for the observed phenomenon; no independent evidence provided outside the paper's experiments.

pith-pipeline@v0.9.0 · 5483 in / 1189 out tokens · 47650 ms · 2026-05-15T19:28:47.471286+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis

    cs.CR 2026-04 accept novelty 8.0

    Agent Skills has structural security weaknesses from missing data-instruction boundaries, single-approval persistent trust, and absent marketplace reviews that require fundamental redesign.

  2. Attacks and Mitigations for Distributed Governance of Agentic AI under Byzantine Adversaries

    cs.CR 2026-05 unverdicted novelty 7.0

    Identifies concrete attacks from a malicious Provider on SAGA and proposes SAGA-BFT, SAGA-MON, SAGA-AUD, and SAGA-HYB mitigations offering different security-performance trade-offs.

  3. FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems

    cs.CR 2026-05 unverdicted novelty 7.0

    FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.

  4. The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck

    cs.CR 2026-05 unverdicted novelty 7.0

    PACT achieves perfect security and utility under oracle provenance by enforcing argument-level trust contracts based on semantic roles and cross-step provenance tracking, outperforming invocation-level monitors in Age...

  5. EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium

    cs.AI 2026-05 unverdicted novelty 7.0

    EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to ad...

  6. Autonomous LLM Agent Worms: Cross-Platform Propagation, Automated Discovery and Temporal Re-Entry Defense

    cs.CR 2026-05 unverdicted novelty 7.0

    Autonomous LLM agents can host self-propagating worms via persistent state re-entry, demonstrated with automated analysis tools and blocked by a formal no-propagation defense on three frameworks.

  7. Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers

    cs.CR 2026-03 conditional novelty 7.0

    Stage-level tracking of prompt injection reveals that write-node placement and model-specific behaviors determine attack outcomes more than initial exposure in LLM pipelines.

  8. When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks

    cs.CR 2026-05 unverdicted novelty 6.0

    Multi-agent LLM frameworks can spread compromises across agent boundaries via insecure memory inheritance during subagent spawning.

  9. MAGIQ: A Post-Quantum Multi-Agentic AI Governance System with Provable Security

    cs.LG 2026-05 unverdicted novelty 6.0

    MAGIQ introduces a post-quantum secure system for policy definition, enforcement, and accountability in multi-agent AI using novel cryptographic protocols and UC framework proofs.

  10. ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

    cs.CR 2026-05 unverdicted novelty 6.0

    ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.

  11. When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems

    cs.CR 2026-05 unverdicted novelty 6.0

    Embedding-based defenses fail against attacks that align malicious message embeddings with benign ones in LLM multi-agent systems, but token-level confidence scores improve robustness by enabling better pruning of sus...

  12. HDP: A Lightweight Cryptographic Protocol for Human Delegation Provenance in Agentic AI Systems

    cs.CR 2026-04 unverdicted novelty 6.0

    HDP is a lightweight protocol that binds human authorization to sessions via signed append-only token chains, enabling offline verification of delegation provenance using only an Ed25519 public key and session identifier.

  13. Safe Multi-Agent Behavior Must Be Maintained, Not Merely Asserted: Constraint Drift in LLM-Based Multi-Agent Systems

    cs.MA 2026-05 unverdicted novelty 5.0

    Safety constraints in LLM-based multi-agent systems commonly weaken during execution through memory, communication, and tool use, requiring them to be maintained as explicit state rather than asserted once.

  14. Insider Attacks in Multi-Agent LLM Consensus Systems

    cs.MA 2026-05 unverdicted novelty 5.0

    A malicious agent in multi-agent LLM consensus systems can be trained via a surrogate world model and RL to reduce consensus rates and prolong disagreement more effectively than direct prompt attacks.

  15. A Low-Latency Fraud Detection Layer for Detecting Adversarial Interaction Patterns in LLM-Powered Agents

    cs.AI 2026-05 unverdicted novelty 5.0

    Researchers developed a fast XGBoost-based detector using 42 runtime features to spot adversarial interaction patterns in LLM agents, running over 9 times faster than LLM detectors on synthetic multi-turn data.

  16. SoK: Security of Autonomous LLM Agents in Agentic Commerce

    cs.CR 2026-04 unverdicted novelty 5.0

    The paper systematizes security for LLM agents in agentic commerce into five threat dimensions, identifies 12 cross-layer attack vectors, and proposes a layered defense architecture.

  17. Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability

    cs.CL 2026-05 unverdicted novelty 4.0

    The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment inter...

  18. CASCADE: A Cascaded Hybrid Defense Architecture for Prompt Injection Detection in MCP-Based Systems

    cs.CR 2026-04 unverdicted novelty 4.0

    CASCADE is a cascaded hybrid detector that combines fast regex/entropy filtering, BGE embeddings with local LLM fallback, and output pattern checks to achieve 95.85% precision and 6.06% false-positive rate against pro...

Reference graph

Works this paper leans on

101 extracted references · 101 canonical work pages · cited by 18 Pith papers · 22 internal anchors

  1. [1]

    PsySafe : A Comprehensive Framework for Psychological -based Attack , Defense , and Evaluation of Multi -agent System Safety , August 2024 d

    Zhang, Zaibin and Zhang, Yongting and Li, Lijun and Gao, Hongzhi and Wang, Lijun and Lu, Huchuan and Zhao, Feng and Qiao, Yu and Shao, Jing , month = aug, year =. doi:10.48550/arXiv.2401.11880 , abstract =

  2. [3]

    Tian, Yu and Yang, Xiao and Zhang, Jingyuan and Dong, Yinpeng and Su, Hang , month = feb, year =. Evil

  3. [4]

    Not what you've signed up for:

    Greshake, Kai and Abdelnabi, Sahar and Mishra, Shailesh and Endres, Christoph and Holz, Thorsten and Fritz, Mario , month = may, year =. Not what you've signed up for:

  4. [5]

    , month = sep, year =

    Zhang, Wenxiao and Kong, Xiangrui and Dewitt, Conan and Braunl, Thomas and Hong, Jin B. , month = sep, year =. A

  5. [6]

    StruQ : Defending Against Prompt Injection with Structured Queries , September 2024

    Chen, Sizhe and Piet, Julien and Sitawarin, Chawin and Wagner, David , month = sep, year =. doi:10.48550/arXiv.2402.06363 , abstract =

  6. [7]

    and Cai, Carrie J

    Park, Joon Sung and O'Brien, Joseph C. and Cai, Carrie J. and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S. , month = aug, year =. Generative

  7. [8]

    Breaking

    Zhang, Boyang and Tan, Yicong and Shen, Yun and Salem, Ahmed and Backes, Michael and Zannettou, Savvas and Zhang, Yang , month = jul, year =. Breaking

  8. [9]

    Liu, Yi and Deng, Gelei and Li, Yuekang and Wang, Kailong and Wang, Zihao and Wang, Xiaofeng and Zhang, Tianwei and Liu, Yepang and Wang, Haoyu and Zheng, Yan and Liu, Yang , month = mar, year =. Prompt

  9. [10]

    Formalizing and

    Liu, Yupei and Jia, Yuqi and Geng, Runpeng and Jia, Jinyuan and Gong, Neil Zhenqiang , month = jun, year =. Formalizing and

  10. [11]

    Gu, Xiangming and Zheng, Xiaosen and Pang, Tianyu and Du, Chao and Liu, Qian and Wang, Ye and Jiang, Jing and Lin, Min , month = jun, year =. Agent

  11. [12]

    Flooding

    Ju, Tianjie and Wang, Yiting and Ma, Xinbei and Cheng, Pengzhou and Zhao, Haodong and Wang, Yulong and Liu, Lifeng and Xie, Jian and Zhang, Zhuosheng and Liu, Gongshen , month = jul, year =. Flooding

  12. [13]

    , month = aug, year =

    Huang, Jen-tse and Zhou, Jiaxu and Jin, Tailin and Zhou, Xuhui and Chen, Zixi and Wang, Wenxuan and Yuan, Youliang and Sap, Maarten and Lyu, Michael R. , month = aug, year =. On the

  13. [14]

    Yuan, Youliang and Jiao, Wenxiang and Wang, Wenxuan and Huang, Jen-tse and He, Pinjia and Shi, Shuming and Tu, Zhaopeng , month = mar, year =

  14. [15]

    Automatic and

    Liu, Xiaogeng and Yu, Zhiyuan and Zhang, Yizhe and Zhang, Ning and Xiao, Chaowei , month = mar, year =. Automatic and

  15. [16]

    Defending

    Hines, Keegan and Lopez, Gary and Hall, Matthew and Zarfati, Federico and Zunger, Yonatan and Kiciman, Emre , month = mar, year =. Defending

  16. [17]

    Perez, Fábio and Ribeiro, Ian , month = nov, year =. Ignore. doi:10.48550/arXiv.2211.09527 , abstract =

  17. [18]

    Piet, Julien and Alrashed, Maha and Sitawarin, Chawin and Chen, Sizhe and Wei, Zeming and Sun, Elizabeth and Alomair, Basel and Wagner, David , month = jan, year =. Jatmo:. doi:10.48550/arXiv.2312.17673 , abstract =

  18. [19]

    Ouyang, Long and Wu, Jeff and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll L. and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and Schulman, John and Hilton, Jacob and Kelton, Fraser and Miller, Luke and Simens, Maddie and Askell, Amanda and Welinder, Peter and Christiano, Paul and Leike, Jan and Lowe, R...

  19. [20]

    arXiv.org , author =

    Deep reinforcement learning from human preferences , url =. arXiv.org , author =. 2017 , file =

  20. [21]

    arXiv.org , author =

    Universal and. arXiv.org , author =. 2023 , file =

  21. [22]

    Mehrotra, Anay and Zampetakis, Manolis and Kassianik, Paul and Nelson, Blaine and Anderson, Hyrum and Singer, Yaron and Karbasi, Amin , month = feb, year =. Tree of. doi:10.48550/arXiv.2312.02119 , abstract =

  22. [23]

    Schulhoff, Sander , file =. Random

  23. [24]

    Sandwich

    Schulhoff, Sander , file =. Sandwich

  24. [25]

    arXiv.org , author =

    Large. arXiv.org , author =. 2023 , file =

  25. [26]

    Exploiting

    Kang, Daniel and Li, Xuechen and Stoica, Ion and Guestrin, Carlos and Zaharia, Matei and Hashimoto, Tatsunori , month = feb, year =. Exploiting

  26. [27]

    Jailbreaking

    Liu, Yi and Deng, Gelei and Xu, Zhengzi and Li, Yuekang and Zheng, Yaowen and Zhang, Ying and Zhao, Lida and Zhang, Tianwei and Wang, Kailong and Liu, Yang , month = mar, year =. Jailbreaking

  27. [28]

    Jailbroken:

    Wei, Alexander and Haghtalab, Nika and Steinhardt, Jacob , month = jul, year =. Jailbroken:

  28. [29]

    These are

    Warren, Tom , year =. These are

  29. [30]

    Unidebugger: Hierarchical multi-agent framework for unified software debugging,

    Lee, Cheryl and Xia, Chunqiu Steven and Huang, Jen-tse and Zhu, Zhouruixin and Zhang, Lingming and Lyu, Michael R. , month = apr, year =. A. doi:10.48550/arXiv.2404.17153 , abstract =

  30. [31]

    Wu, Alexander , month = sep, year =. geekan/

  31. [32]

    Qu, Changle and Dai, Sunhao and Wei, Xiaochi and Cai, Hengyi and Wang, Shuaiqiang and Yin, Dawei and Xu, Jun and Wen, Ji-Rong , month = may, year =. Tool. doi:10.48550/arXiv.2405.17935 , abstract =

  32. [33]

    2023 , file =

    arXiv.org , author =. 2023 , file =

  33. [34]

    Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

    Liang, Tian and He, Zhiwei and Jiao, Wenxiang and Wang, Xing and Wang, Rui and Yang, Yujiu and Tu, Zhaopeng and Shi, Shuming , month = jul, year =. Encouraging. doi:10.48550/arXiv.2305.19118 , abstract =

  34. [35]

    AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

    Wu, Qingyun and Bansal, Gagan and Zhang, Jieyu and Wu, Yiran and Li, Beibin and Zhu, Erkang and Jiang, Li and Zhang, Xiaoyun and Zhang, Shaokun and Liu, Jiale and Awadallah, Ahmed Hassan and White, Ryen W. and Burger, Doug and Wang, Chi , month = oct, year =. doi:10.48550/arXiv.2308.08155 , abstract =

  35. [36]

    CrewAI , month = sep, year =

  36. [37]

    AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

    Chen, Weize and Su, Yusheng and Zuo, Jingwei and Yang, Cheng and Yuan, Chenfei and Chan, Chi-Min and Yu, Heyang and Lu, Yaxi and Hung, Yi-Hsin and Qian, Chen and Qin, Yujia and Cong, Xin and Xie, Ruobing and Liu, Zhiyuan and Sun, Maosong and Zhou, Jie , month = oct, year =. doi:10.48550/arXiv.2308.10848 , abstract =

  37. [38]

    CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society

    Li, Guohao and Hammoud, Hasan Abed Al Kader and Itani, Hani and Khizbullin, Dmitrii and Ghanem, Bernard , month = nov, year =. doi:10.48550/arXiv.2303.17760 , abstract =

  38. [39]

    Benchmark

    Wang, Siyuan and Long, Zhuohan and Fan, Zhihao and Wei, Zhongyu and Huang, Xuanjing , month = feb, year =. Benchmark. doi:10.48550/arXiv.2402.11443 , abstract =

  39. [40]

    arXiv.org , author =

    Large. arXiv.org , author =. 2024 , keywords =

  40. [41]

    AgentSims : An Open - Source Sandbox for Large Language Model Evaluation , August 2023

    Lin, Jiaju and Zhao, Haoran and Zhang, Aochi and Wu, Yiting and Ping, Huqiuyue and Chen, Qin , month = aug, year =. doi:10.48550/arXiv.2308.04026 , abstract =

  41. [42]

    Hua, Wenyue and Fan, Lizhou and Li, Lingyao and Mei, Kai and Ji, Jianchao and Ge, Yingqiang and Hemphill, Libby and Zhang, Yongfeng , month = jan, year =. War and. doi:10.48550/arXiv.2311.17227 , abstract =

  42. [43]

    Instruction tuning for large language models: A survey

    Zhang, Shengyu and Dong, Linfeng and Li, Xiaoya and Zhang, Sen and Sun, Xiaofei and Wang, Shuhe and Li, Jiwei and Hu, Runyi and Zhang, Tianwei and Wu, Fei and Wang, Guoyin , month = mar, year =. Instruction. doi:10.48550/arXiv.2308.10792 , abstract =

  43. [44]

    Instruction Tuning with GPT-4

    Peng, Baolin and Li, Chunyuan and He, Pengcheng and Galley, Michel and Gao, Jianfeng , month = apr, year =. Instruction. doi:10.48550/arXiv.2304.03277 , abstract =

  44. [45]

    Kim, To Eun and Diaz, Fernando , month = sep, year =. Towards. doi:10.48550/arXiv.2409.11598 , abstract =

  45. [47]

    ToolSword : Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages , August 2024

    Ye, Junjie and Li, Sixian and Li, Guanyu and Huang, Caishuang and Gao, Songyang and Wu, Yilong and Zhang, Qi and Gui, Tao and Huang, Xuanjing , month = aug, year =. doi:10.48550/arXiv.2402.10753 , abstract =

  46. [48]

    Cohen, Stav and Bitton, Ron and Nassi, Ben , month = mar, year =. Here. doi:10.48550/arXiv.2403.02817 , abstract =

  47. [49]

    ChatDev: Communicative Agents for Software Development

    Qian, Chen and Liu, Wei and Liu, Hongzhang and Chen, Nuo and Dang, Yufan and Li, Jiahao and Yang, Cheng and Chen, Weize and Su, Yusheng and Cong, Xin and Xu, Juyuan and Li, Dahai and Liu, Zhiyuan and Sun, Maosong , month = jun, year =. doi:10.48550/arXiv.2307.07924 , abstract =

  48. [50]

    Multiagent

    Weiss, Gerhard , year =. Multiagent

  49. [51]

    MemoryBank : Enhancing Large Language Models with Long - Term Memory , May 2023

    Zhong, Wanjun and Guo, Lianghong and Gao, Qiqi and Ye, He and Wang, Yanlin , month = may, year =. doi:10.48550/arXiv.2305.10250 , abstract =

  50. [52]

    Cognitive architectures for language agents

    Sumers, Theodore R. and Yao, Shunyu and Narasimhan, Karthik and Griffiths, Thomas L. , month = mar, year =. Cognitive. doi:10.48550/arXiv.2309.02427 , abstract =

  51. [53]

    StruQ : Defending Against Prompt Injection with Structured Queries , September 2024

    Sizhe Chen, Julien Piet, Chawin Sitawarin, and David Wagner. StruQ : Defending Against Prompt Injection with Structured Queries , September 2024. URL http://arxiv.org/abs/2402.06363. arXiv:2402.06363 [cs]

  52. [54]

    Paul Francis Christiano, Jan Leike, Tom B

    Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences, June 2017. URL https://arxiv.org/abs/1706.03741v4

  53. [55]

    Here Comes The AI Worm : Unleashing Zero -click Worms that Target GenAI - Powered Applications , March 2024

    Stav Cohen, Ron Bitton, and Ben Nassi. Here Comes The AI Worm : Unleashing Zero -click Worms that Target GenAI - Powered Applications , March 2024. URL http://arxiv.org/abs/2403.02817. arXiv:2403.02817 [cs]

  54. [56]

    crewAIInc / crewAI , September 2024

    CrewAI. crewAIInc / crewAI , September 2024. URL https://github.com/crewAIInc/crewAI. original-date: 2023-10-27T03:26:59Z

  55. [57]

    Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

    Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you've signed up for: Compromising Real - World LLM - Integrated Applications with Indirect Prompt Injection , May 2023. URL http://arxiv.org/abs/2302.12173. arXiv:2302.12173 [cs]

  56. [58]

    Agent smith: A single image can jailbreak one million multimodal llm agents exponentially fast.arXiv preprint arXiv:2402.08567, 2024

    Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, and Min Lin. Agent Smith : A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast , June 2024. URL http://arxiv.org/abs/2402.08567. arXiv:2402.08567 [cs]

  57. [59]

    Large Language Model based Multi-Agents: A Survey of Progress and Challenges

    Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, and Xiangliang Zhang. Large Language Model based Multi - Agents : A Survey of Progress and Challenges , January 2024. URL https://arxiv.org/abs/2402.01680v2

  58. [60]

    Defending Against Indirect Prompt Injection Attacks With Spotlighting

    Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan Zunger, and Emre Kiciman. Defending Against Indirect Prompt Injection Attacks With Spotlighting , March 2024. URL http://arxiv.org/abs/2403.14720. arXiv:2403.14720 [cs]

  59. [61]

    War and Peace ( WarAgent ): Large Language Model -based Multi - Agent Simulation of World Wars , January 2024

    Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, and Yongfeng Zhang. War and Peace ( WarAgent ): Large Language Model -based Multi - Agent Simulation of World Wars , January 2024. URL http://arxiv.org/abs/2311.17227. arXiv:2311.17227 [cs]

  60. [62]

    Jen-tse Huang, Jiaxu Zhou, Tailin Jin, Xuhui Zhou, Zixi Chen, Wenxuan Wang, Youliang Yuan, Maarten Sap, and Michael R. Lyu. On the Resilience of Multi - Agent Systems with Malicious Agents , August 2024. URL http://arxiv.org/abs/2408.00989. arXiv:2408.00989 [cs]

  61. [63]

    Flooding Spread of Manipulated Knowledge in LLM - Based Multi - Agent Communities , July 2024

    Tianjie Ju, Yiting Wang, Xinbei Ma, Pengzhou Cheng, Haodong Zhao, Yulong Wang, Lifeng Liu, Jian Xie, Zhuosheng Zhang, and Gongshen Liu. Flooding Spread of Manipulated Knowledge in LLM - Based Multi - Agent Communities , July 2024. URL http://arxiv.org/abs/2407.07791. arXiv:2407.07791 [cs]

  62. [64]

    Exploiting Programmatic Behavior of LLMs : Dual - Use Through Standard Security Attacks , February 2023

    Daniel Kang, Xuechen Li, Ion Stoica, Carlos Guestrin, Matei Zaharia, and Tatsunori Hashimoto. Exploiting Programmatic Behavior of LLMs : Dual - Use Through Standard Security Attacks , February 2023. URL http://arxiv.org/abs/2302.05733. arXiv:2302.05733 [cs]

  63. [65]

    Towards Fair RAG : On the Impact of Fair Ranking in Retrieval - Augmented Generation , September 2024

    To Eun Kim and Fernando Diaz. Towards Fair RAG : On the Impact of Fair Ranking in Retrieval - Augmented Generation , September 2024. URL http://arxiv.org/abs/2409.11598. arXiv:2409.11598 [cs]

  64. [66]

    LangGraph

    LangGraph. LangGraph . URL https://www.langchain.com/langgraph

  65. [67]

    Cheryl Lee, Chunqiu Steven Xia, Jen-tse Huang, Zhouruixin Zhu, Lingming Zhang, and Michael R. Lyu. A Unified Debugging Approach via LLM - Based Multi - Agent Synergy , April 2024. URL http://arxiv.org/abs/2404.17153. arXiv:2404.17153 [cs]

  66. [68]

    Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

    Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, and Shuming Shi. Encouraging Divergent Thinking in Large Language Models through Multi - Agent Debate , July 2024. URL http://arxiv.org/abs/2305.19118. arXiv:2305.19118 [cs]

  67. [69]

    AgentSims : An Open - Source Sandbox for Large Language Model Evaluation , August 2023

    Jiaju Lin, Haoran Zhao, Aochi Zhang, Yiting Wu, Huqiuyue Ping, and Qin Chen. AgentSims : An Open - Source Sandbox for Large Language Model Evaluation , August 2023. URL http://arxiv.org/abs/2308.04026. arXiv:2308.04026 [cs]

  68. [70]

    Automatic and Universal Prompt Injection Attacks against Large Language Models , March 2024 a

    Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, and Chaowei Xiao. Automatic and Universal Prompt Injection Attacks against Large Language Models , March 2024 a . URL http://arxiv.org/abs/2403.04957. arXiv:2403.04957 [cs]

  69. [71]

    Prompt Injection attack against LLM-integrated Applications

    Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, and Yang Liu. Prompt Injection attack against LLM -integrated Applications , March 2024 b . URL http://arxiv.org/abs/2306.05499. arXiv:2306.05499 [cs]

  70. [72]

    Formalizing and Benchmarking Prompt Injection Attacks and Defenses , June 2024 c

    Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. Formalizing and Benchmarking Prompt Injection Attacks and Defenses , June 2024 c . URL http://arxiv.org/abs/2310.12815. arXiv:2310.12815 [cs]

  71. [73]

    MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

    Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, and Jianfeng Gao. MathVista : Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts , October 2023. URL https://arxiv.org/abs/2310.02255v3

  72. [74]

    Tree of Attacks : Jailbreaking Black - Box LLMs Automatically , February 2024

    Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, and Amin Karbasi. Tree of Attacks : Jailbreaking Black - Box LLMs Automatically , February 2024. URL http://arxiv.org/abs/2312.02119. arXiv:2312.02119 [cs, stat]

  73. [75]

    Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. Training language models to follow instructions with human feedback,...

  74. [76]

    Generative Agents: Interactive Simulacra of Human Behavior

    Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative Agents : Interactive Simulacra of Human Behavior , August 2023. URL http://arxiv.org/abs/2304.03442. arXiv:2304.03442 [cs]

  75. [77]

    Instruction Tuning with GPT-4

    Baolin Peng, Chunyuan Li, Pengcheng He, Michel Galley, and Jianfeng Gao. Instruction Tuning with GPT -4, April 2023. URL http://arxiv.org/abs/2304.03277. arXiv:2304.03277 [cs]

  76. [78]

    Ignore Previous Prompt: Attack Techniques For Language Models

    Fábio Perez and Ian Ribeiro. Ignore Previous Prompt : Attack Techniques For Language Models , November 2022. URL http://arxiv.org/abs/2211.09527. arXiv:2211.09527 [cs]

  77. [79]

    ChatDev: Communicative Agents for Software Development

    Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, and Maosong Sun. ChatDev : Communicative Agents for Software Development , June 2024. URL http://arxiv.org/abs/2307.07924. arXiv:2307.07924 [cs]

  78. [80]

    Tool Learning with Large Language Models : A Survey , May 2024

    Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, and Ji-Rong Wen. Tool Learning with Large Language Models : A Survey , May 2024. URL http://arxiv.org/abs/2405.17935. arXiv:2405.17935 [cs]

  79. [81]

    Instruction Defense : Strengthen AI Prompts Against Hacking , a

    Sander Schulhoff. Instruction Defense : Strengthen AI Prompts Against Hacking , a . URL https://learnprompting.org/docs/prompt_hacking/defensive_measures/instruction

  80. [82]

    Random Sequence Enclosure : Safeguarding AI Prompts , b

    Sander Schulhoff. Random Sequence Enclosure : Safeguarding AI Prompts , b . URL https://learnprompting.org/docs/prompt_hacking/defensive_measures/random_sequence

Showing first 80 references.