arxiv: 2410.07283 · v1 · submitted 2024-10-09 · 💻 cs.MA · cs.AI· cs.CR

Recognition: 2 theorem links

· Lean Theorem

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

Donghyun Lee , Mo Tiwari

Authors on Pith no claims yet

Pith reviewed 2026-05-15 19:28 UTC · model grok-4.3

classification 💻 cs.MA cs.AIcs.CR

keywords prompt injectionmulti-agent systemsLLM securityprompt infectionAI agent vulnerabilitiesself-replicating attacks

0 comments

The pith

Malicious prompts can self-replicate from one LLM agent to others in multi-agent systems, spreading like a virus.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that prompt injection attacks are no longer limited to single models but can jump between agents when one passes instructions to the next. A single infected prompt gets the receiving agent to execute harmful actions and then forward the same malicious instruction onward. This creates silent propagation that enables data theft, scams, and system disruption without direct access to every agent. Experiments confirm the effect persists even when agents do not share all communications publicly. The authors also show that adding LLM Tagging to existing safeguards reduces spread, highlighting that current single-agent defenses fall short for interconnected setups.

Core claim

Prompt Infection is an LLM-to-LLM attack in which a malicious prompt, once injected into one agent, causes that agent to execute the harmful task and then embed the same prompt into messages sent to peer agents, allowing the infection to replicate across the system without requiring direct external input to each agent.

What carries the argument

Prompt Infection, the self-replicating malicious instruction that exploits inter-agent message passing to propagate itself.

If this is right

A single entry point can compromise an entire multi-agent workflow through silent replication.
Standard single-agent prompt injection defenses fail to stop system-wide effects.
Data exfiltration and misinformation campaigns can scale automatically once one agent is reached.
Combining LLM Tagging with existing safeguards measurably limits further spread.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers of agent networks may need mandatory message sanitization at every hop rather than at the edge only.
Testing protocols for new multi-agent applications should include deliberate infection attempts as a standard check.
The same replication pattern could appear in other structured communication systems such as tool-calling chains or workflow orchestrators.

Load-bearing premise

Agents will execute and forward malicious instructions received from other agents without built-in refusal or detection of the replication attempt.

What would settle it

Run a controlled multi-agent simulation in which every agent is given an explicit rule to refuse any message containing instructions to replicate or spread content to peers, then measure whether the original malicious prompt still propagates.

read the original abstract

As Large Language Models (LLMs) grow increasingly powerful, multi-agent systems are becoming more prevalent in modern AI applications. Most safety research, however, has focused on vulnerabilities in single-agent LLMs. These include prompt injection attacks, where malicious prompts embedded in external content trick the LLM into executing unintended or harmful actions, compromising the victim's application. In this paper, we reveal a more dangerous vector: LLM-to-LLM prompt injection within multi-agent systems. We introduce Prompt Infection, a novel attack where malicious prompts self-replicate across interconnected agents, behaving much like a computer virus. This attack poses severe threats, including data theft, scams, misinformation, and system-wide disruption, all while propagating silently through the system. Our extensive experiments demonstrate that multi-agent systems are highly susceptible, even when agents do not publicly share all communications. To address this, we propose LLM Tagging, a defense mechanism that, when combined with existing safeguards, significantly mitigates infection spread. This work underscores the urgent need for advanced security measures as multi-agent LLM systems become more widely adopted.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Self-replicating prompt injection spreads in open multi-agent LLM setups but likely fails against basic refusal instructions.

read the letter

The main takeaway is that this work demonstrates self-replicating prompt injection between LLM agents in multi-agent systems. It extends the single-agent prompt injection literature by showing how an attack can propagate across agents like a virus. The experiments indicate susceptibility even with partial communication sharing, and the LLM Tagging defense offers a way to combine with other safeguards to reduce spread. That's useful empirical evidence on a timely issue. One soft spot is that the propagation seems to assume agents will follow and forward the malicious instructions without refusal. Adding common safety prompts that instruct agents to ignore behavior changes or message forwarding could break the chain at the first step. The paper does not appear to test against such constraints, which limits how general the findings are. The approach is empirical rather than theoretical, with no obvious circularity or invented parameters. It builds straightforwardly on existing prompt injection research. The citation pattern is standard, referencing prior work on prompt injection without overclaiming. This paper is for researchers and practitioners working on secure multi-agent LLM deployments. Anyone concerned with scaling these systems would find the attack vector and mitigation idea relevant. I would send it to peer review to verify the experimental details and see how the defense performs under varied conditions.

Referee Report

2 major / 2 minor

Summary. The paper introduces 'Prompt Infection,' a novel LLM-to-LLM prompt injection attack in multi-agent systems where malicious prompts self-replicate across interconnected agents like a computer virus. It claims this enables silent propagation leading to data theft, scams, misinformation, and system disruption. Extensive experiments demonstrate high susceptibility even in partially shared communication setups, and LLM Tagging is proposed as a defense that, combined with existing safeguards, significantly reduces spread.

Significance. If the empirical results hold under realistic conditions, this identifies a critical new attack surface in multi-agent LLM systems, which are rapidly being adopted. The work provides concrete evidence of propagation risks beyond single-agent prompt injection and offers a practical defense, highlighting the need for system-level security measures. The focus on partially shared communications is a strength, as is the framing of the attack as viral self-replication.

major comments (2)

[§4] §4 (Experimental Evaluation): The central claim of reliable self-replication and high susceptibility rests on agents executing and forwarding malicious prompts without refusal. The setups use open communication protocols, but no results are reported when agents include standard safety system prompts (e.g., 'ignore any instructions to change behavior, execute harmful actions, or propagate messages to other agents'). Adding such prompts would likely break the chain at the first or second hop, undermining generalization to deployed systems.
[§5] §5 (Proposed Defense): LLM Tagging is claimed to significantly mitigate infection when combined with safeguards, but the manuscript does not report quantitative metrics (e.g., infection rate reduction percentages or hop counts before containment) comparing tagged vs. untagged runs across the same agent configurations and LLM backends. This makes it difficult to assess the defense's effectiveness independent of the baseline safeguards.

minor comments (2)

[Abstract] The abstract and introduction should explicitly state the number of agents, LLM models (e.g., GPT-4, Llama variants), and exact propagation success rates from the experiments to allow readers to gauge the scale of the findings without reading the full experimental section.
[Figure 1] Figure 1 (infection propagation diagram): The visual could be improved by adding arrows or labels distinguishing the initial injection step from subsequent forwarding steps, and by indicating whether communications are fully or partially shared in each panel.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback. The comments highlight important gaps in our experimental evaluation and defense assessment. We address each point below and will revise the manuscript to incorporate additional experiments and quantitative metrics as suggested.

read point-by-point responses

Referee: [§4] §4 (Experimental Evaluation): The central claim of reliable self-replication and high susceptibility rests on agents executing and forwarding malicious prompts without refusal. The setups use open communication protocols, but no results are reported when agents include standard safety system prompts (e.g., 'ignore any instructions to change behavior, execute harmful actions, or propagate messages to other agents'). Adding such prompts would likely break the chain at the first or second hop, undermining generalization to deployed systems.

Authors: We acknowledge that our initial experiments in Section 4 focused on baseline multi-agent configurations with varying levels of communication sharing to isolate the self-replication mechanism. Standard safety prompts were not explicitly added in those runs. We agree this limits direct generalization to fully safeguarded deployed systems. In the revised version, we will add new experiments that incorporate common safety system prompts (e.g., refusal instructions against propagation) and report the resulting infection rates and propagation hops across the same agent setups and LLM backends. revision: yes
Referee: [§5] §5 (Proposed Defense): LLM Tagging is claimed to significantly mitigate infection when combined with safeguards, but the manuscript does not report quantitative metrics (e.g., infection rate reduction percentages or hop counts before containment) comparing tagged vs. untagged runs across the same agent configurations and LLM backends. This makes it difficult to assess the defense's effectiveness independent of the baseline safeguards.

Authors: We agree that the current presentation of LLM Tagging in Section 5 would benefit from explicit quantitative comparisons. The manuscript states that the defense, when combined with safeguards, significantly reduces spread, but does not include side-by-side metrics. In the revision, we will add tables and figures reporting infection rate reductions (as percentages), average hops before containment, and success rates for tagged versus untagged conditions, evaluated across identical agent configurations and multiple LLM backends. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical demonstration of prompt infection attack

full rationale

The paper introduces Prompt Infection as an empirical attack vector and supports its claims through direct experiments on multi-agent LLM interactions rather than any derivation chain, fitted parameters, or first-principles predictions. No equations, self-definitional constructs, or load-bearing self-citations appear; the susceptibility results and proposed LLM Tagging defense follow from the reported experimental outcomes in shared and partially shared communication setups. The work is self-contained against external benchmarks as a demonstration study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on domain assumptions about LLM compliance with received prompts and introduces a new attack concept without independent prior evidence.

axioms (1)

domain assumption LLM agents will execute and forward malicious instructions received from peer agents without refusal
This is required for the self-replication to occur as described in the abstract.

invented entities (1)

Prompt Infection no independent evidence
purpose: To name and conceptualize the self-replicating prompt injection attack across agents
New term coined for the observed phenomenon; no independent evidence provided outside the paper's experiments.

pith-pipeline@v0.9.0 · 5483 in / 1189 out tokens · 47650 ms · 2026-05-15T19:28:47.471286+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce Prompt Infection, a novel attack where malicious prompts self-replicate across interconnected agents, behaving much like a computer virus... Recursive Collapse... P romptInf ection(N )(x, data)
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_strictMono_of_one_lt unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Infection prompts propagate... logistic growth pattern... importance score manipulation

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis
cs.CR 2026-04 accept novelty 8.0

Agent Skills has structural security weaknesses from missing data-instruction boundaries, single-approval persistent trust, and absent marketplace reviews that require fundamental redesign.
Attacks and Mitigations for Distributed Governance of Agentic AI under Byzantine Adversaries
cs.CR 2026-05 unverdicted novelty 7.0

Identifies concrete attacks from a malicious Provider on SAGA and proposes SAGA-BFT, SAGA-MON, SAGA-AUD, and SAGA-HYB mitigations offering different security-performance trade-offs.
FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems
cs.CR 2026-05 unverdicted novelty 7.0

FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.
The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck
cs.CR 2026-05 unverdicted novelty 7.0

PACT achieves perfect security and utility under oracle provenance by enforcing argument-level trust contracts based on semantic roles and cross-step provenance tracking, outperforming invocation-level monitors in Age...
EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium
cs.AI 2026-05 unverdicted novelty 7.0

EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to ad...
Autonomous LLM Agent Worms: Cross-Platform Propagation, Automated Discovery and Temporal Re-Entry Defense
cs.CR 2026-05 unverdicted novelty 7.0

Autonomous LLM agents can host self-propagating worms via persistent state re-entry, demonstrated with automated analysis tools and blocked by a formal no-propagation defense on three frameworks.
Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers
cs.CR 2026-03 conditional novelty 7.0

Stage-level tracking of prompt injection reveals that write-node placement and model-specific behaviors determine attack outcomes more than initial exposure in LLM pipelines.
When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks
cs.CR 2026-05 unverdicted novelty 6.0

Multi-agent LLM frameworks can spread compromises across agent boundaries via insecure memory inheritance during subagent spawning.
MAGIQ: A Post-Quantum Multi-Agentic AI Governance System with Provable Security
cs.LG 2026-05 unverdicted novelty 6.0

MAGIQ introduces a post-quantum secure system for policy definition, enforcement, and accountability in multi-agent AI using novel cryptographic protocols and UC framework proofs.
ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection
cs.CR 2026-05 unverdicted novelty 6.0

ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.
When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems
cs.CR 2026-05 unverdicted novelty 6.0

Embedding-based defenses fail against attacks that align malicious message embeddings with benign ones in LLM multi-agent systems, but token-level confidence scores improve robustness by enabling better pruning of sus...
HDP: A Lightweight Cryptographic Protocol for Human Delegation Provenance in Agentic AI Systems
cs.CR 2026-04 unverdicted novelty 6.0

HDP is a lightweight protocol that binds human authorization to sessions via signed append-only token chains, enabling offline verification of delegation provenance using only an Ed25519 public key and session identifier.
Safe Multi-Agent Behavior Must Be Maintained, Not Merely Asserted: Constraint Drift in LLM-Based Multi-Agent Systems
cs.MA 2026-05 unverdicted novelty 5.0

Safety constraints in LLM-based multi-agent systems commonly weaken during execution through memory, communication, and tool use, requiring them to be maintained as explicit state rather than asserted once.
Insider Attacks in Multi-Agent LLM Consensus Systems
cs.MA 2026-05 unverdicted novelty 5.0

A malicious agent in multi-agent LLM consensus systems can be trained via a surrogate world model and RL to reduce consensus rates and prolong disagreement more effectively than direct prompt attacks.
A Low-Latency Fraud Detection Layer for Detecting Adversarial Interaction Patterns in LLM-Powered Agents
cs.AI 2026-05 unverdicted novelty 5.0

Researchers developed a fast XGBoost-based detector using 42 runtime features to spot adversarial interaction patterns in LLM agents, running over 9 times faster than LLM detectors on synthetic multi-turn data.
SoK: Security of Autonomous LLM Agents in Agentic Commerce
cs.CR 2026-04 unverdicted novelty 5.0

The paper systematizes security for LLM agents in agentic commerce into five threat dimensions, identifies 12 cross-layer attack vectors, and proposes a layered defense architecture.
Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability
cs.CL 2026-05 unverdicted novelty 4.0

The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment inter...
CASCADE: A Cascaded Hybrid Defense Architecture for Prompt Injection Detection in MCP-Based Systems
cs.CR 2026-04 unverdicted novelty 4.0

CASCADE is a cascaded hybrid detector that combines fast regex/entropy filtering, BGE embeddings with local LLM fallback, and output pattern checks to achieve 95.85% precision and 6.06% false-positive rate against pro...

Reference graph

Works this paper leans on

101 extracted references · 101 canonical work pages · cited by 18 Pith papers · 22 internal anchors

[1]

PsySafe : A Comprehensive Framework for Psychological -based Attack , Defense , and Evaluation of Multi -agent System Safety , August 2024 d

Zhang, Zaibin and Zhang, Yongting and Li, Lijun and Gao, Hongzhi and Wang, Lijun and Lu, Huchuan and Zhao, Feng and Qiao, Yu and Shao, Jing , month = aug, year =. doi:10.48550/arXiv.2401.11880 , abstract =

work page doi:10.48550/arxiv.2401.11880
[3]

Tian, Yu and Yang, Xiao and Zhang, Jingyuan and Dong, Yinpeng and Su, Hang , month = feb, year =. Evil

work page
[4]

Not what you've signed up for:

Greshake, Kai and Abdelnabi, Sahar and Mishra, Shailesh and Endres, Christoph and Holz, Thorsten and Fritz, Mario , month = may, year =. Not what you've signed up for:

work page
[5]

, month = sep, year =

Zhang, Wenxiao and Kong, Xiangrui and Dewitt, Conan and Braunl, Thomas and Hong, Jin B. , month = sep, year =. A

work page
[6]

StruQ : Defending Against Prompt Injection with Structured Queries , September 2024

Chen, Sizhe and Piet, Julien and Sitawarin, Chawin and Wagner, David , month = sep, year =. doi:10.48550/arXiv.2402.06363 , abstract =

work page doi:10.48550/arxiv.2402.06363
[7]

and Cai, Carrie J

Park, Joon Sung and O'Brien, Joseph C. and Cai, Carrie J. and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S. , month = aug, year =. Generative

work page
[8]

Breaking

Zhang, Boyang and Tan, Yicong and Shen, Yun and Salem, Ahmed and Backes, Michael and Zannettou, Savvas and Zhang, Yang , month = jul, year =. Breaking

work page
[9]

Liu, Yi and Deng, Gelei and Li, Yuekang and Wang, Kailong and Wang, Zihao and Wang, Xiaofeng and Zhang, Tianwei and Liu, Yepang and Wang, Haoyu and Zheng, Yan and Liu, Yang , month = mar, year =. Prompt

work page
[10]

Formalizing and

Liu, Yupei and Jia, Yuqi and Geng, Runpeng and Jia, Jinyuan and Gong, Neil Zhenqiang , month = jun, year =. Formalizing and

work page
[11]

Gu, Xiangming and Zheng, Xiaosen and Pang, Tianyu and Du, Chao and Liu, Qian and Wang, Ye and Jiang, Jing and Lin, Min , month = jun, year =. Agent

work page
[12]

Flooding

Ju, Tianjie and Wang, Yiting and Ma, Xinbei and Cheng, Pengzhou and Zhao, Haodong and Wang, Yulong and Liu, Lifeng and Xie, Jian and Zhang, Zhuosheng and Liu, Gongshen , month = jul, year =. Flooding

work page
[13]

, month = aug, year =

Huang, Jen-tse and Zhou, Jiaxu and Jin, Tailin and Zhou, Xuhui and Chen, Zixi and Wang, Wenxuan and Yuan, Youliang and Sap, Maarten and Lyu, Michael R. , month = aug, year =. On the

work page
[14]

Yuan, Youliang and Jiao, Wenxiang and Wang, Wenxuan and Huang, Jen-tse and He, Pinjia and Shi, Shuming and Tu, Zhaopeng , month = mar, year =

work page
[15]

Automatic and

Liu, Xiaogeng and Yu, Zhiyuan and Zhang, Yizhe and Zhang, Ning and Xiao, Chaowei , month = mar, year =. Automatic and

work page
[16]

Defending

Hines, Keegan and Lopez, Gary and Hall, Matthew and Zarfati, Federico and Zunger, Yonatan and Kiciman, Emre , month = mar, year =. Defending

work page
[17]

Perez, Fábio and Ribeiro, Ian , month = nov, year =. Ignore. doi:10.48550/arXiv.2211.09527 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2211.09527
[18]

Piet, Julien and Alrashed, Maha and Sitawarin, Chawin and Chen, Sizhe and Wei, Zeming and Sun, Elizabeth and Alomair, Basel and Wagner, David , month = jan, year =. Jatmo:. doi:10.48550/arXiv.2312.17673 , abstract =

work page doi:10.48550/arxiv.2312.17673
[19]

Ouyang, Long and Wu, Jeff and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll L. and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and Schulman, John and Hilton, Jacob and Kelton, Fraser and Miller, Luke and Simens, Maddie and Askell, Amanda and Welinder, Peter and Christiano, Paul and Leike, Jan and Lowe, R...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2203.02155
[20]

arXiv.org , author =

Deep reinforcement learning from human preferences , url =. arXiv.org , author =. 2017 , file =

work page 2017
[21]

arXiv.org , author =

Universal and. arXiv.org , author =. 2023 , file =

work page 2023
[22]

Mehrotra, Anay and Zampetakis, Manolis and Kassianik, Paul and Nelson, Blaine and Anderson, Hyrum and Singer, Yaron and Karbasi, Amin , month = feb, year =. Tree of. doi:10.48550/arXiv.2312.02119 , abstract =

work page doi:10.48550/arxiv.2312.02119
[23]

Schulhoff, Sander , file =. Random

work page
[24]

Sandwich

Schulhoff, Sander , file =. Sandwich

work page
[25]

arXiv.org , author =

Large. arXiv.org , author =. 2023 , file =

work page 2023
[26]

Exploiting

Kang, Daniel and Li, Xuechen and Stoica, Ion and Guestrin, Carlos and Zaharia, Matei and Hashimoto, Tatsunori , month = feb, year =. Exploiting

work page
[27]

Jailbreaking

Liu, Yi and Deng, Gelei and Xu, Zhengzi and Li, Yuekang and Zheng, Yaowen and Zhang, Ying and Zhao, Lida and Zhang, Tianwei and Wang, Kailong and Liu, Yang , month = mar, year =. Jailbreaking

work page
[28]

Jailbroken:

Wei, Alexander and Haghtalab, Nika and Steinhardt, Jacob , month = jul, year =. Jailbroken:

work page
[29]

These are

Warren, Tom , year =. These are

work page
[30]

Unidebugger: Hierarchical multi-agent framework for unified software debugging,

Lee, Cheryl and Xia, Chunqiu Steven and Huang, Jen-tse and Zhu, Zhouruixin and Zhang, Lingming and Lyu, Michael R. , month = apr, year =. A. doi:10.48550/arXiv.2404.17153 , abstract =

work page doi:10.48550/arxiv.2404.17153
[31]

Wu, Alexander , month = sep, year =. geekan/

work page
[32]

Qu, Changle and Dai, Sunhao and Wei, Xiaochi and Cai, Hengyi and Wang, Shuaiqiang and Yin, Dawei and Xu, Jun and Wen, Ji-Rong , month = may, year =. Tool. doi:10.48550/arXiv.2405.17935 , abstract =

work page doi:10.48550/arxiv.2405.17935
[33]

2023 , file =

arXiv.org , author =. 2023 , file =

work page 2023
[34]

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

Liang, Tian and He, Zhiwei and Jiao, Wenxiang and Wang, Xing and Wang, Rui and Yang, Yujiu and Tu, Zhaopeng and Shi, Shuming , month = jul, year =. Encouraging. doi:10.48550/arXiv.2305.19118 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2305.19118
[35]

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

Wu, Qingyun and Bansal, Gagan and Zhang, Jieyu and Wu, Yiran and Li, Beibin and Zhu, Erkang and Jiang, Li and Zhang, Xiaoyun and Zhang, Shaokun and Liu, Jiale and Awadallah, Ahmed Hassan and White, Ryen W. and Burger, Doug and Wang, Chi , month = oct, year =. doi:10.48550/arXiv.2308.08155 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2308.08155
[36]

CrewAI , month = sep, year =

work page
[37]

AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

Chen, Weize and Su, Yusheng and Zuo, Jingwei and Yang, Cheng and Yuan, Chenfei and Chan, Chi-Min and Yu, Heyang and Lu, Yaxi and Hung, Yi-Hsin and Qian, Chen and Qin, Yujia and Cong, Xin and Xie, Ruobing and Liu, Zhiyuan and Sun, Maosong and Zhou, Jie , month = oct, year =. doi:10.48550/arXiv.2308.10848 , abstract =

work page internal anchor Pith review doi:10.48550/arxiv.2308.10848
[38]

CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society

Li, Guohao and Hammoud, Hasan Abed Al Kader and Itani, Hani and Khizbullin, Dmitrii and Ghanem, Bernard , month = nov, year =. doi:10.48550/arXiv.2303.17760 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2303.17760
[39]

Benchmark

Wang, Siyuan and Long, Zhuohan and Fan, Zhihao and Wei, Zhongyu and Huang, Xuanjing , month = feb, year =. Benchmark. doi:10.48550/arXiv.2402.11443 , abstract =

work page doi:10.48550/arxiv.2402.11443
[40]

arXiv.org , author =

Large. arXiv.org , author =. 2024 , keywords =

work page 2024
[41]

AgentSims : An Open - Source Sandbox for Large Language Model Evaluation , August 2023

Lin, Jiaju and Zhao, Haoran and Zhang, Aochi and Wu, Yiting and Ping, Huqiuyue and Chen, Qin , month = aug, year =. doi:10.48550/arXiv.2308.04026 , abstract =

work page doi:10.48550/arxiv.2308.04026
[42]

Hua, Wenyue and Fan, Lizhou and Li, Lingyao and Mei, Kai and Ji, Jianchao and Ge, Yingqiang and Hemphill, Libby and Zhang, Yongfeng , month = jan, year =. War and. doi:10.48550/arXiv.2311.17227 , abstract =

work page doi:10.48550/arxiv.2311.17227
[43]

Instruction tuning for large language models: A survey

Zhang, Shengyu and Dong, Linfeng and Li, Xiaoya and Zhang, Sen and Sun, Xiaofei and Wang, Shuhe and Li, Jiwei and Hu, Runyi and Zhang, Tianwei and Wu, Fei and Wang, Guoyin , month = mar, year =. Instruction. doi:10.48550/arXiv.2308.10792 , abstract =

work page doi:10.48550/arxiv.2308.10792
[44]

Instruction Tuning with GPT-4

Peng, Baolin and Li, Chunyuan and He, Pengcheng and Galley, Michel and Gao, Jianfeng , month = apr, year =. Instruction. doi:10.48550/arXiv.2304.03277 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2304.03277
[45]

Kim, To Eun and Diaz, Fernando , month = sep, year =. Towards. doi:10.48550/arXiv.2409.11598 , abstract =

work page doi:10.48550/arxiv.2409.11598
[47]

ToolSword : Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages , August 2024

Ye, Junjie and Li, Sixian and Li, Guanyu and Huang, Caishuang and Gao, Songyang and Wu, Yilong and Zhang, Qi and Gui, Tao and Huang, Xuanjing , month = aug, year =. doi:10.48550/arXiv.2402.10753 , abstract =

work page doi:10.48550/arxiv.2402.10753
[48]

Cohen, Stav and Bitton, Ron and Nassi, Ben , month = mar, year =. Here. doi:10.48550/arXiv.2403.02817 , abstract =

work page doi:10.48550/arxiv.2403.02817
[49]

ChatDev: Communicative Agents for Software Development

Qian, Chen and Liu, Wei and Liu, Hongzhang and Chen, Nuo and Dang, Yufan and Li, Jiahao and Yang, Cheng and Chen, Weize and Su, Yusheng and Cong, Xin and Xu, Juyuan and Li, Dahai and Liu, Zhiyuan and Sun, Maosong , month = jun, year =. doi:10.48550/arXiv.2307.07924 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2307.07924
[50]

Multiagent

Weiss, Gerhard , year =. Multiagent

work page
[51]

MemoryBank : Enhancing Large Language Models with Long - Term Memory , May 2023

Zhong, Wanjun and Guo, Lianghong and Gao, Qiqi and Ye, He and Wang, Yanlin , month = may, year =. doi:10.48550/arXiv.2305.10250 , abstract =

work page doi:10.48550/arxiv.2305.10250
[52]

Cognitive architectures for language agents

Sumers, Theodore R. and Yao, Shunyu and Narasimhan, Karthik and Griffiths, Thomas L. , month = mar, year =. Cognitive. doi:10.48550/arXiv.2309.02427 , abstract =

work page doi:10.48550/arxiv.2309.02427
[53]

StruQ : Defending Against Prompt Injection with Structured Queries , September 2024

Sizhe Chen, Julien Piet, Chawin Sitawarin, and David Wagner. StruQ : Defending Against Prompt Injection with Structured Queries , September 2024. URL http://arxiv.org/abs/2402.06363. arXiv:2402.06363 [cs]

work page arXiv 2024
[54]

Paul Francis Christiano, Jan Leike, Tom B

Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences, June 2017. URL https://arxiv.org/abs/1706.03741v4

work page arXiv 2017
[55]

Here Comes The AI Worm : Unleashing Zero -click Worms that Target GenAI - Powered Applications , March 2024

Stav Cohen, Ron Bitton, and Ben Nassi. Here Comes The AI Worm : Unleashing Zero -click Worms that Target GenAI - Powered Applications , March 2024. URL http://arxiv.org/abs/2403.02817. arXiv:2403.02817 [cs]

work page arXiv 2024
[56]

crewAIInc / crewAI , September 2024

CrewAI. crewAIInc / crewAI , September 2024. URL https://github.com/crewAIInc/crewAI. original-date: 2023-10-27T03:26:59Z

work page 2024
[57]

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you've signed up for: Compromising Real - World LLM - Integrated Applications with Indirect Prompt Injection , May 2023. URL http://arxiv.org/abs/2302.12173. arXiv:2302.12173 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2023
[58]

Agent smith: A single image can jailbreak one million multimodal llm agents exponentially fast.arXiv preprint arXiv:2402.08567, 2024

Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, and Min Lin. Agent Smith : A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast , June 2024. URL http://arxiv.org/abs/2402.08567. arXiv:2402.08567 [cs]

work page arXiv 2024
[59]

Large Language Model based Multi-Agents: A Survey of Progress and Challenges

Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, and Xiangliang Zhang. Large Language Model based Multi - Agents : A Survey of Progress and Challenges , January 2024. URL https://arxiv.org/abs/2402.01680v2

work page internal anchor Pith review Pith/arXiv arXiv 2024
[60]

Defending Against Indirect Prompt Injection Attacks With Spotlighting

Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan Zunger, and Emre Kiciman. Defending Against Indirect Prompt Injection Attacks With Spotlighting , March 2024. URL http://arxiv.org/abs/2403.14720. arXiv:2403.14720 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2024
[61]

War and Peace ( WarAgent ): Large Language Model -based Multi - Agent Simulation of World Wars , January 2024

Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, and Yongfeng Zhang. War and Peace ( WarAgent ): Large Language Model -based Multi - Agent Simulation of World Wars , January 2024. URL http://arxiv.org/abs/2311.17227. arXiv:2311.17227 [cs]

work page arXiv 2024
[62]

Jen-tse Huang, Jiaxu Zhou, Tailin Jin, Xuhui Zhou, Zixi Chen, Wenxuan Wang, Youliang Yuan, Maarten Sap, and Michael R. Lyu. On the Resilience of Multi - Agent Systems with Malicious Agents , August 2024. URL http://arxiv.org/abs/2408.00989. arXiv:2408.00989 [cs]

work page arXiv 2024
[63]

Flooding Spread of Manipulated Knowledge in LLM - Based Multi - Agent Communities , July 2024

Tianjie Ju, Yiting Wang, Xinbei Ma, Pengzhou Cheng, Haodong Zhao, Yulong Wang, Lifeng Liu, Jian Xie, Zhuosheng Zhang, and Gongshen Liu. Flooding Spread of Manipulated Knowledge in LLM - Based Multi - Agent Communities , July 2024. URL http://arxiv.org/abs/2407.07791. arXiv:2407.07791 [cs]

work page arXiv 2024
[64]

Exploiting Programmatic Behavior of LLMs : Dual - Use Through Standard Security Attacks , February 2023

Daniel Kang, Xuechen Li, Ion Stoica, Carlos Guestrin, Matei Zaharia, and Tatsunori Hashimoto. Exploiting Programmatic Behavior of LLMs : Dual - Use Through Standard Security Attacks , February 2023. URL http://arxiv.org/abs/2302.05733. arXiv:2302.05733 [cs]

work page arXiv 2023
[65]

Towards Fair RAG : On the Impact of Fair Ranking in Retrieval - Augmented Generation , September 2024

To Eun Kim and Fernando Diaz. Towards Fair RAG : On the Impact of Fair Ranking in Retrieval - Augmented Generation , September 2024. URL http://arxiv.org/abs/2409.11598. arXiv:2409.11598 [cs]

work page arXiv 2024
[66]

LangGraph

LangGraph. LangGraph . URL https://www.langchain.com/langgraph

work page
[67]

Cheryl Lee, Chunqiu Steven Xia, Jen-tse Huang, Zhouruixin Zhu, Lingming Zhang, and Michael R. Lyu. A Unified Debugging Approach via LLM - Based Multi - Agent Synergy , April 2024. URL http://arxiv.org/abs/2404.17153. arXiv:2404.17153 [cs]

work page arXiv 2024
[68]

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, and Shuming Shi. Encouraging Divergent Thinking in Large Language Models through Multi - Agent Debate , July 2024. URL http://arxiv.org/abs/2305.19118. arXiv:2305.19118 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2024
[69]

AgentSims : An Open - Source Sandbox for Large Language Model Evaluation , August 2023

Jiaju Lin, Haoran Zhao, Aochi Zhang, Yiting Wu, Huqiuyue Ping, and Qin Chen. AgentSims : An Open - Source Sandbox for Large Language Model Evaluation , August 2023. URL http://arxiv.org/abs/2308.04026. arXiv:2308.04026 [cs]

work page arXiv 2023
[70]

Automatic and Universal Prompt Injection Attacks against Large Language Models , March 2024 a

Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, and Chaowei Xiao. Automatic and Universal Prompt Injection Attacks against Large Language Models , March 2024 a . URL http://arxiv.org/abs/2403.04957. arXiv:2403.04957 [cs]

work page arXiv 2024
[71]

Prompt Injection attack against LLM-integrated Applications

Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, and Yang Liu. Prompt Injection attack against LLM -integrated Applications , March 2024 b . URL http://arxiv.org/abs/2306.05499. arXiv:2306.05499 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2024
[72]

Formalizing and Benchmarking Prompt Injection Attacks and Defenses , June 2024 c

Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. Formalizing and Benchmarking Prompt Injection Attacks and Defenses , June 2024 c . URL http://arxiv.org/abs/2310.12815. arXiv:2310.12815 [cs]

work page arXiv 2024
[73]

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, and Jianfeng Gao. MathVista : Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts , October 2023. URL https://arxiv.org/abs/2310.02255v3

work page internal anchor Pith review Pith/arXiv arXiv 2023
[74]

Tree of Attacks : Jailbreaking Black - Box LLMs Automatically , February 2024

Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, and Amin Karbasi. Tree of Attacks : Jailbreaking Black - Box LLMs Automatically , February 2024. URL http://arxiv.org/abs/2312.02119. arXiv:2312.02119 [cs, stat]

work page arXiv 2024
[75]

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. Training language models to follow instructions with human feedback,...

work page internal anchor Pith review Pith/arXiv arXiv 2022
[76]

Generative Agents: Interactive Simulacra of Human Behavior

Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative Agents : Interactive Simulacra of Human Behavior , August 2023. URL http://arxiv.org/abs/2304.03442. arXiv:2304.03442 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2023
[77]

Instruction Tuning with GPT-4

Baolin Peng, Chunyuan Li, Pengcheng He, Michel Galley, and Jianfeng Gao. Instruction Tuning with GPT -4, April 2023. URL http://arxiv.org/abs/2304.03277. arXiv:2304.03277 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2023
[78]

Ignore Previous Prompt: Attack Techniques For Language Models

Fábio Perez and Ian Ribeiro. Ignore Previous Prompt : Attack Techniques For Language Models , November 2022. URL http://arxiv.org/abs/2211.09527. arXiv:2211.09527 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2022
[79]

ChatDev: Communicative Agents for Software Development

Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, and Maosong Sun. ChatDev : Communicative Agents for Software Development , June 2024. URL http://arxiv.org/abs/2307.07924. arXiv:2307.07924 [cs]

work page internal anchor Pith review Pith/arXiv arXiv 2024
[80]

Tool Learning with Large Language Models : A Survey , May 2024

Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, and Ji-Rong Wen. Tool Learning with Large Language Models : A Survey , May 2024. URL http://arxiv.org/abs/2405.17935. arXiv:2405.17935 [cs]

work page arXiv 2024
[81]

Instruction Defense : Strengthen AI Prompts Against Hacking , a

Sander Schulhoff. Instruction Defense : Strengthen AI Prompts Against Hacking , a . URL https://learnprompting.org/docs/prompt_hacking/defensive_measures/instruction

work page
[82]

Random Sequence Enclosure : Safeguarding AI Prompts , b

Sander Schulhoff. Random Sequence Enclosure : Safeguarding AI Prompts , b . URL https://learnprompting.org/docs/prompt_hacking/defensive_measures/random_sequence

work page

Showing first 80 references.