Recognition: 2 theorem links
· Lean TheoremPrompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems
Pith reviewed 2026-05-15 19:28 UTC · model grok-4.3
The pith
Malicious prompts can self-replicate from one LLM agent to others in multi-agent systems, spreading like a virus.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Prompt Infection is an LLM-to-LLM attack in which a malicious prompt, once injected into one agent, causes that agent to execute the harmful task and then embed the same prompt into messages sent to peer agents, allowing the infection to replicate across the system without requiring direct external input to each agent.
What carries the argument
Prompt Infection, the self-replicating malicious instruction that exploits inter-agent message passing to propagate itself.
If this is right
- A single entry point can compromise an entire multi-agent workflow through silent replication.
- Standard single-agent prompt injection defenses fail to stop system-wide effects.
- Data exfiltration and misinformation campaigns can scale automatically once one agent is reached.
- Combining LLM Tagging with existing safeguards measurably limits further spread.
Where Pith is reading between the lines
- Designers of agent networks may need mandatory message sanitization at every hop rather than at the edge only.
- Testing protocols for new multi-agent applications should include deliberate infection attempts as a standard check.
- The same replication pattern could appear in other structured communication systems such as tool-calling chains or workflow orchestrators.
Load-bearing premise
Agents will execute and forward malicious instructions received from other agents without built-in refusal or detection of the replication attempt.
What would settle it
Run a controlled multi-agent simulation in which every agent is given an explicit rule to refuse any message containing instructions to replicate or spread content to peers, then measure whether the original malicious prompt still propagates.
read the original abstract
As Large Language Models (LLMs) grow increasingly powerful, multi-agent systems are becoming more prevalent in modern AI applications. Most safety research, however, has focused on vulnerabilities in single-agent LLMs. These include prompt injection attacks, where malicious prompts embedded in external content trick the LLM into executing unintended or harmful actions, compromising the victim's application. In this paper, we reveal a more dangerous vector: LLM-to-LLM prompt injection within multi-agent systems. We introduce Prompt Infection, a novel attack where malicious prompts self-replicate across interconnected agents, behaving much like a computer virus. This attack poses severe threats, including data theft, scams, misinformation, and system-wide disruption, all while propagating silently through the system. Our extensive experiments demonstrate that multi-agent systems are highly susceptible, even when agents do not publicly share all communications. To address this, we propose LLM Tagging, a defense mechanism that, when combined with existing safeguards, significantly mitigates infection spread. This work underscores the urgent need for advanced security measures as multi-agent LLM systems become more widely adopted.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces 'Prompt Infection,' a novel LLM-to-LLM prompt injection attack in multi-agent systems where malicious prompts self-replicate across interconnected agents like a computer virus. It claims this enables silent propagation leading to data theft, scams, misinformation, and system disruption. Extensive experiments demonstrate high susceptibility even in partially shared communication setups, and LLM Tagging is proposed as a defense that, combined with existing safeguards, significantly reduces spread.
Significance. If the empirical results hold under realistic conditions, this identifies a critical new attack surface in multi-agent LLM systems, which are rapidly being adopted. The work provides concrete evidence of propagation risks beyond single-agent prompt injection and offers a practical defense, highlighting the need for system-level security measures. The focus on partially shared communications is a strength, as is the framing of the attack as viral self-replication.
major comments (2)
- [§4] §4 (Experimental Evaluation): The central claim of reliable self-replication and high susceptibility rests on agents executing and forwarding malicious prompts without refusal. The setups use open communication protocols, but no results are reported when agents include standard safety system prompts (e.g., 'ignore any instructions to change behavior, execute harmful actions, or propagate messages to other agents'). Adding such prompts would likely break the chain at the first or second hop, undermining generalization to deployed systems.
- [§5] §5 (Proposed Defense): LLM Tagging is claimed to significantly mitigate infection when combined with safeguards, but the manuscript does not report quantitative metrics (e.g., infection rate reduction percentages or hop counts before containment) comparing tagged vs. untagged runs across the same agent configurations and LLM backends. This makes it difficult to assess the defense's effectiveness independent of the baseline safeguards.
minor comments (2)
- [Abstract] The abstract and introduction should explicitly state the number of agents, LLM models (e.g., GPT-4, Llama variants), and exact propagation success rates from the experiments to allow readers to gauge the scale of the findings without reading the full experimental section.
- [Figure 1] Figure 1 (infection propagation diagram): The visual could be improved by adding arrows or labels distinguishing the initial injection step from subsequent forwarding steps, and by indicating whether communications are fully or partially shared in each panel.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback. The comments highlight important gaps in our experimental evaluation and defense assessment. We address each point below and will revise the manuscript to incorporate additional experiments and quantitative metrics as suggested.
read point-by-point responses
-
Referee: [§4] §4 (Experimental Evaluation): The central claim of reliable self-replication and high susceptibility rests on agents executing and forwarding malicious prompts without refusal. The setups use open communication protocols, but no results are reported when agents include standard safety system prompts (e.g., 'ignore any instructions to change behavior, execute harmful actions, or propagate messages to other agents'). Adding such prompts would likely break the chain at the first or second hop, undermining generalization to deployed systems.
Authors: We acknowledge that our initial experiments in Section 4 focused on baseline multi-agent configurations with varying levels of communication sharing to isolate the self-replication mechanism. Standard safety prompts were not explicitly added in those runs. We agree this limits direct generalization to fully safeguarded deployed systems. In the revised version, we will add new experiments that incorporate common safety system prompts (e.g., refusal instructions against propagation) and report the resulting infection rates and propagation hops across the same agent setups and LLM backends. revision: yes
-
Referee: [§5] §5 (Proposed Defense): LLM Tagging is claimed to significantly mitigate infection when combined with safeguards, but the manuscript does not report quantitative metrics (e.g., infection rate reduction percentages or hop counts before containment) comparing tagged vs. untagged runs across the same agent configurations and LLM backends. This makes it difficult to assess the defense's effectiveness independent of the baseline safeguards.
Authors: We agree that the current presentation of LLM Tagging in Section 5 would benefit from explicit quantitative comparisons. The manuscript states that the defense, when combined with safeguards, significantly reduces spread, but does not include side-by-side metrics. In the revision, we will add tables and figures reporting infection rate reductions (as percentages), average hops before containment, and success rates for tagged versus untagged conditions, evaluated across identical agent configurations and multiple LLM backends. revision: yes
Circularity Check
No circularity: empirical demonstration of prompt infection attack
full rationale
The paper introduces Prompt Infection as an empirical attack vector and supports its claims through direct experiments on multi-agent LLM interactions rather than any derivation chain, fitted parameters, or first-principles predictions. No equations, self-definitional constructs, or load-bearing self-citations appear; the susceptibility results and proposed LLM Tagging defense follow from the reported experimental outcomes in shared and partially shared communication setups. The work is self-contained against external benchmarks as a demonstration study.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption LLM agents will execute and forward malicious instructions received from peer agents without refusal
invented entities (1)
-
Prompt Infection
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We introduce Prompt Infection, a novel attack where malicious prompts self-replicate across interconnected agents, behaving much like a computer virus... Recursive Collapse... P romptInf ection(N )(x, data)
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanembed_strictMono_of_one_lt unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Infection prompts propagate... logistic growth pattern... importance score manipulation
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 18 Pith papers
-
Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis
Agent Skills has structural security weaknesses from missing data-instruction boundaries, single-approval persistent trust, and absent marketplace reviews that require fundamental redesign.
-
Attacks and Mitigations for Distributed Governance of Agentic AI under Byzantine Adversaries
Identifies concrete attacks from a malicious Provider on SAGA and proposes SAGA-BFT, SAGA-MON, SAGA-AUD, and SAGA-HYB mitigations offering different security-performance trade-offs.
-
FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems
FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.
-
The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck
PACT achieves perfect security and utility under oracle provenance by enforcing argument-level trust contracts based on semantic roles and cross-step provenance tracking, outperforming invocation-level monitors in Age...
-
EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium
EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to ad...
-
Autonomous LLM Agent Worms: Cross-Platform Propagation, Automated Discovery and Temporal Re-Entry Defense
Autonomous LLM agents can host self-propagating worms via persistent state re-entry, demonstrated with automated analysis tools and blocked by a formal no-propagation defense on three frameworks.
-
Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers
Stage-level tracking of prompt injection reveals that write-node placement and model-specific behaviors determine attack outcomes more than initial exposure in LLM pipelines.
-
When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks
Multi-agent LLM frameworks can spread compromises across agent boundaries via insecure memory inheritance during subagent spawning.
-
MAGIQ: A Post-Quantum Multi-Agentic AI Governance System with Provable Security
MAGIQ introduces a post-quantum secure system for policy definition, enforcement, and accountability in multi-agent AI using novel cryptographic protocols and UC framework proofs.
-
ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection
ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.
-
When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems
Embedding-based defenses fail against attacks that align malicious message embeddings with benign ones in LLM multi-agent systems, but token-level confidence scores improve robustness by enabling better pruning of sus...
-
HDP: A Lightweight Cryptographic Protocol for Human Delegation Provenance in Agentic AI Systems
HDP is a lightweight protocol that binds human authorization to sessions via signed append-only token chains, enabling offline verification of delegation provenance using only an Ed25519 public key and session identifier.
-
Safe Multi-Agent Behavior Must Be Maintained, Not Merely Asserted: Constraint Drift in LLM-Based Multi-Agent Systems
Safety constraints in LLM-based multi-agent systems commonly weaken during execution through memory, communication, and tool use, requiring them to be maintained as explicit state rather than asserted once.
-
Insider Attacks in Multi-Agent LLM Consensus Systems
A malicious agent in multi-agent LLM consensus systems can be trained via a surrogate world model and RL to reduce consensus rates and prolong disagreement more effectively than direct prompt attacks.
-
A Low-Latency Fraud Detection Layer for Detecting Adversarial Interaction Patterns in LLM-Powered Agents
Researchers developed a fast XGBoost-based detector using 42 runtime features to spot adversarial interaction patterns in LLM agents, running over 9 times faster than LLM detectors on synthetic multi-turn data.
-
SoK: Security of Autonomous LLM Agents in Agentic Commerce
The paper systematizes security for LLM agents in agentic commerce into five threat dimensions, identifies 12 cross-layer attack vectors, and proposes a layered defense architecture.
-
Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability
The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment inter...
-
CASCADE: A Cascaded Hybrid Defense Architecture for Prompt Injection Detection in MCP-Based Systems
CASCADE is a cascaded hybrid detector that combines fast regex/entropy filtering, BGE embeddings with local LLM fallback, and output pattern checks to achieve 95.85% precision and 6.06% false-positive rate against pro...
Reference graph
Works this paper leans on
-
[1]
Zhang, Zaibin and Zhang, Yongting and Li, Lijun and Gao, Hongzhi and Wang, Lijun and Lu, Huchuan and Zhao, Feng and Qiao, Yu and Shao, Jing , month = aug, year =. doi:10.48550/arXiv.2401.11880 , abstract =
-
[3]
Tian, Yu and Yang, Xiao and Zhang, Jingyuan and Dong, Yinpeng and Su, Hang , month = feb, year =. Evil
-
[4]
Not what you've signed up for:
Greshake, Kai and Abdelnabi, Sahar and Mishra, Shailesh and Endres, Christoph and Holz, Thorsten and Fritz, Mario , month = may, year =. Not what you've signed up for:
-
[5]
Zhang, Wenxiao and Kong, Xiangrui and Dewitt, Conan and Braunl, Thomas and Hong, Jin B. , month = sep, year =. A
-
[6]
StruQ : Defending Against Prompt Injection with Structured Queries , September 2024
Chen, Sizhe and Piet, Julien and Sitawarin, Chawin and Wagner, David , month = sep, year =. doi:10.48550/arXiv.2402.06363 , abstract =
-
[7]
Park, Joon Sung and O'Brien, Joseph C. and Cai, Carrie J. and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S. , month = aug, year =. Generative
- [8]
-
[9]
Liu, Yi and Deng, Gelei and Li, Yuekang and Wang, Kailong and Wang, Zihao and Wang, Xiaofeng and Zhang, Tianwei and Liu, Yepang and Wang, Haoyu and Zheng, Yan and Liu, Yang , month = mar, year =. Prompt
-
[10]
Liu, Yupei and Jia, Yuqi and Geng, Runpeng and Jia, Jinyuan and Gong, Neil Zhenqiang , month = jun, year =. Formalizing and
-
[11]
Gu, Xiangming and Zheng, Xiaosen and Pang, Tianyu and Du, Chao and Liu, Qian and Wang, Ye and Jiang, Jing and Lin, Min , month = jun, year =. Agent
- [12]
-
[13]
Huang, Jen-tse and Zhou, Jiaxu and Jin, Tailin and Zhou, Xuhui and Chen, Zixi and Wang, Wenxuan and Yuan, Youliang and Sap, Maarten and Lyu, Michael R. , month = aug, year =. On the
-
[14]
Yuan, Youliang and Jiao, Wenxiang and Wang, Wenxuan and Huang, Jen-tse and He, Pinjia and Shi, Shuming and Tu, Zhaopeng , month = mar, year =
-
[15]
Liu, Xiaogeng and Yu, Zhiyuan and Zhang, Yizhe and Zhang, Ning and Xiao, Chaowei , month = mar, year =. Automatic and
- [16]
-
[17]
Perez, Fábio and Ribeiro, Ian , month = nov, year =. Ignore. doi:10.48550/arXiv.2211.09527 , abstract =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2211.09527
-
[18]
Piet, Julien and Alrashed, Maha and Sitawarin, Chawin and Chen, Sizhe and Wei, Zeming and Sun, Elizabeth and Alomair, Basel and Wagner, David , month = jan, year =. Jatmo:. doi:10.48550/arXiv.2312.17673 , abstract =
-
[19]
Ouyang, Long and Wu, Jeff and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll L. and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and Schulman, John and Hilton, Jacob and Kelton, Fraser and Miller, Luke and Simens, Maddie and Askell, Amanda and Welinder, Peter and Christiano, Paul and Leike, Jan and Lowe, R...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2203.02155
-
[20]
Deep reinforcement learning from human preferences , url =. arXiv.org , author =. 2017 , file =
work page 2017
- [21]
-
[22]
Mehrotra, Anay and Zampetakis, Manolis and Kassianik, Paul and Nelson, Blaine and Anderson, Hyrum and Singer, Yaron and Karbasi, Amin , month = feb, year =. Tree of. doi:10.48550/arXiv.2312.02119 , abstract =
-
[23]
Schulhoff, Sander , file =. Random
- [24]
- [25]
-
[26]
Kang, Daniel and Li, Xuechen and Stoica, Ion and Guestrin, Carlos and Zaharia, Matei and Hashimoto, Tatsunori , month = feb, year =. Exploiting
-
[27]
Liu, Yi and Deng, Gelei and Xu, Zhengzi and Li, Yuekang and Zheng, Yaowen and Zhang, Ying and Zhao, Lida and Zhang, Tianwei and Wang, Kailong and Liu, Yang , month = mar, year =. Jailbreaking
-
[28]
Wei, Alexander and Haghtalab, Nika and Steinhardt, Jacob , month = jul, year =. Jailbroken:
- [29]
-
[30]
Unidebugger: Hierarchical multi-agent framework for unified software debugging,
Lee, Cheryl and Xia, Chunqiu Steven and Huang, Jen-tse and Zhu, Zhouruixin and Zhang, Lingming and Lyu, Michael R. , month = apr, year =. A. doi:10.48550/arXiv.2404.17153 , abstract =
-
[31]
Wu, Alexander , month = sep, year =. geekan/
-
[32]
Qu, Changle and Dai, Sunhao and Wei, Xiaochi and Cai, Hengyi and Wang, Shuaiqiang and Yin, Dawei and Xu, Jun and Wen, Ji-Rong , month = may, year =. Tool. doi:10.48550/arXiv.2405.17935 , abstract =
- [33]
-
[34]
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Liang, Tian and He, Zhiwei and Jiao, Wenxiang and Wang, Xing and Wang, Rui and Yang, Yujiu and Tu, Zhaopeng and Shi, Shuming , month = jul, year =. Encouraging. doi:10.48550/arXiv.2305.19118 , abstract =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2305.19118
-
[35]
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Wu, Qingyun and Bansal, Gagan and Zhang, Jieyu and Wu, Yiran and Li, Beibin and Zhu, Erkang and Jiang, Li and Zhang, Xiaoyun and Zhang, Shaokun and Liu, Jiale and Awadallah, Ahmed Hassan and White, Ryen W. and Burger, Doug and Wang, Chi , month = oct, year =. doi:10.48550/arXiv.2308.08155 , abstract =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2308.08155
-
[36]
CrewAI , month = sep, year =
-
[37]
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
Chen, Weize and Su, Yusheng and Zuo, Jingwei and Yang, Cheng and Yuan, Chenfei and Chan, Chi-Min and Yu, Heyang and Lu, Yaxi and Hung, Yi-Hsin and Qian, Chen and Qin, Yujia and Cong, Xin and Xie, Ruobing and Liu, Zhiyuan and Sun, Maosong and Zhou, Jie , month = oct, year =. doi:10.48550/arXiv.2308.10848 , abstract =
work page internal anchor Pith review doi:10.48550/arxiv.2308.10848
-
[38]
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
Li, Guohao and Hammoud, Hasan Abed Al Kader and Itani, Hani and Khizbullin, Dmitrii and Ghanem, Bernard , month = nov, year =. doi:10.48550/arXiv.2303.17760 , abstract =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2303.17760
-
[39]
Wang, Siyuan and Long, Zhuohan and Fan, Zhihao and Wei, Zhongyu and Huang, Xuanjing , month = feb, year =. Benchmark. doi:10.48550/arXiv.2402.11443 , abstract =
- [40]
-
[41]
AgentSims : An Open - Source Sandbox for Large Language Model Evaluation , August 2023
Lin, Jiaju and Zhao, Haoran and Zhang, Aochi and Wu, Yiting and Ping, Huqiuyue and Chen, Qin , month = aug, year =. doi:10.48550/arXiv.2308.04026 , abstract =
-
[42]
Hua, Wenyue and Fan, Lizhou and Li, Lingyao and Mei, Kai and Ji, Jianchao and Ge, Yingqiang and Hemphill, Libby and Zhang, Yongfeng , month = jan, year =. War and. doi:10.48550/arXiv.2311.17227 , abstract =
-
[43]
Instruction tuning for large language models: A survey
Zhang, Shengyu and Dong, Linfeng and Li, Xiaoya and Zhang, Sen and Sun, Xiaofei and Wang, Shuhe and Li, Jiwei and Hu, Runyi and Zhang, Tianwei and Wu, Fei and Wang, Guoyin , month = mar, year =. Instruction. doi:10.48550/arXiv.2308.10792 , abstract =
-
[44]
Peng, Baolin and Li, Chunyuan and He, Pengcheng and Galley, Michel and Gao, Jianfeng , month = apr, year =. Instruction. doi:10.48550/arXiv.2304.03277 , abstract =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2304.03277
-
[45]
Kim, To Eun and Diaz, Fernando , month = sep, year =. Towards. doi:10.48550/arXiv.2409.11598 , abstract =
-
[47]
Ye, Junjie and Li, Sixian and Li, Guanyu and Huang, Caishuang and Gao, Songyang and Wu, Yilong and Zhang, Qi and Gui, Tao and Huang, Xuanjing , month = aug, year =. doi:10.48550/arXiv.2402.10753 , abstract =
-
[48]
Cohen, Stav and Bitton, Ron and Nassi, Ben , month = mar, year =. Here. doi:10.48550/arXiv.2403.02817 , abstract =
-
[49]
ChatDev: Communicative Agents for Software Development
Qian, Chen and Liu, Wei and Liu, Hongzhang and Chen, Nuo and Dang, Yufan and Li, Jiahao and Yang, Cheng and Chen, Weize and Su, Yusheng and Cong, Xin and Xu, Juyuan and Li, Dahai and Liu, Zhiyuan and Sun, Maosong , month = jun, year =. doi:10.48550/arXiv.2307.07924 , abstract =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2307.07924
- [50]
-
[51]
MemoryBank : Enhancing Large Language Models with Long - Term Memory , May 2023
Zhong, Wanjun and Guo, Lianghong and Gao, Qiqi and Ye, He and Wang, Yanlin , month = may, year =. doi:10.48550/arXiv.2305.10250 , abstract =
-
[52]
Cognitive architectures for language agents
Sumers, Theodore R. and Yao, Shunyu and Narasimhan, Karthik and Griffiths, Thomas L. , month = mar, year =. Cognitive. doi:10.48550/arXiv.2309.02427 , abstract =
-
[53]
StruQ : Defending Against Prompt Injection with Structured Queries , September 2024
Sizhe Chen, Julien Piet, Chawin Sitawarin, and David Wagner. StruQ : Defending Against Prompt Injection with Structured Queries , September 2024. URL http://arxiv.org/abs/2402.06363. arXiv:2402.06363 [cs]
-
[54]
Paul Francis Christiano, Jan Leike, Tom B
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences, June 2017. URL https://arxiv.org/abs/1706.03741v4
-
[55]
Stav Cohen, Ron Bitton, and Ben Nassi. Here Comes The AI Worm : Unleashing Zero -click Worms that Target GenAI - Powered Applications , March 2024. URL http://arxiv.org/abs/2403.02817. arXiv:2403.02817 [cs]
-
[56]
crewAIInc / crewAI , September 2024
CrewAI. crewAIInc / crewAI , September 2024. URL https://github.com/crewAIInc/crewAI. original-date: 2023-10-27T03:26:59Z
work page 2024
-
[57]
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you've signed up for: Compromising Real - World LLM - Integrated Applications with Indirect Prompt Injection , May 2023. URL http://arxiv.org/abs/2302.12173. arXiv:2302.12173 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[58]
Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, and Min Lin. Agent Smith : A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast , June 2024. URL http://arxiv.org/abs/2402.08567. arXiv:2402.08567 [cs]
-
[59]
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, and Xiangliang Zhang. Large Language Model based Multi - Agents : A Survey of Progress and Challenges , January 2024. URL https://arxiv.org/abs/2402.01680v2
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[60]
Defending Against Indirect Prompt Injection Attacks With Spotlighting
Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan Zunger, and Emre Kiciman. Defending Against Indirect Prompt Injection Attacks With Spotlighting , March 2024. URL http://arxiv.org/abs/2403.14720. arXiv:2403.14720 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[61]
Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, and Yongfeng Zhang. War and Peace ( WarAgent ): Large Language Model -based Multi - Agent Simulation of World Wars , January 2024. URL http://arxiv.org/abs/2311.17227. arXiv:2311.17227 [cs]
- [62]
-
[63]
Flooding Spread of Manipulated Knowledge in LLM - Based Multi - Agent Communities , July 2024
Tianjie Ju, Yiting Wang, Xinbei Ma, Pengzhou Cheng, Haodong Zhao, Yulong Wang, Lifeng Liu, Jian Xie, Zhuosheng Zhang, and Gongshen Liu. Flooding Spread of Manipulated Knowledge in LLM - Based Multi - Agent Communities , July 2024. URL http://arxiv.org/abs/2407.07791. arXiv:2407.07791 [cs]
-
[64]
Daniel Kang, Xuechen Li, Ion Stoica, Carlos Guestrin, Matei Zaharia, and Tatsunori Hashimoto. Exploiting Programmatic Behavior of LLMs : Dual - Use Through Standard Security Attacks , February 2023. URL http://arxiv.org/abs/2302.05733. arXiv:2302.05733 [cs]
-
[65]
To Eun Kim and Fernando Diaz. Towards Fair RAG : On the Impact of Fair Ranking in Retrieval - Augmented Generation , September 2024. URL http://arxiv.org/abs/2409.11598. arXiv:2409.11598 [cs]
- [66]
- [67]
-
[68]
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, and Shuming Shi. Encouraging Divergent Thinking in Large Language Models through Multi - Agent Debate , July 2024. URL http://arxiv.org/abs/2305.19118. arXiv:2305.19118 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[69]
AgentSims : An Open - Source Sandbox for Large Language Model Evaluation , August 2023
Jiaju Lin, Haoran Zhao, Aochi Zhang, Yiting Wu, Huqiuyue Ping, and Qin Chen. AgentSims : An Open - Source Sandbox for Large Language Model Evaluation , August 2023. URL http://arxiv.org/abs/2308.04026. arXiv:2308.04026 [cs]
-
[70]
Automatic and Universal Prompt Injection Attacks against Large Language Models , March 2024 a
Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, and Chaowei Xiao. Automatic and Universal Prompt Injection Attacks against Large Language Models , March 2024 a . URL http://arxiv.org/abs/2403.04957. arXiv:2403.04957 [cs]
-
[71]
Prompt Injection attack against LLM-integrated Applications
Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, and Yang Liu. Prompt Injection attack against LLM -integrated Applications , March 2024 b . URL http://arxiv.org/abs/2306.05499. arXiv:2306.05499 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[72]
Formalizing and Benchmarking Prompt Injection Attacks and Defenses , June 2024 c
Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. Formalizing and Benchmarking Prompt Injection Attacks and Defenses , June 2024 c . URL http://arxiv.org/abs/2310.12815. arXiv:2310.12815 [cs]
-
[73]
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, and Jianfeng Gao. MathVista : Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts , October 2023. URL https://arxiv.org/abs/2310.02255v3
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[74]
Tree of Attacks : Jailbreaking Black - Box LLMs Automatically , February 2024
Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, and Amin Karbasi. Tree of Attacks : Jailbreaking Black - Box LLMs Automatically , February 2024. URL http://arxiv.org/abs/2312.02119. arXiv:2312.02119 [cs, stat]
-
[75]
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. Training language models to follow instructions with human feedback,...
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[76]
Generative Agents: Interactive Simulacra of Human Behavior
Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative Agents : Interactive Simulacra of Human Behavior , August 2023. URL http://arxiv.org/abs/2304.03442. arXiv:2304.03442 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[77]
Baolin Peng, Chunyuan Li, Pengcheng He, Michel Galley, and Jianfeng Gao. Instruction Tuning with GPT -4, April 2023. URL http://arxiv.org/abs/2304.03277. arXiv:2304.03277 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[78]
Ignore Previous Prompt: Attack Techniques For Language Models
Fábio Perez and Ian Ribeiro. Ignore Previous Prompt : Attack Techniques For Language Models , November 2022. URL http://arxiv.org/abs/2211.09527. arXiv:2211.09527 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[79]
ChatDev: Communicative Agents for Software Development
Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, and Maosong Sun. ChatDev : Communicative Agents for Software Development , June 2024. URL http://arxiv.org/abs/2307.07924. arXiv:2307.07924 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[80]
Tool Learning with Large Language Models : A Survey , May 2024
Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, and Ji-Rong Wen. Tool Learning with Large Language Models : A Survey , May 2024. URL http://arxiv.org/abs/2405.17935. arXiv:2405.17935 [cs]
-
[81]
Instruction Defense : Strengthen AI Prompts Against Hacking , a
Sander Schulhoff. Instruction Defense : Strengthen AI Prompts Against Hacking , a . URL https://learnprompting.org/docs/prompt_hacking/defensive_measures/instruction
-
[82]
Random Sequence Enclosure : Safeguarding AI Prompts , b
Sander Schulhoff. Random Sequence Enclosure : Safeguarding AI Prompts , b . URL https://learnprompting.org/docs/prompt_hacking/defensive_measures/random_sequence
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.