arxiv: 2404.08144 · v2 · pith:CRII4LV4new · submitted 2024-04-11 · 💻 cs.CR · cs.AI

LLM Agents can Autonomously Exploit One-day Vulnerabilities

Richard Fang, Rohan Bindu, Akul Gupta, Daniel Kang

Pith reviewed 2026-05-18 04:13 UTC · model grok-4.3

keywords LLM agents · one-day vulnerabilities · autonomous exploitation · cybersecurity · CVE · GPT-4

The pith

GPT-4 agents autonomously exploit 87 percent of tested one-day vulnerabilities when given their CVE descriptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether LLM agents can find and exploit real one-day vulnerabilities in actual systems without human help beyond an initial description. On a collection of 15 such vulnerabilities, including critical ones, a GPT-4 agent succeeds on 87 percent when supplied with the CVE text. Every other model tested, along with standard vulnerability scanners, succeeds on none. Removing the CVE description causes the GPT-4 agent's success rate to fall to 7 percent. These results suggest that current high-capability agents already carry concrete offensive capabilities against unpatched software.

Core claim

When given the CVE description, a GPT-4 agent autonomously exploits 87 percent of the 15 one-day vulnerabilities, while GPT-3.5, open-source LLMs, ZAP, and Metasploit exploit 0 percent. The same GPT-4 agent exploits only 7 percent when the CVE description is withheld.
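
On a 15-item benchmark the percentages above correspond to whole-number counts. A minimal sketch of that arithmetic, assuming the reported 87 and 7 percent are rounded to the nearest whole percent (the counts are inferred here, not quoted from the paper):

```python
# Recover the success counts implied by rounded percentages on a 15-item set.
# Assumption: the reported 87% and 7% are rounded to the nearest whole percent.
TOTAL = 15

def counts_matching(reported_percent: int, total: int = TOTAL) -> list[int]:
    """Return every count k in 0..total whose rate rounds to reported_percent."""
    return [k for k in range(total + 1) if round(100 * k / total) == reported_percent]

with_description = counts_matching(87)     # GPT-4 agent given the CVE text
without_description = counts_matching(7)   # same agent, CVE text withheld
print(with_description, without_description)  # prints [13] [1]
```

Under that rounding assumption, the only consistent reading is 13 of 15 with the description and 1 of 15 without it.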

What carries the argument

An LLM agent that receives a CVE description and uses tools to probe and modify a target system in order to trigger the described vulnerability.
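
The control flow of such an agent can be sketched as a reason-then-act loop. This is an illustrative skeleton only: `scripted_policy`, `lookup_docs`, and the step format are hypothetical stand-ins, the "model" is a fixed script rather than an LLM, and the single tool returns canned text and touches no real system. The paper's actual prompts, tools, and success checks are not reproduced on this page.

```python
from typing import Callable

# Hypothetical, inert tool: returns canned text and contacts nothing.
def lookup_docs(query: str) -> str:
    return f"(canned documentation snippet for: {query})"

TOOLS: dict[str, Callable[[str], str]] = {"lookup_docs": lookup_docs}

def scripted_policy(task: str, history: list[tuple[str, str, str]]) -> tuple[str, str]:
    """Stand-in for the LLM: picks the next (action, argument) from the history."""
    if not history:
        return ("lookup_docs", task)   # first step: gather context
    return ("finish", "done")          # afterwards: stop

def run_agent(task: str, max_steps: int = 5) -> list[tuple[str, str, str]]:
    """Alternate between choosing an action and observing the tool's output."""
    history: list[tuple[str, str, str]] = []
    for _ in range(max_steps):
        action, argument = scripted_policy(task, history)
        if action == "finish":
            break
        observation = TOOLS[action](argument)   # act, then record the observation
        history.append((action, argument, observation))
    return history

trace = run_agent("summarise advisory text")
print(len(trace), trace[0][0])  # prints 1 lookup_docs
```

The point of the sketch is the shape of the loop: the task text stays fixed while the history of observations grows, and the policy decides each next step from both.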

If this is right

Malicious actors could use similar agents to automate attacks on recently disclosed but unpatched systems.
Organizations running internet-facing software would need faster patching cycles than current norms.
Access controls on tool-using LLM agents become a direct security requirement rather than an optional safeguard.
The dependence on CVE descriptions limits but does not remove the risk for future agent versions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Defenders might need new monitoring that flags unusual sequences of system calls or web requests generated by automated agents.
Vulnerability disclosure processes could face pressure to delay public CVE text if agents improve at using it.
Testing regimes for new LLM agents should include standardized one-day exploitation benchmarks before public release.

Load-bearing premise

The 15 chosen vulnerabilities and the exact agent tools and prompts used are representative of real-world one-day vulnerabilities and typical LLM agent deployments.

What would settle it

Running the same GPT-4 agent setup on a larger, independently chosen set of one-day vulnerabilities and measuring whether the 87 percent exploitation rate holds or whether success without the CVE description rises substantially above 7 percent.
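
One way to read such a replication is to ask how surprising its outcome would be if the original rate were the true rate. The sketch below computes an exact binomial tail probability with the standard library. The replication numbers (`n_new = 50`, `k_new = 30`) are made-up placeholders, and the reference rate 13/15 is the count implied by rounding 87 percent on 15 items, not a figure quoted from the paper.

```python
from math import comb

def binom_tail_le(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

p_ref = 13 / 15          # assumed reference rate (about 87 percent)
n_new, k_new = 50, 30    # hypothetical replication: 30 successes in 50 attempts

tail = binom_tail_le(k_new, n_new, p_ref)
print(f"P(at most {k_new}/{n_new} successes | p = {p_ref:.3f}) = {tail:.2e}")
```

A very small tail probability would suggest the original rate does not carry over to the new sample; a moderate one would be consistent with it holding.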

Original abstract

LLMs have becoming increasingly powerful, both in their benign and malicious uses. With the increase in capabilities, researchers have been increasingly interested in their ability to exploit cybersecurity vulnerabilities. In particular, recent work has conducted preliminary studies on the ability of LLM agents to autonomously hack websites. However, these studies are limited to simple vulnerabilities. In this work, we show that LLM agents can autonomously exploit one-day vulnerabilities in real-world systems. To show this, we collected a dataset of 15 one-day vulnerabilities that include ones categorized as critical severity in the CVE description. When given the CVE description, GPT-4 is capable of exploiting 87% of these vulnerabilities compared to 0% for every other model we test (GPT-3.5, open-source LLMs) and open-source vulnerability scanners (ZAP and Metasploit). Fortunately, our GPT-4 agent requires the CVE description for high performance: without the description, GPT-4 can exploit only 7% of the vulnerabilities. Our findings raise questions around the widespread deployment of highly capable LLM agents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that LLM agents can autonomously exploit one-day vulnerabilities in real-world systems. It collects a dataset of 15 one-day vulnerabilities (including critical-severity ones) and reports that a GPT-4 agent, when given the CVE description, exploits 87% of them—versus 0% for GPT-3.5, open-source LLMs, and scanners ZAP/Metasploit—while GPT-4 without the CVE description succeeds on only 7%.

Significance. If the empirical results hold under more rigorous controls, the work is significant for demonstrating a concrete performance gap between frontier LLMs and prior systems on autonomous exploitation of real CVEs. It supplies falsifiable, measurable evidence on a specific task and raises timely questions about safe deployment of LLM agents in security-sensitive settings.

major comments (3)

[§3] §3 (Dataset construction): No explicit selection criteria, diversity metrics (vuln type, software, severity distribution, or exploit complexity), or confirmation that test environments match production deployments are provided. This directly undermines the general claim that the 87% rate reflects a property of one-day vulnerabilities rather than a curated sample.
[§4] §4 (Agent architecture and evaluation): The manuscript gives insufficient detail on exact agent architecture, tool access, prompting strategy, success criteria, number of trials, or controls for prompt-engineering variations. Without these, the 87% figure cannot be assessed for robustness or reproducibility.
[§5] Baseline comparison (throughout §5): It is unclear whether ZAP and Metasploit were supplied equivalent CVE descriptions or run under the same environmental constraints as the LLM agent; the 0% result may therefore not constitute a fair head-to-head evaluation.

minor comments (2)

[Abstract] Abstract: 'LLMs have becoming increasingly powerful' contains a grammatical error and should read 'have become'.
[§5] Results lack error bars, confidence intervals, or multiple-run statistics for the reported success rates.
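
To illustrate how wide the uncertainty is on a 15-item benchmark, a Wilson score interval can be computed for the headline rate. A rough sketch, assuming 13 successes out of 15 (the count implied by rounding to 87%) and one pass/fail outcome per vulnerability; neither detail is confirmed on this page.

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion (z = 1.96 ~ 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

low, high = wilson_interval(13, 15)   # assumed count behind the 87% figure
print(f"95% Wilson interval: {low:.2f} to {high:.2f}")
```

Under these assumptions the interval runs from roughly 0.62 to 0.96, which shows why multiple-run statistics would matter for a sample this small.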

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. The comments highlight important areas for improving the description of our methodology and evaluation. We respond to each major comment in turn and commit to revisions that address the concerns raised.

Point-by-point responses

Referee: §3 (Dataset construction): No explicit selection criteria, diversity metrics (vuln type, software, severity distribution, or exploit complexity), or confirmation that test environments match production deployments are provided. This directly undermines the general claim that the 87% rate reflects a property of one-day vulnerabilities rather than a curated sample.

Authors: We agree with the referee that the manuscript would benefit from more explicit details on how the dataset was constructed. In the revised manuscript, we will add to §3 a description of the selection criteria, which focused on vulnerabilities disclosed within the last year that have public CVE descriptions and affect commonly used software. We selected a mix of vulnerability types including remote code execution, SQL injection, and cross-site scripting to ensure diversity. We will also provide metrics on the distribution of severity levels and exploit complexity. Additionally, we will confirm that the test environments were set up to match production deployments using standard configurations from the software vendors. These changes will better support the generalizability of our 87% success rate to one-day vulnerabilities in real-world systems.

revision: yes

Referee: §4 (Agent architecture and evaluation): The manuscript gives insufficient detail on exact agent architecture, tool access, prompting strategy, success criteria, number of trials, or controls for prompt-engineering variations. Without these, the 87% figure cannot be assessed for robustness or reproducibility.

Authors: We recognize that the current description in §4 is insufficient for full reproducibility. We will revise this section to provide comprehensive details on the agent architecture, including the specific tools available to the agent such as command execution and browsing capabilities. The prompting strategy will be described in full, including how the CVE description is integrated into the agent's instructions. We will define the success criteria clearly and report the number of trials conducted per vulnerability along with any measures taken to control for variations in prompting. These additions will allow readers to better evaluate the robustness of the 87% success rate.

revision: yes

Referee: Baseline comparison (throughout §5): It is unclear whether ZAP and Metasploit were supplied equivalent CVE descriptions or run under the same environmental constraints as the LLM agent; the 0% result may therefore not constitute a fair head-to-head evaluation.

Authors: We thank the referee for pointing out this potential ambiguity in the baseline evaluation. The ZAP and Metasploit baselines were run in the exact same test environments as the LLM agent, with the vulnerable services deployed identically. However, these tools do not accept CVE descriptions as direct input; they rely on their internal vulnerability databases and scanning logic. In the revised manuscript, we will clarify this in §5 by detailing the configuration parameters used for each tool (e.g., ZAP's active scan on the target URL with specific policy settings, and Metasploit's use of relevant exploit modules matched to the CVE). We will also add text explaining that while the comparison is not identical in input format, it demonstrates the LLM agent's ability to leverage the CVE information effectively where traditional tools fail. This addresses the fairness concern while acknowledging the methodological differences.

revision: partial

Circularity Check

0 steps flagged

No circularity: straightforward empirical comparison

The paper reports direct experimental results from testing LLM agents on a fixed set of 15 one-day vulnerabilities, measuring success rates when provided CVE descriptions versus baselines (other models and scanners). No derivations, equations, fitted parameters, predictions, or self-citation chains are present that reduce any claim to its inputs by construction. The 87% figure is a measured outcome against external systems, not a tautology or renamed fit. The study is self-contained against its chosen benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Empirical evaluation paper. No mathematical derivations or fitted constants. Relies on assumptions about what constitutes a successful autonomous exploit and on the representativeness of the chosen vulnerabilities.

axioms (1)

domain assumption: Providing the official CVE description is a valid test of autonomous exploitation capability.
Performance drops sharply without the description, so the claim depends on this input being acceptable.

pith-pipeline@v0.9.0 · 5717 in / 1209 out tokens · 39866 ms · 2026-05-18T04:13:31.613224+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

We show that LLM agents can autonomously exploit one-day vulnerabilities... GPT-4 is capable of exploiting 87%... ReAct agent framework... 91 lines of code

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

APIOT: Autonomous Vulnerability Management Across Bare-Metal Industrial OT Networks
cs.CR 2026-05 unverdicted novelty 8.0

APIOT is the first LLM framework to complete the full autonomous discovery-to-remediation cycle on bare-metal OT devices, reaching 90% success across 290 runs on Zephyr RTOS.

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios
cs.CR 2026-05 unverdicted novelty 7.0

LLM agents exhibit persistent attack-selection biases as fixed traits independent of success rates, with a bias momentum effect that resists steering and yields no performance gain.

Agentic Vulnerability Reasoning on Windows COM Binaries
cs.CR 2026-05 accept novelty 7.0

SLYP agentic pipeline discovers race condition vulnerabilities in Windows COM binaries and generates debugger-verified PoCs, scoring 0.973 F1 on a 40-case benchmark and finding 28 new confirmed vulnerabilities in prod...

PHANTOM: Polymorphic Honeytoken Adaptation with Narrative-Tailored Organisational Mimicry
cs.CR 2026-05 unverdicted novelty 7.0

PHANTOM raises honeytoken believability from 0.576 to 0.778 by adding organization-specific mimicry, lifting human acceptance to 100% and detection resistance to 0.870.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
cs.CR 2026-04 unverdicted novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

Taint-Style Vulnerability Detection and Confirmation for Node.js Packages Using LLM Agent Reasoning
cs.CR 2026-04 unverdicted novelty 7.0

LLMVD.js uses LLM agents to confirm 84% of taint-style vulnerabilities on public benchmarks (vs. <22% for prior tools) and generates validated exploits for 36 of 260 new packages (vs. ≤2 for traditional tools).

SoK: Honeypots & LLMs, More Than the Sum of Their Parts?
cs.CR 2025-10 unverdicted novelty 7.0

A systematization of knowledge paper that taxonomizes honeypot detection vectors, synthesizes LLM-honeypot literature into canonical architecture and evaluation methods, and proposes a roadmap for autonomous deception...

Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches
cs.CR 2026-05 unverdicted novelty 6.0

An agentic pipeline localizes the security-relevant function in 10 of 20 Ubuntu binary security updates and produces an accepted root-cause classification in 11 of 20, limited mainly by binary differencing coverage.

Towards Optimal Agentic Architectures for Offensive Security Tasks
cs.CR 2026-04 unverdicted novelty 6.0

Empirical comparison of agentic topologies for offensive security shows MAS-Indep reaching 64.2% validated detection while simpler baselines remain competitive on efficiency, with whitebox and web targets outperformin...

An Independent Safety Evaluation of Kimi K2.5
cs.CR 2026-04 conditional novelty 6.0

Kimi K2.5 matches closed models on dual-use tasks but refuses fewer CBRNE requests and shows some sabotage and self-replication tendencies.

From Rookie to Expert: Manipulating LLMs for Automated Vulnerability Exploitation in Enterprise Software
cs.SE 2025-12 unverdicted novelty 6.0

RSA prompting enables LLMs to automatically create functional exploits for CVEs in Odoo ERP, succeeding on all tested cases in 3-5 rounds and removing the need for manual effort.

A Multi-Agent Framework for Automated Exploit Generation with Constraint-Guided Comprehension and Reflection
cs.SE 2026-04 unverdicted novelty 5.0

Vulnsage, a multi-agent framework, generates 34.64% more exploits than prior tools and verified 146 zero-day vulnerabilities in real-world open-source libraries.

xOffense: An Autonomous Multi-Agent Framework for Penetration Testing with Domain-Adapted Large Language Models
cs.CR 2025-09 unverdicted novelty 5.0

xOffense automates penetration testing via a fine-tuned Qwen3-32B LLM in a multi-agent setup with specialized agents for reconnaissance, vulnerability scanning, and exploitation, reporting 79.17% sub-task completion o...

Token Economics for LLM Agents: A Dual-View Study from Computing and Economics
cs.AI 2026-05 unverdicted novelty 4.0

The paper delivers a unified survey of token economics for LLM agents, conceptualizing tokens as production factors, exchange mediums, and units of account across micro, meso, macro, and security dimensions using esta...

Agentic AI and the Industrialization of Cyber Offense: Forecast, Consequences, and Defensive Priorities for Enterprises and the Mittelstand
cs.CR 2026-05 unverdicted novelty 4.0

Agentic AI lowers the cost and speed of cyber attacks, requiring immediate improvements in identity management, phishing-resistant authentication, patching, and agent governance for large enterprises and the Mittelstand.

CyberAId: AI-Driven Cybersecurity for Financial Service Providers
cs.AI 2026-05 unverdicted novelty 4.0

CyberAId is a proposed on-premise multi-agent system that coordinates LLM subagents with classical security tools to improve threat response and regulatory alignment in financial services.

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges
cs.AI 2025-10 unverdicted novelty 4.0

A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.

Large Language Model-Based Agents for Software Engineering: A Survey
cs.SE 2024-09 unverdicted novelty 4.0

A literature survey that collects and categorizes 124 papers on LLM-based agents for software engineering from SE and agent perspectives.
