arxiv: 2512.22753 · v2 · submitted 2025-12-28 · 💻 cs.SE

Recognition: no theorem link

From Rookie to Expert: Manipulating LLMs for Automated Vulnerability Exploitation in Enterprise Software

Moustapha Awwalou Diouf , Maimouna Tamah Diao , Iyiola Emmanuel Olatunji , Abdoul Kader Kabor\'e , Jordan Samhi , Gervais Mendy , Samuel Ouya , Jacques Klein

show 1 more author

Tegawend\'e F. Bissyand\'e

Authors on Pith no claims yet

Pith reviewed 2026-05-16 20:00 UTC · model grok-4.3

classification 💻 cs.SE

keywords LLM manipulationvulnerability exploitationautomated attacksenterprise softwareOdooprompt engineeringsecurity threatsCVE exploitation

0 comments

The pith

Publicly available LLMs can be manipulated to generate functional exploits for every tested CVE in Odoo enterprise software.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that a prompting strategy called RSA enables LLMs to produce working exploit code for known vulnerabilities in Odoo, a widely used ERP platform. Five mainstream models were tested, and at least one succeeded for each CVE after only three to five rounds of interaction. This outcome directly challenges long-standing security assumptions that exploitation requires technical expertise or deep code understanding. The authors argue that the same accessibility that lets non-programmers build software now equally enables attacks, forcing a redesign of threat models and defenses.

Core claim

We propose RSA, a pretexting strategy of role-assignment, scenario-pretexting, and action-solicitation that bypasses LLM safety mechanisms to elicit functional exploits. When applied to Odoo CVEs, at least one of GPT-4o, Gemini, Claude, Microsoft Copilot, or DeepSeek produced a working exploit for every tested case within 3-5 prompting rounds, removing the manual effort previously required for such attacks.

What carries the argument

RSA pretexting strategy: a three-component prompting method that assigns the model a role, establishes a justifying scenario, and directly solicits exploit-generating actions.

If this is right

Security models relying on a technical expertise barrier between developers and attackers are invalidated.
The technical complexity of vulnerability descriptions no longer provides meaningful protection.
Traditional boundaries between software-building tools and attack tools dissolve.
Security practices must be redesigned around the reality that prompt crafting alone enables exploitation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

LLM providers may need targeted safeguards that detect and block requests for exploit code even when framed innocuously.
Enterprise deployments could benefit from monitoring or restricting LLM interactions that involve vulnerability details.
The approach might extend to other platforms, raising questions about whether similar prompting succeeds against unknown vulnerabilities.

Load-bearing premise

The generated code consists of original, functional exploits created from the prompts rather than recalled public CVE information.

What would settle it

Execute each generated exploit on a clean Odoo installation with no internet access to CVE databases and confirm whether the exploit succeeds without external references.

Figures

Figures reproduced from arXiv: 2512.22753 by Abdoul Kader Kabor\'e, Gervais Mendy, Iyiola Emmanuel Olatunji, Jacques Klein, Jordan Samhi, Maimouna Tamah Diao, Moustapha Awwalou Diouf, Samuel Ouya, Tegawend\'e F. Bissyand\'e.

**Figure 1.** Figure 1: Geographic Distribution of Odoo Instances in Africa. Data collected via Shodan (August 2025) shows 700+ publicly accessible ERP deployments across 32 countries, illustrating the widespread adoption of this open-source framework in the region. , Vol. 1, No. 1, Article . Publication date: December 2025 [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗

**Figure 2.** Figure 2: RSA Prompt Template for Generating Attack Script. 𝑃𝑎 indicates a prompt that explicitly uses the word attack, while 𝑃𝑖 uses the word idea to frame the request more subtly. Our method, Role assignment-Scenario pretexting-Action solicitation (RSA), consists of three phases, as illustrated in [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: The Process of Reproducing a Vulnerable Odoo Instance for ach CVE. From the odoo GitHub project 1 , we identify the latest version affected by the vulnerability and the corrective commit 2 . Going back to the parent commit, we obtain the vulnerable version, which we then checkout 3 . This unpatched version is then deployed 4 in order to prepare an environment conducive to exploiting the CVE 5 . 3.2.2 Opera… view at source ↗

**Figure 4.** Figure 4: Rookie Workflow: Language models (LLMs) generate attack scripts 2 from specific prompts 1 . A rookie agent executes these scripts on an Odoo instance 3 , then transmits the output to the LLM so that it can refine the script in case the exploit fails 4 . This process is repeated iteratively until the LLM succeeds in exploiting the vulnerability, provides overly general responses, or deviates from the attack… view at source ↗

**Figure 5.** Figure 5: Attack Success Rate of CVE Exploitation by LLM Models. The Venn diagram shows the number of CVEs successfully exploited by each model and their intersections. The percentage of successful exploits for each LLM is shown in parentheses. To assess the practical impact of LLMgenerated exploits, we executed attack scripts produced by different models against Odoo instances configured to replicate real-world d… view at source ↗

**Figure 6.** Figure 6: Format of Attack Script Generated by Claude Opus 4.1 when Prompted with RSA on CVE2018-14885 [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗

**Figure 7.** Figure 7: Results of Exploiting CVE-2018-14885 on the Odoo ERP System 11.0. The attack enabled the cloning of the main database, creating new databases with known administrator credentials, and restoring databases from downloaded backup files. Authenticated vs. Unauthenticated Exploits. Contrary to intuition, authenticated CVEs achieve higher attack success rates than unauthenticated ones. Across the five authentica… view at source ↗

**Figure 8.** Figure 8: Number of Queries Required by Each LLM to Generate a Functional Exploit. For a given CVE, the absence of a bar (indicated by ✗) for an LLM model indicates that the model failed to produce a functional attack script. , Vol. 1, No. 1, Article . Publication date: December 2025 [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗

read the original abstract

LLMs democratize software engineering by enabling non-programmers to create applications, but this same accessibility fundamentally undermines security assumptions that have guided software engineering for decades. We show in this work how publicly available LLMs can be socially engineered to transform novices into capable attackers, challenging the foundational principle that exploitation requires technical expertise. To that end, we propose RSA (Role-assignment, Scenario-pretexting, and Action-solicitation), a pretexting strategy that manipulates LLMs into generating functional exploits despite their safety mechanisms. Testing against Odoo -- a widely used ERP platform, we evaluated five mainstream LLMs (GPT-4o, Gemini, Claude, Microsoft Copilot, and DeepSeek) and successfully exploited every tested CVE: at least one LLM produced a functional exploit for each within 3-5 prompting rounds. While prior work~\cite{jin2025good} found LLM-assisted attacks difficult and requiring manual effort, we demonstrate that this overhead can be eliminated entirely. Our findings invalidate core software engineering security principles: the distinction between technical and non-technical actors no longer provides valid threat models; technical complexity of vulnerability descriptions offers no protection when LLMs can abstract it away; and traditional security boundaries dissolve when the same tools that build software can be manipulated to break it. This represents a paradigm shift in software engineering -- we must redesign security practices for an era where exploitation requires only the ability to craft prompts, not understand code. Artifacts available at: https://anonymous.4open.science/r/From-Rookie-to-Attacker-D8B3.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

RSA prompting lets LLMs generate Odoo CVE exploits with no manual effort claimed, but the paper gives no evidence the code actually runs or exploits anything.

read the letter

The main point is that the authors present RSA as a simple three-part prompt pattern that gets mainstream LLMs to output working exploits for real CVEs in Odoo, and they say this removes the manual work reported in earlier studies. They tested five models and report success on every CVE within a few rounds. That is the concrete new piece: a named, repeatable prompting recipe plus the multi-model comparison on an enterprise platform. The artifacts link is also useful for anyone who wants to inspect the prompts or outputs directly. The paper does engage with the prior result it cites and tries to show a practical difference. The central weakness is the missing verification. The abstract states that the generated exploits are functional, yet there is no description of a controlled Odoo instance, no reproduction steps, no success criteria such as observed privilege escalation or data access, and no comparison to public exploit code for the same CVEs. Without those steps it is impossible to rule out simple regurgitation of training data or syntactically plausible but non-working code. That gap directly undercuts the stronger claims about threat models and paradigm shifts. The work is aimed at security researchers and practitioners who care about LLM misuse in real systems. A reader who wants to see prompt patterns and model comparisons will find something here, but anyone needing reproducible attack evidence will not. It deserves peer review so that referees can require the missing validation details and reproduction package; the topic is timely enough that the effort is worth it even if major revisions follow.

Referee Report

2 major / 1 minor

Summary. The paper proposes the RSA (Role-assignment, Scenario-pretexting, and Action-solicitation) prompting strategy to socially engineer publicly available LLMs into generating functional exploits for CVEs in the Odoo ERP platform. It evaluates five mainstream LLMs and claims that at least one LLM produced a working exploit for every tested CVE within 3-5 rounds, arguing that this eliminates prior manual overhead and invalidates core software-engineering security assumptions about the need for technical expertise.

Significance. If the results hold with proper verification, the work would be significant for demonstrating that prompt-based manipulation can reliably bypass LLM safety mechanisms to produce exploits, lowering the barrier for non-experts and challenging threat models that assume exploitation requires code understanding. It extends prior findings on LLM-assisted attacks by claiming zero manual effort and provides an artifact link for reproducibility.

major comments (2)

[Evaluation] Evaluation section: the central claim that 'at least one LLM produced a functional exploit for each' CVE is not supported by any description of the verification procedure. The manuscript supplies no details on the controlled vulnerable Odoo instance, success criteria (e.g., observed shell access, privilege escalation, or data exfiltration), reproduction steps, or controls distinguishing new exploit generation from regurgitation of public CVE information. This is load-bearing for the 'functional' and 'paradigm shift' assertions.
[Results] §3 (RSA strategy) and Results: the paper asserts that RSA eliminates manual effort compared to Jin et al. (2025), yet provides no quantitative breakdown of success rates per LLM per CVE, failure modes, or number of CVEs tested. Without these data, the 'every tested CVE' claim cannot be assessed for robustness or generalizability.

minor comments (1)

[Abstract] The anonymous artifact link should be updated in the camera-ready version to a permanent repository containing the exact prompts, CVE list, and verification scripts.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for strengthening the manuscript. We address each major comment below and will revise the paper to provide the requested details on evaluation procedures and quantitative results.

read point-by-point responses

Referee: [Evaluation] Evaluation section: the central claim that 'at least one LLM produced a functional exploit for each' CVE is not supported by any description of the verification procedure. The manuscript supplies no details on the controlled vulnerable Odoo instance, success criteria (e.g., observed shell access, privilege escalation, or data exfiltration), reproduction steps, or controls distinguishing new exploit generation from regurgitation of public CVE information. This is load-bearing for the 'functional' and 'paradigm shift' assertions.

Authors: We agree that the current manuscript does not provide sufficient procedural details in the Evaluation section. In the revision, we will add a dedicated subsection describing the controlled vulnerable Odoo instance (including exact version, configuration, and isolation setup), explicit success criteria (e.g., verified shell access via command execution, privilege escalation confirmed by specific outputs, or data exfiltration via file retrieval), full reproduction steps, and controls such as cross-checking generated exploits against public CVE repositories and LLM knowledge cutoffs to distinguish novel generation from regurgitation. The artifacts at the provided anonymous link already include interaction logs, verification scripts, and environment snapshots; we will reference these more explicitly and include excerpts in the text. revision: yes
Referee: [Results] §3 (RSA strategy) and Results: the paper asserts that RSA eliminates manual effort compared to Jin et al. (2025), yet provides no quantitative breakdown of success rates per LLM per CVE, failure modes, or number of CVEs tested. Without these data, the 'every tested CVE' claim cannot be assessed for robustness or generalizability.

Authors: We acknowledge the need for quantitative detail to support the claims. In the revised Results section, we will include a table reporting the total number of CVEs tested, per-LLM success rates (e.g., which LLMs succeeded on which CVEs within 3-5 rounds), and a breakdown of failure modes (such as refusals, incomplete code, or non-functional outputs). This will directly substantiate the 'at least one LLM per CVE' result and allow evaluation of robustness. The comparison to Jin et al. will be expanded with explicit notes on the absence of manual post-processing in our approach. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical results independent of inputs

full rationale

The paper reports an empirical evaluation: the RSA prompting strategy is applied to five LLMs, and success is claimed as an observed outcome ('at least one LLM produced a functional exploit for each within 3-5 prompting rounds') on specific Odoo CVEs. No equations, parameters, derivations, or self-referential definitions appear. The single citation to prior work (jin2025good) is external, not self-citation by these authors, and is not used to justify uniqueness or load-bearing premises. No fitted inputs are renamed as predictions, no ansatzes are smuggled, and no known results are merely relabeled. The central claim is an experimental report rather than a derivation that reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim depends on the effectiveness of the RSA strategy and the assumption that LLMs can produce working code for real vulnerabilities based on the prompting alone.

axioms (1)

domain assumption Publicly available LLMs possess the capability to generate functional exploit code when appropriately prompted despite safety alignments.
This is invoked to explain why RSA works on the tested models.

invented entities (1)

RSA (Role-assignment, Scenario-pretexting, and Action-solicitation) no independent evidence
purpose: A pretexting strategy to manipulate LLMs into generating exploits
Introduced as a new combination of techniques for this purpose.

pith-pipeline@v0.9.0 · 5642 in / 1199 out tokens · 57422 ms · 2026-05-16T20:00:16.747705+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 2 internal anchors

[1]

Cem Anil, Esin Durmus, Nina Panickssery, Mrinank Sharma, Joe Benton, Sandipan Kundu, Joshua Batson, Meg Tong, Jesse Mu, Daniel Ford, et al. 2024. Many-shot jailbreaking.Advances in Neural Information Processing Systems37 (2024), 129696–129742

work page 2024
[2]

Lewis Birch, William Hackett, Stefan Trawicki, Neeraj Suri, and Peter Garraghan. 2023. Model leeching: An extraction attack targeting llms.arXiv preprint arXiv:2309.10544(2023)

work page arXiv 2023
[3]

Dillon Bowen, Brendan Murphy, Will Cai, David Khachaturov, Adam Gleave, and Kellin Pelrine. 2025. Scaling trends for data poisoning in LLMs. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 27206–27214

work page 2025
[4]

Gelei Deng, Yi Liu, Víctor Mayoral-Vilches, Peng Liu, Yuekang Li, Yuan Xu, Tianwei Zhang, Yang Liu, Martin Pinzger, and Stefan Rass. 2024. {PentestGPT}: Evaluating and harnessing large language models for automated penetration testing. In33rd USENIX Security Symposium (USENIX Security 24). 847–864

work page 2024
[5]

Nesara Dissanayake, Mansooreh Zahedi, Asangi Jayatilaka, and Muhammad Ali Babar. 2022. Why, how and where of delays in software security patch management: An empirical investigation in the healthcare sector.Proceedings of the ACM on Human-computer Interaction6, CSCW2 (2022), 1–29

work page 2022
[6]

Michael Duan, Anshuman Suri, Niloofar Mireshghallah, Sewon Min, Weijia Shi, Luke Zettlemoyer, Yulia Tsvetkov, Yejin Choi, David Evans, and Hannaneh Hajishirzi. 2024. Do membership inference attacks work on large language models?arXiv preprint arXiv:2402.07841(2024)

work page arXiv 2024
[7]

Richard Fang, Rohan Bindu, Akul Gupta, and Daniel Kang. 2024. Llm agents can autonomously exploit one-day vulnerabilities.arXiv preprint arXiv:2404.08144(2024)

work page arXiv 2024
[8]

Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, and Daniel Kang. 2024. Llm agents can autonomously hack websites.arXiv preprint arXiv:2402.06664(2024)

work page arXiv 2024
[9]

Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, and Tao Jiang. 2024. Membership inference attacks against fine-tuned large language models via self-prompt calibration.Advances in Neural Information Processing Systems37 (2024), 134981–135010

work page 2024
[10]

Luca Gioacchini, Marco Mellia, Idilio Drago, Alexander Delsanto, Giuseppe Siracusano, and Roberto Bifulco. 2024. Autopenbench: Benchmarking generative agents for penetration testing.arXiv preprint arXiv:2410.03225(2024)

work page arXiv 2024
[11]

Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. 2024. Large language models for software engineering: A systematic literature review.ACM Transactions on Software Engineering and Methodology33, 8 (2024), 1–79

work page 2024
[12]

Yue Huang, Lichao Sun, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, et al. 2024. Position: Trustllm: Trustworthiness in large language models. InInternational Conference on Machine Learning. PMLR, 20166–20270

work page 2024
[13]

David Jin, Qian Fu, and Yuekang Li. 2025. Good News for Script Kiddies? Evaluating Large Language Models for Automated Exploit Generation. In2025 IEEE Security and Privacy Workshops (SPW). IEEE, 278–282

work page 2025
[14]

Raz Lapid, Ron Langberg, and Moshe Sipper. 2024. Open sesame! universal black-box jailbreaking of large language models.Applied Sciences14, 16 (2024), 7150

work page 2024
[15]

Xuan Li, Zhanke Zhou, Jianing Zhu, Jiangchao Yao, Tongliang Liu, and Bo Han. 2023. Deepinception: Hypnotize large language model to be jailbreaker.arXiv preprint arXiv:2311.03191(2023)

work page arXiv 2023
[16]

Tong Liu, Yingjie Zhang, Zhe Zhao, Yinpeng Dong, Guozhu Meng, and Kai Chen. 2024. Making them ask and answer: Jailbreaking large language models in few queries via disguise and reconstruction. In33rd USENIX Security Symposium (USENIX Security 24). 4711–4728

work page 2024
[17]

Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, and Chaowei Xiao. 2024. Automatic and universal prompt injection attacks against large language models.arXiv preprint arXiv:2403.04957(2024)

work page arXiv 2024
[18]

Yi Liu, Gelei Deng, Zhengzi Xu, Yuekang Li, Yaowen Zheng, Ying Zhang, Lida Zhao, Tianwei Zhang, Kailong Wang, and Yang Liu. 2023. Jailbreaking chatgpt via prompt engineering: An empirical study.arXiv preprint arXiv:2305.13860 (2023)

work page internal anchor Pith review arXiv 2023
[19]

Shivam Lohani. 2019. Social engineering: Hacking into humans.International Journal of Advanced Studies of Scientific Research4, 1 (2019)

work page 2019
[20]

Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, and Amin Karbasi

work page
[21]

Tree of attacks: Jailbreaking black-box llms automatically.Advances in Neural Information Processing Systems37 (2024), 61065–61105

work page 2024
[22]

Serge Lionel Nikiema, Jordan Samhi, Abdoul Kader Kaboré, Jacques Klein, and Tegawendé F Bissyandé. 2025. The Code Barrier: What LLMs Actually Understand?arXiv preprint arXiv:2504.10557(2025)

work page arXiv 2025
[23]

Jason RC Nurse. 2025. To Patch or Not to Patch: Motivations, Challenges, and Implications for Cybersecurity. In International Conference on Human-Computer Interaction. Springer, 265–281

work page 2025
[24]

2025.official Odoo website

Odoo. 2025.official Odoo website. Retrieved August 20, 2025 from https://www.odoo.com/partners/ , Vol. 1, No. 1, Article . Publication date: December 2025. From Rookie to Expert: Manipulating LLMs for Automated Vulnerability Exploitation in Enterprise Software 19

work page 2025
[25]

Hammond Pearce, Benjamin Tan, Baleegh Ahmad, Ramesh Karri, and Brendan Dolan-Gavitt. 2023. Examining zero-shot vulnerability repair with large language models. In2023 IEEE Symposium on Security and Privacy (SP). IEEE, 2339–2356

work page 2023
[26]

Mark Russinovich, Ahmed Salem, and Ronen Eldan. 2024. Great, now write an article about that: The crescendo multi-turn llm jailbreak attack.arXiv preprint arXiv:2404.018332, 6 (2024), 17

work page arXiv 2024
[27]

Salimata Sawadogo, Aminata Sabane, Rodrique Kafando, Abdoul Kader Kabore, and Tegawendé F Bissyande. 2025. Revisiting the Non-Determinism of Code Generation by the GPT-3.5 Large Language Model. In2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 36–44

work page 2025
[28]

Zeyang Sha and Yang Zhang. 2024. Prompt stealing attacks against large language models.arXiv preprint arXiv:2402.12959(2024)

work page arXiv 2024
[29]

Haoye Tian, Weiqi Lu, Tsz On Li, Xunzhu Tang, Shing-Chi Cheung, Jacques Klein, and Tegawendé F Bissyandé. 2023. Is ChatGPT the ultimate programming assistant–how far is it?arXiv preprint arXiv:2304.11938(2023)

work page arXiv 2023
[30]

Norbert Tihanyi, Yiannis Charalambous, Ridhi Jain, Mohamed Amine Ferrag, and Lucas C Cordeiro. 2025. A new era in software security: Towards self-healing software via large language models and formal verification. In2025 IEEE/ACM International Conference on Automation of Software Test (AST). IEEE, 136–147

work page 2025
[31]

Alexander Wan, Eric Wallace, Sheng Shen, and Dan Klein. 2023. Poisoning language models during instruction tuning. InInternational Conference on Machine Learning. PMLR, 35413–35425

work page 2023
[32]

Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How does llm safety training fail?Advances in Neural Information Processing Systems36 (2023), 80079–80110

work page 2023
[33]

Zeguan Xiao, Yan Yang, Guanhua Chen, and Yun Chen. 2024. Distract Large Language Models for Automatic Jailbreak Attack. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 16230–16244

work page 2024
[34]

Jiacen Xu, Jack W Stokes, Geoff McDonald, Xuesong Bai, David Marshall, Siyue Wang, Adith Swaminathan, and Zhou Li. 2024. Autoattacker: A large language model guided system to implement automatic cyber-attacks.arXiv preprint arXiv:2403.01038(2024)

work page arXiv 2024
[35]

Jiahao Yu, Xingwei Lin, Zheng Yu, and Xinyu Xing. 2023. Gptfuzzer: Red teaming large language models with auto-generated jailbreak prompts.arXiv preprint arXiv:2309.10253(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[36]

Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, and Weiyan Shi. 2024. How johnny can persuade llms to jailbreak them: Rethinking persuasion to challenge ai safety by humanizing llms. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 14322–14350

work page 2024
[37]

Xiaoyu Zhang, Cen Zhang, Tianlin Li, Yihao Huang, Xiaojun Jia, Ming Hu, Jie Zhang, Yang Liu, Shiqing Ma, and Chao Shen. 2025. Jailguard: A universal detection framework for prompt-based attacks on llm systems.ACM Transactions on Software Engineering and Methodology(2025)

work page 2025
[38]

Zheng Zhang, Peilin Zhao, Deheng Ye, and Hao Wang. 2025. Enhancing Jailbreak Attacks on LLMs via Persona Prompts.arXiv preprint arXiv:2507.22171(2025)

work page arXiv 2025
[39]

Kaixiang Zhao, Lincan Li, Kaize Ding, Neil Zhenqiang Gong, Yue Zhao, and Yushun Dong. 2025. A survey on model extraction attacks and defenses for large language models. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2. 6227–6236

work page 2025
[40]

Yuxuan Zhu, Antony Kellermann, Akul Gupta, Philip Li, Richard Fang, Rohan Bindu, and Daniel Kang. 2024. Teams of llm agents can exploit zero-day vulnerabilities.arXiv preprint arXiv:2406.01637(2024). , Vol. 1, No. 1, Article . Publication date: December 2025

work page arXiv 2024