A Lifecycle and Application-Stack Survey of Large Language Model Vulnerabilities: Attacks, Risks, Defenses, and Open Problems

Bo Tang; Seyed Bagher Hashemi Natanzi

arxiv: 2606.31639 · v1 · pith:LL5EG7L6new · submitted 2026-06-30 · 💻 cs.CR · cs.AI· cs.GT· cs.LO

A Lifecycle and Application-Stack Survey of Large Language Model Vulnerabilities: Attacks, Risks, Defenses, and Open Problems

Seyed Bagher Hashemi Natanzi , Bo Tang This is my paper

Pith reviewed 2026-07-01 04:41 UTC · model grok-4.3

classification 💻 cs.CR cs.AIcs.GTcs.LO

keywords large language modelsvulnerability surveylifecycle securityLLM attackssecurity objectivestrust boundariesagent systemsdefenses

0 comments

The pith

LLM vulnerabilities must be analyzed across an eight-stage lifecycle and application stack because attacks and risks affect eight distinct security objectives and point defenses rarely compose.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper organizes attacks on large language model systems by dividing their operation into eight stages from data collection through deployment and maintenance. At each stage it examines attacker capabilities, the security objectives that are threatened, representative attacks, and available defenses. The survey maps these vulnerabilities onto eight objectives including confidentiality, integrity, availability, safety, privacy, fairness, accountability, and agency-control. A reader would care because modern LLMs are embedded in retrieval pipelines, coding tools, robotic systems, and autonomous agents that handle private data and execute actions, turning isolated model flaws into system-wide failures.

Core claim

The paper claims that LLM system vulnerabilities are best systematized by organizing attacks across eight stages—data collection, pretraining, post-training alignment, model packaging and supply chain, retrieval and memory, prompting and inference, tool/agent execution, and deployment/maintenance—and by mapping LLM-specific vulnerabilities to the objectives of confidentiality, integrity, availability, safety, privacy, fairness, accountability, and agency-control. This lens reveals where trust boundaries fail, how untrusted data becomes executable instruction, how delegated authority amplifies model errors, and why point defenses rarely compose into overall security.

What carries the argument

The eight-stage lifecycle model of LLM systems paired with the mapping of vulnerabilities to eight security objectives.

If this is right

Compositional security analysis is required rather than stage-isolated defenses.
Provenance-aware retrieval mechanisms are needed to stop untrusted data from becoming instructions.
Tool-call containment is necessary to limit the impact of errors in agent execution.
Long-horizon evaluation of agents and realistic red teaming are needed to assess real deployment risks.
Privacy-preserving adaptation and deployment-grade incident response remain open research needs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be used to audit a deployed coding assistant by tracing a single data-poisoning attack through retrieval, prompting, and tool execution stages.
Fairness failures in pretraining may interact with agency-control failures during long-horizon agent tasks in ways the current mapping leaves unexamined.
Stage-specific logging could support accountability by allowing incident responders to identify the exact lifecycle point where control was lost.
New benchmarks that simulate cross-stage attacks would be a direct way to test whether the claimed non-composition of defenses holds in practice.

Load-bearing premise

That the chosen eight-stage lifecycle division and the mapping to security objectives accurately reveal where trust boundaries fail and why point defenses rarely compose.

What would settle it

An empirical demonstration that a small number of point defenses do compose to protect an end-to-end LLM application across all eight stages, or a study showing that observed attacks cluster in patterns that the eight-stage division does not predict.

Figures

Figures reproduced from arXiv: 2606.31639 by Bo Tang, Seyed Bagher Hashemi Natanzi.

**Figure 1.** Figure 1: Lifecycle and application-stack view of LLM vulnerabilities. The same attack technique can appear at multiple stages, [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗

**Figure 2.** Figure 2: Defense-in-depth architecture for LLM applications. Reliable mitigation cannot rely only on model behavior; [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗

read the original abstract

Large language models are no longer only text generators. They are increasingly embedded in retrieval pipelines, enterprise assistants, coding environments, robotic systems, security-operation workflows, and autonomous agents that can read private data, call tools, write files, execute code, and act across organizational boundaries. This shift changes the security problem: risks do not arise from the model weights alone, but from the full lifecycle and application stack through which data, prompts, model outputs, tools, memories, and user authority interact. This paper systematizes the literature on vulnerabilities in large language model systems through a lifecycle and application-stack lens. We organize attacks across eight stages: data collection, pretraining, post-training alignment, model packaging and supply chain, retrieval and memory, prompting and inference, tool/agent execution, and deployment/maintenance. For each stage, we analyze attacker capabilities, affected security objectives, representative attacks, practical risks, evaluation practices, and defenses. We further map LLM-specific vulnerabilities to confidentiality, integrity, availability, safety, privacy, fairness, accountability, and agency-control objectives. Unlike taxonomies that list isolated attack names, the proposed systematization emphasizes where trust boundaries fail, how untrusted data becomes executable instruction, how delegated authority amplifies model errors, and why point defenses rarely compose. We close with a research agenda for secure LLM systems, including compositional security, provenance-aware retrieval, tool-call containment, long-horizon agent evaluation, privacy-preserving adaptation, realistic red teaming, and deployment-grade incident response.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This survey organizes LLM vulnerabilities across an eight-stage lifecycle and maps them to security objectives, giving a practical way to see where point defenses fall short.

read the letter

The main takeaway is that this paper is a survey that organizes existing LLM security work around an eight-stage lifecycle (data collection through deployment) and links vulnerabilities to objectives like confidentiality, integrity, privacy, and agency-control. The framing aims to show trust boundary failures and why defenses rarely compose across the stack.

It does a solid job breaking down each stage with attacker capabilities, representative attacks, practical risks, evaluation methods, and defenses. The sections on retrieval, tool execution, and prompting pull together literature in a way that highlights how untrusted data turns into instructions or how delegated authority spreads errors. The research agenda at the end, covering compositional security, tool-call containment, and realistic red teaming, gives readers clear directions for follow-up work.

Soft spots are limited and typical for surveys. The value depends on how complete and accurate the coverage of cited papers turns out to be; any thin sections on a particular stage or overlooked recent results would reduce its usefulness. The claim that the chosen stages best reveal non-composition is reasonable but will need concrete cross-stage examples in the text to carry weight. No load-bearing inconsistencies show up in the structure.

This is for researchers and engineers working on LLM applications or security who want a reference to locate gaps rather than a source of new theorems or experiments. A reader focused on agent systems or enterprise deployment would find the mapping to objectives helpful.

I would send it to peer review so referees can check balance and suggest additions.

Referee Report

0 major / 2 minor

Summary. The paper claims to systematize the literature on LLM system vulnerabilities via an eight-stage lifecycle (data collection, pretraining, post-training alignment, model packaging and supply chain, retrieval and memory, prompting and inference, tool/agent execution, deployment/maintenance) and a mapping of vulnerabilities to eight security objectives (confidentiality, integrity, availability, safety, privacy, fairness, accountability, agency-control). It argues that this lens reveals trust-boundary failures, how untrusted data becomes executable, how delegated authority amplifies errors, and why point defenses rarely compose, and closes with a research agenda on compositional security, provenance-aware retrieval, tool-call containment, and related topics.

Significance. If the survey delivers balanced coverage and accurate representation of cited works, the lifecycle-plus-objectives framework would be a useful organizing contribution for the field. It shifts focus from isolated model attacks to full-stack interactions and supplies a concrete research agenda with falsifiable directions such as long-horizon agent evaluation and deployment-grade incident response. The absence of new derivations or fitted parameters is appropriate for a survey; the value lies in the systematization itself.

minor comments (2)

Abstract: the eight stages are listed but the mapping to the eight security objectives is only named; adding one concrete cross-stage example (e.g., how a data-collection poisoning attack affects both integrity and agency-control) would improve reader orientation without lengthening the abstract excessively.
Throughout: ensure every stage section explicitly states the attacker capabilities, affected objectives, representative attacks, risks, evaluation practices, and defenses as promised; if any stage omits one of these six elements, the claimed uniformity of treatment is weakened.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the thorough summary and positive assessment of the paper's contribution. The recommendation for minor revision is noted, but the report contains no specific major comments requiring response or changes.

Circularity Check

0 steps flagged

No significant circularity: survey organizes literature without derivations or self-referential reductions

full rationale

The paper is a literature survey that proposes an eight-stage lifecycle framework and a mapping to security objectives as an organizing lens. No equations, fitted parameters, predictions, or derivations appear in the abstract or described structure. The central contribution is systematization of existing attacks and defenses; the framework is presented explicitly as an organizing device rather than a theorem or empirical result derived from its own inputs. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results as new derivations are detectable. The assumption that the chosen stages reveal trust-boundary failures is a modeling choice, not a circular reduction. This matches the default expectation for non-circular survey papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper the contribution is organizational. No free parameters, new axioms, or invented entities are introduced; the structure rests on standard security concepts and the cited body of LLM vulnerability literature.

pith-pipeline@v0.9.1-grok · 5818 in / 1123 out tokens · 30842 ms · 2026-07-01T04:41:00.972156+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

35 extracted references · 30 canonical work pages · 20 internal anchors

[1]

Y . Bai, S. Kadavath, S. Kundu, A. Askell, J. Kernion, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnon, et al. Constitutional AI: Harmlessness from AI feedback.arXiv preprint arXiv:2212.08073,

work page internal anchor Pith review Pith/arXiv arXiv
[2]

On the Opportunities and Risks of Foundation Models

R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brun- skill, et al. On the opportunities and risks of foundation models.arXiv preprint arXiv:2108.07258,

work page internal anchor Pith review Pith/arXiv arXiv
[3]

T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. InAdvances in Neural Information Processing Systems, volume 33, pages 1877–1901,

1901
[4]

X. Chen, C. Liu, B. Li, K. Lu, and D. Song. Targeted backdoor attacks on deep learning systems using data poisoning.arXiv preprint arXiv:1712.05526,

work page internal anchor Pith review Pith/arXiv arXiv
[5]

K. Chu. A systematic survey of security threats and defenses in LLM-based AI agents: A layered attack surface framework. arXiv preprint arXiv:2604.23338,

work page internal anchor Pith review Pith/arXiv arXiv
[6]

SoK: The attack surface of agentic AI — tools, and autonomy,

A. Dehghantanha et al. SoK: The attack surface of agentic AI – tools, and autonomy.arXiv preprint arXiv:2603.22928,

work page arXiv
[7]

G. Deng, Y . Liu, V . Mayoral-Vilches, P. Liu, Y . Li, Y . Xu, T. Zhang, Y . Liu, M. Pinzger, and S. Rass. PentestGPT: An LLM-empowered automatic penetration testing tool.arXiv preprint arXiv:2308.06782,

work page arXiv
[8]

Derczynski, E

L. Derczynski, E. Galinkin, J. Martin, S. Majumdar, N. Inie, et al. garak: A framework for security probing large language models.arXiv preprint arXiv:2406.11036,

work page arXiv
[9]

Derner and K

E. Derner and K. Batistic. A security risk taxonomy for large language models.arXiv preprint arXiv:2309.06899,

work page arXiv
[10]

Checkmarks indicate primary coverage; partial coverage indicates that the topic is discussed but not used as an organizing axis

16 Table 7: Positioning of this survey relative to representative surveys and practitioner frameworks. Checkmarks indicate primary coverage; partial coverage indicates that the topic is discussed but not used as an organizing axis. Work / framework Lifecycle RAG Agents Privacy Supply chain Defenses EvaluationMain distinction LLM security/privacy surveys (...

2025
[11]

partial partial partial partial✓ ✓ emphasize model capabilities, alignment, and evaluation Agent-security surveys (Chu, 2026; Ferrag et al., 2025; Ling et al.,

2026
[12]

✓partial partial✓partial✓partial general adversarial ML terminology and lifecycle stages This survey✓ ✓ ✓ ✓ ✓ ✓ ✓unifies lifecycle stage, application stack, security objective, attacker capability, and defense layer M. A. Ferrag, N. Tihanyi, D. Hamouda, L. Maglaras, A. Lakas, and M. Debbah. From prompt injections to protocol ex- ploits: Threats in LLM-pow...

work page arXiv
[13]

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

D. Ganguli, L. Lovitt, J. Kernion, A. Askell, Y . Bai, S. Ka- davath, B. Mann, E. Perez, N. Schiefer, K. Ndousse, et al. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned.arXiv preprint arXiv:2209.07858,

work page internal anchor Pith review Pith/arXiv arXiv
[14]

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team. Gemini: A family of highly capable multimodal models.arXiv preprint arXiv:2312.11805,

work page internal anchor Pith review Pith/arXiv arXiv
[15]

D. Kang, X. Li, I. Stoica, C. Guestrin, M. Zaharia, and T. Hashimoto. Exploiting programmatic behavior of LLMs: Dual-use through standard security attacks.arXiv preprint arXiv:2302.05733,

work page arXiv
[16]

Y . Ling, S. Yu, Z. Chen, and C. Fang. Toward secure LLM agents: Threat surfaces, attacks, defenses, and evaluation. arXiv preprint arXiv:2606.10749,

work page internal anchor Pith review Pith/arXiv arXiv
[17]

Y . Liu, G. Deng, Z. Xu, Y . Li, Y . Zheng, Y . Zhang, L. Zhao, T. Zhang, K. Wang, and Y . Liu. Prompt injection at- tack against LLM-integrated applications.arXiv preprint arXiv:2306.05499, 2023a. Y . Liu, Y . Yao, J. Ton, X. Zhang, R. Cheng, Y . Klochkov, M. F. Taufiq, and H. Li. Trustworthy LLMs: A survey and guideline for evaluating large language mod...

work page internal anchor Pith review Pith/arXiv arXiv
[18]

Accessed 2026-06-29

URL https:// atlas.mitre.org/. Accessed 2026-06-29. M. Nasr, R. Shokri, and A. Houmansadr. Comprehensive pri- vacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. IEEE Symposium on Security and Privacy,

2026
[19]

A Comprehensive Overview of Large Language Models

H. Naveed, A. U. Khan, S. Qiu, M. Saqib, S. Anwar, M. Us- man, N. Akhtar, N. Barnes, and A. Mian. A comprehen- sive overview of large language models.arXiv preprint arXiv:2307.06435,

work page internal anchor Pith review Pith/arXiv arXiv
[20]

GPT-4 Technical Report

OpenAI. GPT-4 technical report.arXiv preprint arXiv:2303.08774,

work page internal anchor Pith review Pith/arXiv arXiv
[21]

OWASP top 10 for large language model applications 2025,

OWASP GenAI Security Project. OWASP top 10 for large language model applications 2025,

2025
[22]

Ignore Previous Prompt: Attack Techniques For Language Models

URL https:// genai.owasp.org/llm-top-10/. Accessed 2026- 06-29. F. Perez and I. Ribeiro. Ignore previous prompt: At- tack techniques for language models.arXiv preprint arXiv:2211.09527,

work page internal anchor Pith review Pith/arXiv arXiv 2026
[23]

X. Qi, Y . Zeng, T. Xie, P.-Y . Chen, R. Jia, P. Mittal, and P. Hen- derson. Fine-tuning aligned language models compromises safety, even when users do not intend to!arXiv preprint arXiv:2310.03693,

work page internal anchor Pith review Pith/arXiv arXiv
[24]

Y . Qin, S. Liang, Y . Ye, K. Zhu, L. Yan, Y . Lu, Y . Lin, X. Cong, X. Tang, B. Qian, et al. ToolLLM: Facilitating large language models to master 16000+ real-world APIs.arXiv preprint arXiv:2307.16789,

work page internal anchor Pith review Pith/arXiv arXiv
[25]

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, S. Er- mon, and C. Finn. Direct preference optimization: Your language model is secretly a reward model.arXiv preprint arXiv:2305.18290,

work page internal anchor Pith review Pith/arXiv arXiv
[26]

A. Wei, N. Haghtalab, and J. Steinhardt. Jailbroken: How does LLM safety training fail?arXiv preprint arXiv:2307.02483,

work page internal anchor Pith review Pith/arXiv arXiv
[27]

Ethical and social risks of harm from Language Models

L. Weidinger, J. Mellor, M. Rauh, C. Griffin, J. Uesato, P.-S. Huang, M. Cheng, A. Glaese, B. Balle, A. Kasirzadeh, et al. Ethical and social risks of harm from language models.arXiv preprint arXiv:2112.04359,

work page internal anchor Pith review Pith/arXiv arXiv
[28]

Xiong, X

K. Xiong, X. Liu, P. Zhang, et al. Towards understanding the security risks of retrieval-augmented generation.arXiv preprint arXiv:2404.13093,

work page arXiv
[29]

Xu et al

W. Xu et al. A survey of attacks on large language models. arXiv preprint arXiv:2505.12567,

work page arXiv
[30]

Y . Yao, J. Duan, K. Xu, Y . Cai, Z. Sun, and Y . Zhang. A survey on large language model security and privacy: The good, the bad, and the ugly.arXiv preprint arXiv:2312.02003,

work page arXiv
[31]

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

R. Zhang et al. AgentDojo: A dynamic environment to eval- uate attacks and defenses for LLM agents.arXiv preprint arXiv:2406.13352,

work page internal anchor Pith review Pith/arXiv arXiv
[32]

W. Zhao, Z. Li, P. Zhang, and J. Sun. ClawGuard: A runtime security framework for tool-augmented LLM agents against indirect prompt injection.arXiv preprint arXiv:2604.11790,

work page internal anchor Pith review Pith/arXiv arXiv
[33]

Y . Zhao, C. Wu, B. Li, et al. Poisoning retrieval cor- pora for retrieval-augmented generation.arXiv preprint arXiv:2402.07867,

work page arXiv
[34]

18 D. M. Ziegler, N. Stiennon, J. Wu, T. B. Brown, A. Rad- ford, D. Amodei, P. Christiano, and G. Irving. Fine-tuning language models from human preferences.arXiv preprint arXiv:1909.08593,

work page internal anchor Pith review Pith/arXiv arXiv 1909
[35]

A. Zou, Z. Wang, J. Z. Kolter, and M. Fredrikson. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043,

work page internal anchor Pith review Pith/arXiv arXiv

[1] [1]

Y . Bai, S. Kadavath, S. Kundu, A. Askell, J. Kernion, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnon, et al. Constitutional AI: Harmlessness from AI feedback.arXiv preprint arXiv:2212.08073,

work page internal anchor Pith review Pith/arXiv arXiv

[2] [2]

On the Opportunities and Risks of Foundation Models

R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brun- skill, et al. On the opportunities and risks of foundation models.arXiv preprint arXiv:2108.07258,

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. InAdvances in Neural Information Processing Systems, volume 33, pages 1877–1901,

1901

[4] [4]

X. Chen, C. Liu, B. Li, K. Lu, and D. Song. Targeted backdoor attacks on deep learning systems using data poisoning.arXiv preprint arXiv:1712.05526,

work page internal anchor Pith review Pith/arXiv arXiv

[5] [5]

K. Chu. A systematic survey of security threats and defenses in LLM-based AI agents: A layered attack surface framework. arXiv preprint arXiv:2604.23338,

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

SoK: The attack surface of agentic AI — tools, and autonomy,

A. Dehghantanha et al. SoK: The attack surface of agentic AI – tools, and autonomy.arXiv preprint arXiv:2603.22928,

work page arXiv

[7] [7]

G. Deng, Y . Liu, V . Mayoral-Vilches, P. Liu, Y . Li, Y . Xu, T. Zhang, Y . Liu, M. Pinzger, and S. Rass. PentestGPT: An LLM-empowered automatic penetration testing tool.arXiv preprint arXiv:2308.06782,

work page arXiv

[8] [8]

Derczynski, E

L. Derczynski, E. Galinkin, J. Martin, S. Majumdar, N. Inie, et al. garak: A framework for security probing large language models.arXiv preprint arXiv:2406.11036,

work page arXiv

[9] [9]

Derner and K

E. Derner and K. Batistic. A security risk taxonomy for large language models.arXiv preprint arXiv:2309.06899,

work page arXiv

[10] [10]

Checkmarks indicate primary coverage; partial coverage indicates that the topic is discussed but not used as an organizing axis

16 Table 7: Positioning of this survey relative to representative surveys and practitioner frameworks. Checkmarks indicate primary coverage; partial coverage indicates that the topic is discussed but not used as an organizing axis. Work / framework Lifecycle RAG Agents Privacy Supply chain Defenses EvaluationMain distinction LLM security/privacy surveys (...

2025

[11] [11]

partial partial partial partial✓ ✓ emphasize model capabilities, alignment, and evaluation Agent-security surveys (Chu, 2026; Ferrag et al., 2025; Ling et al.,

2026

[12] [12]

✓partial partial✓partial✓partial general adversarial ML terminology and lifecycle stages This survey✓ ✓ ✓ ✓ ✓ ✓ ✓unifies lifecycle stage, application stack, security objective, attacker capability, and defense layer M. A. Ferrag, N. Tihanyi, D. Hamouda, L. Maglaras, A. Lakas, and M. Debbah. From prompt injections to protocol ex- ploits: Threats in LLM-pow...

work page arXiv

[13] [13]

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

D. Ganguli, L. Lovitt, J. Kernion, A. Askell, Y . Bai, S. Ka- davath, B. Mann, E. Perez, N. Schiefer, K. Ndousse, et al. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned.arXiv preprint arXiv:2209.07858,

work page internal anchor Pith review Pith/arXiv arXiv

[14] [14]

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team. Gemini: A family of highly capable multimodal models.arXiv preprint arXiv:2312.11805,

work page internal anchor Pith review Pith/arXiv arXiv

[15] [15]

D. Kang, X. Li, I. Stoica, C. Guestrin, M. Zaharia, and T. Hashimoto. Exploiting programmatic behavior of LLMs: Dual-use through standard security attacks.arXiv preprint arXiv:2302.05733,

work page arXiv

[16] [16]

Y . Ling, S. Yu, Z. Chen, and C. Fang. Toward secure LLM agents: Threat surfaces, attacks, defenses, and evaluation. arXiv preprint arXiv:2606.10749,

work page internal anchor Pith review Pith/arXiv arXiv

[17] [17]

Y . Liu, G. Deng, Z. Xu, Y . Li, Y . Zheng, Y . Zhang, L. Zhao, T. Zhang, K. Wang, and Y . Liu. Prompt injection at- tack against LLM-integrated applications.arXiv preprint arXiv:2306.05499, 2023a. Y . Liu, Y . Yao, J. Ton, X. Zhang, R. Cheng, Y . Klochkov, M. F. Taufiq, and H. Li. Trustworthy LLMs: A survey and guideline for evaluating large language mod...

work page internal anchor Pith review Pith/arXiv arXiv

[18] [18]

Accessed 2026-06-29

URL https:// atlas.mitre.org/. Accessed 2026-06-29. M. Nasr, R. Shokri, and A. Houmansadr. Comprehensive pri- vacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. IEEE Symposium on Security and Privacy,

2026

[19] [19]

A Comprehensive Overview of Large Language Models

H. Naveed, A. U. Khan, S. Qiu, M. Saqib, S. Anwar, M. Us- man, N. Akhtar, N. Barnes, and A. Mian. A comprehen- sive overview of large language models.arXiv preprint arXiv:2307.06435,

work page internal anchor Pith review Pith/arXiv arXiv

[20] [20]

GPT-4 Technical Report

OpenAI. GPT-4 technical report.arXiv preprint arXiv:2303.08774,

work page internal anchor Pith review Pith/arXiv arXiv

[21] [21]

OWASP top 10 for large language model applications 2025,

OWASP GenAI Security Project. OWASP top 10 for large language model applications 2025,

2025

[22] [22]

Ignore Previous Prompt: Attack Techniques For Language Models

URL https:// genai.owasp.org/llm-top-10/. Accessed 2026- 06-29. F. Perez and I. Ribeiro. Ignore previous prompt: At- tack techniques for language models.arXiv preprint arXiv:2211.09527,

work page internal anchor Pith review Pith/arXiv arXiv 2026

[23] [23]

X. Qi, Y . Zeng, T. Xie, P.-Y . Chen, R. Jia, P. Mittal, and P. Hen- derson. Fine-tuning aligned language models compromises safety, even when users do not intend to!arXiv preprint arXiv:2310.03693,

work page internal anchor Pith review Pith/arXiv arXiv

[24] [24]

Y . Qin, S. Liang, Y . Ye, K. Zhu, L. Yan, Y . Lu, Y . Lin, X. Cong, X. Tang, B. Qian, et al. ToolLLM: Facilitating large language models to master 16000+ real-world APIs.arXiv preprint arXiv:2307.16789,

work page internal anchor Pith review Pith/arXiv arXiv

[25] [25]

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, S. Er- mon, and C. Finn. Direct preference optimization: Your language model is secretly a reward model.arXiv preprint arXiv:2305.18290,

work page internal anchor Pith review Pith/arXiv arXiv

[26] [26]

A. Wei, N. Haghtalab, and J. Steinhardt. Jailbroken: How does LLM safety training fail?arXiv preprint arXiv:2307.02483,

work page internal anchor Pith review Pith/arXiv arXiv

[27] [27]

Ethical and social risks of harm from Language Models

L. Weidinger, J. Mellor, M. Rauh, C. Griffin, J. Uesato, P.-S. Huang, M. Cheng, A. Glaese, B. Balle, A. Kasirzadeh, et al. Ethical and social risks of harm from language models.arXiv preprint arXiv:2112.04359,

work page internal anchor Pith review Pith/arXiv arXiv

[28] [28]

Xiong, X

K. Xiong, X. Liu, P. Zhang, et al. Towards understanding the security risks of retrieval-augmented generation.arXiv preprint arXiv:2404.13093,

work page arXiv

[29] [29]

Xu et al

W. Xu et al. A survey of attacks on large language models. arXiv preprint arXiv:2505.12567,

work page arXiv

[30] [30]

Y . Yao, J. Duan, K. Xu, Y . Cai, Z. Sun, and Y . Zhang. A survey on large language model security and privacy: The good, the bad, and the ugly.arXiv preprint arXiv:2312.02003,

work page arXiv

[31] [31]

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

R. Zhang et al. AgentDojo: A dynamic environment to eval- uate attacks and defenses for LLM agents.arXiv preprint arXiv:2406.13352,

work page internal anchor Pith review Pith/arXiv arXiv

[32] [32]

W. Zhao, Z. Li, P. Zhang, and J. Sun. ClawGuard: A runtime security framework for tool-augmented LLM agents against indirect prompt injection.arXiv preprint arXiv:2604.11790,

work page internal anchor Pith review Pith/arXiv arXiv

[33] [33]

Y . Zhao, C. Wu, B. Li, et al. Poisoning retrieval cor- pora for retrieval-augmented generation.arXiv preprint arXiv:2402.07867,

work page arXiv

[34] [34]

18 D. M. Ziegler, N. Stiennon, J. Wu, T. B. Brown, A. Rad- ford, D. Amodei, P. Christiano, and G. Irving. Fine-tuning language models from human preferences.arXiv preprint arXiv:1909.08593,

work page internal anchor Pith review Pith/arXiv arXiv 1909

[35] [35]

A. Zou, Z. Wang, J. Z. Kolter, and M. Fredrikson. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043,

work page internal anchor Pith review Pith/arXiv arXiv