Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions

Haoyang Li; Jason Chen Zhang; Lei Chen; Mingtao Zhang; Nicole Hu; Qing Li; Yongqi Zhang; Yuming Xu; Zhiyuan Wen; Zhuohan Ge

arXiv: 2604.08304 · v2 · pith:VD5RVSSKnew · submitted 2026-04-09 · cs.CR · cs.AI

Pith reviewed 2026-05-10 17:48 UTC · model grok-4.3

keywords: Retrieval-Augmented Generation · RAG Security · Knowledge Pipeline · Taxonomy of Attacks · Trust Boundaries · Context Exploitation · LLM Vulnerabilities · Defense Mechanisms

The pith

Secure RAG is fundamentally about protecting the external knowledge-access pipeline rather than the language model alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that retrieval-augmented generation introduces distinct security risks through its access to external knowledge sources, which must be separated from flaws already present in the base language model. It draws an operational boundary to isolate RAG-specific threats and then maps the entire workflow into six stages grouped by three trust boundaries and four security surfaces. This structure organizes known attacks such as knowledge corruption before retrieval, manipulation during retrieval, exploitation of retrieved context, and exfiltration of knowledge. The review shows that existing defenses remain mostly reactive and fragmented across these surfaces, and it outlines the need for protection that spans the full knowledge-access lifecycle.

Core claim

Secure RAG requires treating the external knowledge-access pipeline as the primary security concern. The workflow is abstracted into six stages and organized around three trust boundaries plus four primary security surfaces—pre-retrieval knowledge corruption, retrieval-time access manipulation, downstream context exploitation, and knowledge exfiltration—to separate RAG-introduced or RAG-amplified threats from inherent LLM weaknesses. The survey of attacks, defenses, and benchmarks under this view reveals that current protections are largely reactive and incomplete, motivating future work on layered, boundary-aware safeguards across the entire pipeline.

What carries the argument

The six-stage RAG workflow abstraction organized by three trust boundaries and four security surfaces that classifies attacks and defenses specific to the external knowledge pipeline.

If this is right

Attacks on RAG can be systematically placed into pre-retrieval corruption, retrieval manipulation, context exploitation, or exfiltration categories.
Defenses must address each security surface rather than relying on isolated fixes inside the language model.
Evaluation benchmarks should test resilience at the identified trust boundaries across the full pipeline.
Remediation should shift from reactive patches to coordinated protection spanning the entire knowledge-access lifecycle.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pipeline-boundary approach could help secure other external-tool or memory-augmented AI systems beyond RAG.
Production deployments could reduce risk by enforcing controls at each trust boundary instead of adding post-hoc filters.
A next step would be to build automated tools that verify whether a given threat truly respects the four-surface taxonomy.
Design-time choices in how retrieval indexes are built and updated may prove more effective than runtime detection alone.

Load-bearing premise

The proposed division of the RAG workflow into six stages and its mapping onto three trust boundaries and four security surfaces fully captures every RAG-specific threat without important omissions or overlaps.

What would settle it

A documented attack that targets the knowledge pipeline yet fits none of the four security surfaces or crosses the stated operational boundary between LLM flaws and RAG-introduced risks.
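
The premise and its falsification test can be read as an audit over a catalogue of attacks: every entry should land on exactly one of the four surfaces. The sketch below is a minimal illustration of that check, not the paper's procedure. The surface names come from the core claim above; the `Attack` record, the `audit` helper, and the three placeholder entries are hypothetical and stand in for whatever catalogue a reviewer actually assembles.

```python
from dataclasses import dataclass
from enum import Enum


class Surface(Enum):
    """The four security surfaces named in the core claim."""
    PRE_RETRIEVAL_CORRUPTION = "pre-retrieval knowledge corruption"
    RETRIEVAL_MANIPULATION = "retrieval-time access manipulation"
    CONTEXT_EXPLOITATION = "downstream context exploitation"
    KNOWLEDGE_EXFILTRATION = "knowledge exfiltration"


@dataclass(frozen=True)
class Attack:
    name: str            # hypothetical label, not a cited attack
    surfaces: frozenset  # Surface values a reviewer assigned to this attack


def audit(catalogue):
    """Return (omissions, overlaps): attacks on no surface, attacks on several."""
    omissions = [a.name for a in catalogue if len(a.surfaces) == 0]
    overlaps = [a.name for a in catalogue if len(a.surfaces) > 1]
    return omissions, overlaps


# Placeholder entries for illustration only; they describe no real attack.
catalogue = [
    Attack("attack-A", frozenset({Surface.PRE_RETRIEVAL_CORRUPTION})),
    Attack("attack-B", frozenset({Surface.RETRIEVAL_MANIPULATION,
                                  Surface.CONTEXT_EXPLOITATION})),
    Attack("attack-C", frozenset()),
]

omissions, overlaps = audit(catalogue)
print("fits no surface:", omissions)
print("spans several surfaces:", overlaps)
```

Under this reading, a non-empty `omissions` list is the kind of counterexample described above, while a non-empty `overlaps` list would indicate that the surfaces are not mutually exclusive.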

Figures

Figures reproduced from arXiv: 2604.08304 by Haoyang Li, Jason Chen Zhang, Lei Chen, Mingtao Zhang, Nicole Hu, Qing Li, Yongqi Zhang, Yuming Xu, Zhiyuan Wen, Zhuohan Ge.

**Figure 2.** RAG knowledge-access pipeline, security surfaces, and trust boundaries.

**Figure 3.** Taxonomy of RAG Attack Methods.

**Figure 4.** Taxonomy of RAG Defense and Remediation Mechanisms.

Original abstract

Retrieval-augmented generation (RAG) extends large language models (LLMs) with external knowledge, but this access path also introduces security risks that existing work often conflates with inherent LLM flaws. We frame secure RAG as securing external knowledge access and organize the literature with SLOT, a taxonomy along four axes: the attack Surface (S) where an adversary acts, the defense Layer (L) that controls the same point, the Objective (O) it breaks following the CIA properties, and the Target (T) it pursues, from a single known query (T1) to target-claim manipulation across a query distribution (T2). Mapping attacks, defenses, remediation, and evaluation onto a six-stage knowledge-access pipeline, we expose two structural mismatches. Finally, we discuss directions for more realistic targets, no-blind-spot and adaptively evaluated defenses, stronger confidentiality, and evaluation for multimodal and agentic RAG.
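
One way to picture the SLOT axes is as a four-field record attached to each surveyed item. The sketch below is an assumption-laden illustration rather than the authors' schema: the abstract names the CIA objectives and the two target levels T1 and T2, so those are encoded as enums, while the abstract does not enumerate the surface or layer values, so they are left as free-form strings. The `SlotEntry` class and the example values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Objective(Enum):
    """O axis: the CIA property an attack breaks."""
    CONFIDENTIALITY = "C"
    INTEGRITY = "I"
    AVAILABILITY = "A"


class Target(Enum):
    """T axis: only the two levels named in the abstract are encoded."""
    T1 = "single known query"
    T2 = "target-claim manipulation across a query distribution"


@dataclass(frozen=True)
class SlotEntry:
    surface: str          # S: where the adversary acts (free-form here)
    layer: str            # L: defense layer controlling the same point
    objective: Objective  # O
    target: Target        # T

    def label(self):
        """Compact tag such as 'I/T1' for tables or plots."""
        return f"{self.objective.value}/{self.target.name}"


# Hypothetical example values; they do not describe a specific cited work.
entry = SlotEntry(
    surface="example-surface",
    layer="example-layer",
    objective=Objective.INTEGRITY,
    target=Target.T1,
)
print(entry.label())
```

Keeping the surface and the layer in the same record reflects the abstract's statement that the defense layer controls the same point at which the adversary acts.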

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This paper offers a clear taxonomy for RAG security risks by drawing a boundary around the knowledge pipeline, but it is mainly a survey of existing work.

The paper gives a practical taxonomy for RAG security by separating RAG-specific threats from general LLM weaknesses. It does this with an operational boundary and breaks the process into six stages under three trust boundaries and four security surfaces. What is new is the way they frame secure RAG as pipeline security. This leads to a clean grouping of attacks: knowledge corruption before retrieval, manipulation at retrieval time, exploiting the context downstream, and exfiltrating knowledge. They review the corresponding attacks, defenses, and benchmarks, and note that defenses tend to be reactive.

The paper does this review well. The categories line up with how RAG systems are built, so the taxonomy feels usable. It pulls together work that was scattered and points to future work on more proactive, layered protections.

The main limitation is that it is a literature organization exercise. The strength of the taxonomy depends on how complete their coverage of the papers is, and the abstract does not show the search method or total count. The reactive nature of defenses is stated but not backed by a quantitative breakdown or comparison to non-RAG cases. No new experiments test whether the boundary actually helps in practice.

This paper is for people in LLM security who want a structured map of RAG risks. A reader who needs to cite an overview or identify gaps will get value from it. It deserves a serious referee because the framing is clear and the topic matters as RAG becomes common. I would send it for peer review.

Referee Report

2 major / 3 minor

Summary. The paper claims that secure RAG is fundamentally about the security of the external knowledge-access pipeline. It establishes an operational boundary to separate inherent LLM flaws from RAG-introduced or RAG-amplified threats. The RAG workflow is abstracted into six stages and organized around three trust boundaries and four primary security surfaces (pre-retrieval knowledge corruption, retrieval-time access manipulation, downstream context exploitation, and knowledge exfiltration). Through a systematic review of attacks, defenses, remediation mechanisms, and evaluation benchmarks, the paper reveals that current defenses remain largely reactive and fragmented, and discusses future directions toward layered, boundary-aware protection across the knowledge-access lifecycle.

Significance. If the taxonomy comprehensively captures RAG-specific threats without substantial overlap or omission, the work supplies a useful organizing framework that clarifies the distinction between LLM-inherent and RAG-specific risks. This separation can help focus future security efforts on the external knowledge pipeline. The survey of attacks, defenses, and benchmarks, together with the identification of reactive and fragmented defenses, could usefully guide research toward more integrated, boundary-aware protections. The paper's conceptual contribution is its operational boundary and staged workflow abstraction.

major comments (2)

[RAG workflow abstraction and trust boundaries section] The section describing the RAG workflow abstraction into six stages and the organization under three trust boundaries and four security surfaces: the paper does not articulate explicit inclusion criteria or a repeatable derivation process for these categories, which is load-bearing for the central claim that the taxonomy separates RAG-introduced threats from LLM flaws without significant omission or overlap.
[Survey of defenses section] The section surveying defenses and remediation mechanisms: the claim that defenses 'remain largely reactive and fragmented' is central to the call for future directions, yet it rests on a qualitative synthesis without stated criteria for classifying a defense as reactive versus proactive or for quantifying fragmentation (e.g., no enumeration of defense types or overlap metrics).

minor comments (3)

[Abstract] The abstract states that the literature is 'systematically reviewed' but does not indicate the search terms, databases, or date range used; adding this information would allow readers to assess coverage.
[Introduction and taxonomy sections] The terms 'trust boundaries' and 'security surfaces' are used throughout without an initial consolidated definition or glossary; a brief formal definition at first use would improve accessibility.
[Future directions section] The future-directions discussion lists high-level recommendations but does not map them back to specific security surfaces or stages; adding such a mapping would strengthen the link to the proposed taxonomy.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. We address each major comment point by point below, providing clarification on our methodology while committing to targeted revisions that strengthen the manuscript without altering its core contributions.

Referee: [RAG workflow abstraction and trust boundaries section] The section describing the RAG workflow abstraction into six stages and the organization under three trust boundaries and four security surfaces: the paper does not articulate explicit inclusion criteria or a repeatable derivation process for these categories, which is load-bearing for the central claim that the taxonomy separates RAG-introduced threats from LLM flaws without significant omission or overlap.

Authors: We appreciate this observation. The six workflow stages are derived by decomposing the canonical RAG pipeline (query encoding, retrieval, context augmentation, generation, post-processing, and output) as established in foundational RAG literature, with each stage mapped to points where external knowledge crosses into the LLM. The three trust boundaries are defined operationally as the interfaces separating the untrusted external knowledge store, the retrieval mechanism, and the LLM generation process. The four security surfaces emerge directly from these boundaries by identifying where RAG-specific threats (as opposed to inherent LLM vulnerabilities) can be introduced or amplified: pre-retrieval corruption of the knowledge base, retrieval-time manipulation of access or ranking, post-retrieval context exploitation within the prompt, and post-generation exfiltration of sensitive retrieved content. While the manuscript presents the resulting taxonomy, it does not include an explicit subsection detailing this derivation process or inclusion criteria. We will add such a subsection in the revision, including a table that enumerates each category with its derivation rationale and explicit criteria for assigning attacks and defenses, thereby making the separation of RAG-introduced threats repeatable and transparent.

Revision: partial

Referee: [Survey of defenses section] The section surveying defenses and remediation mechanisms: the claim that defenses 'remain largely reactive and fragmented' is central to the call for future directions, yet it rests on a qualitative synthesis without stated criteria for classifying a defense as reactive versus proactive or for quantifying fragmentation (e.g., no enumeration of defense types or overlap metrics).

Authors: The referee correctly identifies that our assessment of defenses as 'largely reactive and fragmented' is a qualitative synthesis drawn from reviewing the literature across the four security surfaces. Reactive defenses are those that detect or mitigate threats after they have manifested (e.g., output filtering or anomaly detection on generated responses), while proactive ones intervene at the trust boundaries before threats propagate (e.g., knowledge sanitization or retrieval-time access controls). Fragmentation is evidenced by the concentration of existing work on isolated surfaces without cross-boundary integration. We acknowledge that the manuscript does not provide explicit classification criteria or quantitative metrics such as overlap counts. In the revised version, we will add a dedicated table that enumerates all reviewed defense categories with their assigned classification (reactive/proactive), the security surface they address, and a brief note on observed overlaps or gaps. This will make the synthesis more rigorous while preserving the central observation that motivates the future directions toward layered protections.

Revision: partial
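
The table promised in this response could be backed by a simple tally. The sketch below shows one possible shape for it, under stated assumptions: the four defense names and their reactive/proactive labels are the examples given in the response above, but the surface assigned to each defense is our own illustrative guess, and the `Defense` record and `summarize` helper are hypothetical. Four rows say nothing about the actual literature; the point is only how "reactive" and "fragmented" could be counted.

```python
from collections import Counter
from dataclasses import dataclass

SURFACES = (
    "pre-retrieval knowledge corruption",
    "retrieval-time access manipulation",
    "downstream context exploitation",
    "knowledge exfiltration",
)


@dataclass(frozen=True)
class Defense:
    name: str
    mode: str        # "reactive" or "proactive", as labeled in the response
    surfaces: tuple  # ASSUMED surface assignment, for illustration only


defenses = [
    Defense("output filtering", "reactive", (SURFACES[3],)),
    Defense("anomaly detection on generated responses", "reactive", (SURFACES[2],)),
    Defense("knowledge sanitization", "proactive", (SURFACES[0],)),
    Defense("retrieval-time access controls", "proactive", (SURFACES[1],)),
]


def summarize(rows):
    """Count defenses by mode and by surface, and measure single-surface share."""
    by_mode = Counter(d.mode for d in rows)
    coverage = Counter(s for d in rows for s in d.surfaces)
    # Share of defenses confined to one surface: a crude fragmentation proxy.
    single_share = sum(1 for d in rows if len(d.surfaces) == 1) / len(rows)
    uncovered = [s for s in SURFACES if coverage[s] == 0]
    return by_mode, coverage, single_share, uncovered


by_mode, coverage, single_share, uncovered = summarize(defenses)
print(dict(by_mode))
print("single-surface share:", single_share)
print("uncovered surfaces:", uncovered)
```

A high single-surface share together with a reactive-heavy mode count would be one quantitative reading of the "largely reactive and fragmented" claim; the choice of proxy is itself a judgment call that the revised table would need to justify.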

Circularity Check

0 steps flagged

No significant circularity

This is a taxonomy and literature-organization paper with no derivations, equations, predictions, or fitted parameters. The central proposal (secure RAG as protection of the external knowledge-access pipeline, plus an operational boundary separating LLM-inherent from RAG-specific threats) is a definitional framing used to structure the survey around six workflow stages, three trust boundaries, and four security surfaces. These abstractions are presented as an organizing lens rather than derived results; the paper surveys existing attacks, defenses, and benchmarks without reducing any claim to self-referential inputs or load-bearing self-citations. The structure is self-contained against external literature and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that RAG threats can be cleanly separated from inherent LLM flaws via an operational boundary and that the workflow decomposes into six stages without loss of security-relevant interactions; no free parameters or new physical entities are introduced.

axioms (1)

Domain assumption: The RAG workflow can be abstracted into six distinct stages that align with three trust boundaries and four primary security surfaces.
Invoked to organize the literature review and separate RAG-specific risks.

pith-pipeline@v0.9.0 · 5491 in / 1280 out tokens · 36288 ms · 2026-05-10T17:48:42.890085+00:00 · methodology
