Benchmarking Knowledge-Extraction Attack and Defense on Retrieval-Augmented Generation

Franck Dernoncourt; Haoyu Han; Li Ma; Mahantesh Halappanavar; Nesreen Ahmed; Ryan Rossi; Utkarsh Sahu; Yue Zhao; Yushun Dong; Yu Wang

arxiv: 2602.09319 · v3 · pith:KTOIJREHnew · submitted 2026-02-10 · 💻 cs.CR

Zhisheng Qi, Utkarsh Sahu, Li Ma, Haoyu Han, Ryan Rossi, Franck Dernoncourt, Mahantesh Halappanavar, Nesreen Ahmed, Yushun Dong, Yue Zhao, Yu Zhang, Yu Wang

classification 💻 cs.CR

keywords: attack, benchmark, defense, generation, knowledge-extraction, attacks, datasets, diverse

Retrieval-Augmented Generation (RAG) has become a cornerstone of knowledge-intensive applications, including enterprise chatbots, healthcare assistants, and agentic memory management. However, recent studies show that knowledge-extraction attacks can recover sensitive knowledge-base content through maliciously crafted queries, raising serious intellectual property and privacy concerns. While prior work has explored individual attack and defense techniques, the research landscape remains fragmented, spanning heterogeneous retrieval embeddings, diverse generation models, and evaluations based on non-standardized metrics and inconsistent datasets. To address this gap, we introduce the first systematic benchmark for knowledge-extraction attacks on RAG systems. Our benchmark covers a broad range of attack and defense strategies, representative retrieval embedding models, open- and closed-source generators, and both graph-based and non-graph-based indexing, all evaluated under a unified experimental framework with standardized protocols across multiple datasets spanning diverse languages. By consolidating the experimental landscape and enabling reproducible, comparable evaluation, this benchmark provides actionable insights and a practical foundation for developing privacy-preserving RAG systems in the face of emerging knowledge-extraction threats.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

GraphIP-Bench: How Hard Is It to Steal a Graph Neural Network, and Can We Stop It?
cs.CR 2026-05 accept novelty 8.0

GraphIP-Bench shows stealing GNNs is easy at moderate query budgets, most defenses fail to block or reliably trace extraction, and watermarks lose verification power on surrogates while heterophilic graphs are harder ...

GraphIP-Bench: How Hard Is It to Steal a Graph Neural Network, and Can We Stop It?
cs.CR 2026-05 unverdicted novelty 7.0

GraphIP-Bench is a new unified benchmark showing GNN model extraction succeeds at moderate query budgets while most defenses fail to prevent it or retain verification signals on surrogates.

GRADE: Graph Representation of LLM Agent Dependency and Execution
cs.LG 2026-06 unverdicted novelty 5.0

GRADE models any LLM agent run as a graph with execution and graded dependency edge layers to enable failure prediction and fault localization across tool, coding, and web agent corpora.