E-MIA: Exam-Style Black-Box Membership Inference Attacks against RAG Systems
Pith reviewed 2026-05-09 19:23 UTC · model grok-4.3
The pith
Building an exam from a document's verifiable facts creates a clear signal for whether it belongs in a RAG corpus.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
E-MIA converts verifiable hard evidence in the target document, such as fine-grained details, proper nouns, definitional statements, metadata cues, and causal or constraint relations, into an exam with four objectively gradable question types: fill-in-the-blank, short answer, multiple choice, and true/false. The aggregated exam score across multiple evidence-targeted questions serves as the membership signal.
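For concreteness, here is a minimal sketch of the attack loop the abstract describes. It is an editorial illustration, not the authors' code: `query_rag`, the substring grader, and the 0.6 decision threshold are all assumptions standing in for details the paper does not publish here.

```python
# Minimal sketch of the E-MIA exam pipeline (illustrative assumptions only).
from dataclasses import dataclass

@dataclass
class ExamQuestion:
    kind: str      # "FB" (fill-in-the-blank), "SC" (short answer),
                   # "MC" (multiple choice), or "TF" (true/false)
    prompt: str    # natural-sounding question posed to the RAG system
    answer: str    # gold answer taken verbatim from the candidate document

def grade(question: ExamQuestion, response: str) -> float:
    """Objective grading: 1.0 if the gold answer appears in the response."""
    return 1.0 if question.answer.lower() in response.lower() else 0.0

def exam_score(questions: list[ExamQuestion], query_rag) -> float:
    """Aggregated exam score used as the membership signal."""
    marks = [grade(q, query_rag(q.prompt)) for q in questions]
    return sum(marks) / len(marks)

def infer_membership(questions, query_rag, threshold: float = 0.6) -> bool:
    """Score above a calibrated threshold => document judged a member.
    The 0.6 default is illustrative, not a value from the paper."""
    return exam_score(questions, query_rag) >= threshold
```

Exact-substring matching is the crudest objective grader; the paper's MC and T/F formats would instead grade by matching the selected option or truth value.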
What carries the argument
An exam constructed from four objective question types drawn from the document's hard evidence, whose total score functions as the membership indicator.
If this is right
- Member and non-member documents exhibit improved separability in score distributions under stringent evaluation conditions.
- Queries remain natural and stealthy, reducing the chance of refusal or detection compared with explicit confirmation probes.
- Attack performance varies with the composition of question types and the overall length of the exam.
- Successful inference reveals corpus coverage and the presence of sensitive topics without direct access.
Where Pith is reading between the lines
- RAG operators may need to limit how specifically responses echo document facts even when retrieval occurs.
- The same evidence-to-exam pattern could apply to membership testing in other retrieval-based or knowledge-augmented models.
- Adding controlled noise or paraphrasing to retrieved content might reduce the reliability of score-based separation.
Load-bearing premise
Verifiable hard evidence in the target document can be reliably converted into objectively gradable questions whose answer patterns differ enough between member and non-member cases to produce stable thresholds.
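A minimal sketch of how such a threshold could be made stable, assuming the attacker can score shadow documents with known membership status (the calibration procedure below is our assumption, not one stated in the abstract):

```python
import numpy as np

def calibrate_threshold(member_scores, nonmember_scores):
    """Choose the exam-score cutoff maximizing Youden's J (TPR - FPR)
    over shadow documents with known membership status."""
    m = np.asarray(member_scores)
    n = np.asarray(nonmember_scores)
    best_t, best_j = 0.5, -1.0
    for t in np.unique(np.concatenate([m, n])):
        j = (m >= t).mean() - (n >= t).mean()
        if j > best_j:
            best_j, best_t = j, t
    return best_t

# Toy demo on synthetic scores (not data from the paper): well-separated
# distributions yield a cutoff between the two modes.
rng = np.random.default_rng(0)
print(calibrate_threshold(rng.normal(0.85, 0.08, 100).clip(0, 1),
                          rng.normal(0.35, 0.12, 100).clip(0, 1)))
```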
What would settle it
A test on a controlled RAG system in which the same collection of evidence-derived questions produces statistically overlapping accuracy scores for documents known to be inside versus outside the corpus; such overlap would falsify the claimed separability.
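One hedged way to run that settling test: compare the two score samples with a Mann-Whitney U test, whose statistic directly yields the empirical AUC. All names below are illustrative.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def separability(member_scores, nonmember_scores):
    """Mann-Whitney U test on the two exam-score samples. The U statistic
    normalized by n_m * n_n is the empirical AUC; AUC near 0.5 with a
    non-significant p-value is the falsifying overlap described above."""
    u, p = mannwhitneyu(member_scores, nonmember_scores,
                        alternative="greater")
    auc = u / (len(member_scores) * len(nonmember_scores))
    return auc, p
```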
Original abstract
Retrieval-Augmented Generation (RAG) equips large language models (LLMs) with external evidence by retrieving documents at inference time, but it also turns the retrieval corpus into a sensitive asset. Under a black-box setting, an adversary given a candidate document can infer whether it has been ingested into the RAG knowledge base (i.e., document-level membership inference) solely from query-response interactions, thereby leaking corpus coverage and the existence of sensitive topics. Existing RAG MIA methods either rely on soft signals such as semantic similarity, which often yield overlapping member/non-member score distributions and unstable thresholds, or employ explicit confirmation probes whose intent is conspicuous and thus prone to refusal and detection. We propose E-MIA, which converts verifiable hard evidence in the target document (e.g., fine-grained details, proper nouns/technical terms, definitional statements, metadata cues, and causal/constraint relations) into an exam with four objectively gradable question types (FB/SC/MC/T/F), and uses the aggregated exam score across multiple evidence-targeted questions as the membership signal. Experiments across multiple datasets and diverse RAG configurations demonstrate that E-MIA improves member/non-member separability in stringent settings while preserving natural, stealthy queries, and we further analyze the impact of question composition and exam length on attack effectiveness.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes E-MIA, a black-box membership inference attack on RAG systems. It extracts verifiable hard evidence (fine-grained details, proper nouns, definitional statements, causal relations) from a candidate document and converts it into an exam consisting of four objectively gradable question types (FB, SC, MC, T/F). The aggregated exam score from the RAG LLM's responses is used as the membership signal to distinguish whether the document is in the retrieval corpus. Experiments across multiple datasets and RAG configurations are claimed to show improved member/non-member separability relative to prior soft-signal or explicit-probe methods while preserving natural, stealthy queries; the paper also analyzes the effects of question composition and exam length.
Significance. If the central empirical claims hold with rigorous quantification, E-MIA would constitute a meaningful advance in black-box privacy attacks against RAG pipelines by replacing unstable similarity-based signals with structured, gradable probes. This would strengthen the case that retrieval corpora constitute a sensitive asset and could guide the design of query-monitoring or answer-consistency defenses.
major comments (3)
- [§3] §3 (Method), paragraph on question generation: the central claim that verifiable hard evidence can be reliably turned into questions whose answer patterns differ stably between member and non-member cases is load-bearing, yet the manuscript supplies no ablation on conversion success rate, retrieval failure modes, or cases where the base LLM answers correctly from parametric knowledge alone. Without such data the membership signal's reliability remains unverified.
- [§5] §5 (Experiments): the abstract asserts 'improved member/non-member separability' and 'stable thresholds' but the experimental reporting must include concrete metrics (AUC or separation distance with error bars), dataset sizes, number of documents per split, and statistical tests against baselines; the current description leaves the magnitude and robustness of the improvement unquantified.
- [§4.2] §4.2 (Analysis of question composition and exam length): the paper states it analyzes impact on attack effectiveness, yet no table or figure reports how performance varies with the proportion of each question type or with exam length; this omission weakens the practical takeaway on optimal exam design.
minor comments (2)
- [Abstract] Notation for the four question types (FB/SC/MC/T/F) is introduced without an explicit table mapping abbreviations to full names on first use.
- [§2] The manuscript should cite the specific prior RAG MIA works it compares against (semantic-similarity and explicit-confirmation baselines) with full references in the related-work section.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate additional analyses, metrics, and visualizations as suggested.
Point-by-point responses
-
Referee: [§3] §3 (Method), paragraph on question generation: the central claim that verifiable hard evidence can be reliably turned into questions whose answer patterns differ stably between member and non-member cases is load-bearing, yet the manuscript supplies no ablation on conversion success rate, retrieval failure modes, or cases where the base LLM answers correctly from parametric knowledge alone. Without such data the membership signal's reliability remains unverified.
Authors: We agree that an explicit ablation on question generation reliability would strengthen the central claim. In the revised version we will add a new subsection (or appendix) reporting: (i) conversion success rate measured via manual verification on a random sample of 200 generated questions across datasets, (ii) observed retrieval failure modes with their frequencies, and (iii) the fraction of cases in which the base LLM answers correctly from parametric knowledge alone (measured by comparing RAG responses against a no-retrieval baseline). These results will directly quantify the stability of the membership signal. Revision: yes.
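As an editorial sketch of the control proposed in item (iii), one can pose each exam question both to the full RAG system and to the same LLM with retrieval disabled; `query_llm_only`, `query_rag`, and `grade` below are hypothetical interfaces, not the paper's code.

```python
# Hedged sketch: discard questions the bare LLM already answers correctly,
# since they carry no membership signal about the retrieval corpus.
def parametric_leak_rate(questions, query_llm_only, grade):
    """Fraction of exam questions answerable with retrieval disabled."""
    leaked = [q for q in questions
              if grade(q, query_llm_only(q.prompt)) == 1.0]
    return len(leaked) / len(questions), leaked

def filtered_exam_score(questions, query_rag, query_llm_only, grade):
    """Exam score over only the questions the bare LLM could not answer."""
    _, leaked = parametric_leak_rate(questions, query_llm_only, grade)
    kept = [q for q in questions if q not in leaked]
    if not kept:
        return None  # every question is answerable without retrieval
    return sum(grade(q, query_rag(q.prompt)) for q in kept) / len(kept)
```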
-
Referee: [§5] §5 (Experiments): the abstract asserts 'improved member/non-member separability' and 'stable thresholds' but the experimental reporting must include concrete metrics (AUC or separation distance with error bars), dataset sizes, number of documents per split, and statistical tests against baselines; the current description leaves the magnitude and robustness of the improvement unquantified.
Authors: We acknowledge that the current experimental section relies on qualitative descriptions and should be augmented with precise quantitative evidence. The revision will expand §5 to report: AUC (and separation distance) with error bars over at least five random seeds, exact dataset sizes and member/non-member split counts for every experiment, and statistical significance tests (paired t-tests or Wilcoxon signed-rank tests) against all baselines. These additions will make the magnitude and robustness of the claimed improvements fully verifiable. Revision: yes.
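The promised reporting format is mechanical to produce once per-seed AUCs exist; a hedged sketch (function name is ours, inputs would come from the expanded experiments):

```python
import numpy as np
from scipy.stats import wilcoxon

def report_separability(emia_auc_per_seed, baseline_auc_per_seed):
    """Summarize paired per-seed AUCs: mean +/- sample std, plus a
    one-sided Wilcoxon signed-rank test of E-MIA over the baseline.
    Inputs are equal-length arrays, one AUC per random seed."""
    e = np.asarray(emia_auc_per_seed)
    b = np.asarray(baseline_auc_per_seed)
    stat, p = wilcoxon(e, b, alternative="greater")
    return {
        "emia": f"{e.mean():.3f} +/- {e.std(ddof=1):.3f}",
        "baseline": f"{b.mean():.3f} +/- {b.std(ddof=1):.3f}",
        "wilcoxon_p": p,
    }
```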
-
Referee: [§4.2] §4.2 (Analysis of question composition and exam length): the paper states it analyzes impact on attack effectiveness, yet no table or figure reports how performance varies with the proportion of each question type or with exam length; this omission weakens the practical takeaway on optimal exam design.
Authors: While §4.2 contains textual discussion of these factors, we agree that the absence of summarized tabular and visual results limits practical utility. We will add (i) a table reporting AUC for different mixtures of FB/SC/MC/TF questions (e.g., 25/25/25/25, 40/20/20/20, etc.) and (ii) a line plot of AUC versus exam length (from 4 to 20 questions). These will be placed in §4.2 and will directly support recommendations for optimal exam design. Revision: yes.
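A hypothetical harness for the promised exam-length sweep; `question_bank`, the two RAG callables, and `grade` are stand-ins for the paper's setup, and the AUC is computed empirically from the resulting score samples.

```python
import random

def auc_from_scores(member_scores, nonmember_scores):
    """Empirical AUC: probability a member exam outscores a non-member."""
    pairs = len(member_scores) * len(nonmember_scores)
    wins = sum(m > n for m in member_scores for n in nonmember_scores)
    ties = sum(m == n for m in member_scores for n in nonmember_scores)
    return (wins + 0.5 * ties) / pairs

def sweep_exam_length(question_bank, member_rag, nonmember_rag, grade,
                      lengths=range(4, 21, 4), trials=20, seed=0):
    """Subsample exams of each length and measure score separability."""
    rng = random.Random(seed)
    results = {}
    for length in lengths:
        m_scores, n_scores = [], []
        for _ in range(trials):
            exam = rng.sample(question_bank, length)
            m_scores.append(sum(grade(q, member_rag(q.prompt))
                                for q in exam) / length)
            n_scores.append(sum(grade(q, nonmember_rag(q.prompt))
                                for q in exam) / length)
        results[length] = auc_from_scores(m_scores, n_scores)
    return results
```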
Circularity Check
No circularity: empirical attack validated by experiments
Full rationale
The paper describes a black-box attack that converts document evidence into four types of exam questions and aggregates scores as a membership signal. No equations, parameter fits, or derivations appear in the abstract or method outline. The central claim is supported by experimental results on multiple datasets and RAG configurations rather than any self-definitional reduction, fitted-input prediction, or load-bearing self-citation. The conversion step is presented as a practical heuristic whose effectiveness is measured externally, not assumed by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: RAG systems retrieve and condition generation on documents from the indexed corpus.