Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks
Pith reviewed 2026-05-10 00:14 UTC · model grok-4.3
The pith
A Sentinel-Strategist setup detects risky queries in RAG systems and activates only the defenses each query needs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the Sentinel-Strategist architecture, the Sentinel detects anomalous retrieval behavior and the Strategist then deploys only the defenses the query context warrants. This eliminates MBA-style membership-inference leakage, drives data-poisoning attack success to near zero in the best configurations, and restores contextual recall to more than 75 percent of the undefended baseline, avoiding the utility cost of a static, always-on defense stack.
What carries the argument
The Sentinel component, which identifies anomalous retrieval behavior, together with the Strategist component, which selects and applies only context-appropriate defenses.
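The detect-then-select pattern these two components implement can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual implementation: the detection heuristics, defense names, and registry below are invented for the example.

```python
# Hypothetical sketch of the Sentinel -> Strategist control loop.
# All names (defense registry, heuristics) are illustrative, not the paper's API.

DEFENSES = {
    "membership_inference": "rerank_with_noise",      # placeholder defense names
    "data_poisoning": "filter_low_trust_passages",
    "content_leakage": "redact_sensitive_spans",
}

def sentinel(query, retrieved_passages):
    """Flag anomalous retrieval behavior; returns the set of suspected risks."""
    risks = set()
    if len(set(retrieved_passages)) < len(retrieved_passages):
        risks.add("data_poisoning")        # duplicated passages: possible corpus stuffing
    if any("[MASK]" in p for p in retrieved_passages):
        risks.add("membership_inference")  # MBA-style masked probe pattern
    return risks

def strategist(risks):
    """Activate only the defenses warranted by the detected risks."""
    return [DEFENSES[r] for r in sorted(risks)]

# A benign query triggers no defenses; a masked probe triggers exactly one.
print(strategist(sentinel("q", ["a", "b"])))           # []
print(strategist(sentinel("q", ["x [MASK] y", "b"])))  # ['rerank_with_noise']
```

The point of the sketch is the shape of the contract: the Sentinel emits risk labels, the Strategist maps labels to defenses, and a clean query pays no utility cost at all.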
If this is right
- RAG systems can maintain high retrieval utility while blocking specific attack vectors instead of accepting a blanket 40 percent recall penalty.
- Defense activation becomes query-dependent rather than fixed, so only warranted protections are used for each input.
- Membership inference leakage can be eliminated without sacrificing most of the original retrieval quality.
- Data-poisoning success can be driven near zero while still recovering the majority of undefended contextual recall.
Where Pith is reading between the lines
- The same detection-plus-selective-activation pattern could be applied to other generative systems that combine retrieval with private data.
- Improving the Sentinel's accuracy on edge-case queries would further narrow the remaining gap to full undefended performance.
- Model choice for the Strategist remains a practical tuning knob that can be adjusted per deployment domain.
Load-bearing premise
The Sentinel must detect anomalous retrieval behavior accurately across varied real-world queries without missing attacks or triggering defenses that still degrade utility.
What would settle it
A test set of real-world queries in which either a data-poisoning attack succeeds or contextual recall falls below 60 percent of the undefended baseline, whether through missed detections or over-triggered defenses, would refute the claim.
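That falsification criterion is easy to state as a check. The function below is an illustrative formalization, with made-up recall numbers; the 60 percent threshold comes from the criterion above.

```python
# Illustrative check of the falsification criterion: the claim fails if an
# attack succeeds, or if defended contextual recall drops below 60% of the
# undefended baseline. Recall values here are synthetic.

def recall_ratio(defended_recall, undefended_recall):
    return defended_recall / undefended_recall

def claim_holds(defended_recall, undefended_recall, attack_succeeded,
                threshold=0.60):
    return (not attack_succeeded) and \
        recall_ratio(defended_recall, undefended_recall) >= threshold

print(claim_holds(0.62, 0.80, attack_succeeded=False))  # True  (77.5% retained)
print(claim_holds(0.45, 0.80, attack_succeeded=False))  # False (56% retained)
```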
Original abstract
Retrieval-augmented generation (RAG) systems are increasingly deployed in sensitive domains such as healthcare and law, where they rely on private, domain-specific knowledge. This capability introduces significant security risks, including membership inference, data poisoning, and unintended content leakage. A straightforward mitigation is to enable all relevant defenses simultaneously, but doing so incurs a substantial utility cost. In our experiments, an always-on defense stack reduces contextual recall by more than 40%, indicating that retrieval degradation is the primary failure mode. To mitigate this trade-off in RAG systems, we propose Adaptive Defense Orchestration (ADO), realized as the Sentinel-Strategist architecture, a context-aware framework for risk analysis and defense selection. A Sentinel detects anomalous retrieval behavior, after which a Strategist selectively deploys only the defenses warranted by the query context. Evaluated across three benchmark datasets and five orchestration models, ADO is shown to eliminate MBA-style membership inference leakage while substantially recovering retrieval utility relative to a fully static defense stack, approaching undefended baseline levels. Under data poisoning, the strongest ADO variants reduce attack success to near zero while restoring contextual recall to more than 75% of the undefended baseline, although robustness remains sensitive to model choice. Overall, these findings show that adaptive, query-aware defense can substantially reduce the security-utility trade-off in RAG systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes the Sentinel-Strategist (ADO) architecture for RAG systems, in which a Sentinel component detects anomalous retrieval behavior and a Strategist selectively activates only the warranted defenses against multi-vector attacks such as MBA-style membership inference and data poisoning. It claims that, across three benchmark datasets and five orchestration models, ADO eliminates membership-inference leakage, reduces poisoning attack success to near zero, and restores contextual recall to more than 75% of the undefended baseline, while a static always-on defense stack reduces recall by more than 40%.
Significance. If the empirical results can be reproduced with full methodological transparency, the work would be significant for secure RAG deployment in sensitive domains: it directly quantifies the utility penalty of static defenses and demonstrates that query-aware, selective orchestration can materially shrink the security-utility trade-off. The identification of retrieval degradation as the dominant failure mode of static stacks is a useful observation.
major comments (2)
- Abstract and evaluation summary: the headline claims (elimination of MBA leakage, near-zero poisoning success, >75% recall recovery) rest on the Sentinel's ability to classify queries correctly, yet no precision, recall, F1, or confusion-matrix statistics for the Sentinel detector are supplied on any of the three benchmarks. Without these intermediate metrics it is impossible to determine whether the reported gains arise from accurate selective defense or from an unverified detection oracle.
- Evaluation section (implicit in the abstract's quantitative statements): no attack implementations, baseline defense configurations, error bars, statistical significance tests, or hyper-parameter details for the five orchestration models are described. The absence of these elements renders the central quantitative outcomes unverifiable from the provided text.
Simulated Author's Rebuttal
We thank the referee for a constructive and balanced review. We appreciate the recognition of the Sentinel-Strategist architecture's potential to reduce the security-utility trade-off in RAG systems. We address each major comment below and will make the requested revisions to improve methodological transparency.
Point-by-point responses
- Referee: Abstract and evaluation summary: the headline claims (elimination of MBA leakage, near-zero poisoning success, >75% recall recovery) rest on the Sentinel's ability to classify queries correctly, yet no precision, recall, F1, or confusion-matrix statistics for the Sentinel detector are supplied on any of the three benchmarks. Without these intermediate metrics it is impossible to determine whether the reported gains arise from accurate selective defense or from an unverified detection oracle.
  Authors: We agree that explicit performance metrics for the Sentinel detector are necessary to substantiate the source of the reported gains. In the revised manuscript we will add a new subsection (and associated table) in the evaluation section that reports precision, recall, F1 scores, and confusion matrices for the Sentinel on all three benchmarks. This will allow readers to assess detection reliability and confirm that the utility recovery stems from accurate, context-aware orchestration rather than an assumed oracle. Revision: yes.
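The promised detector metrics are standard derivations from a per-query confusion matrix (attack vs. benign). The sketch below shows the computation; the counts are made up for illustration, not results from the paper.

```python
# Precision, recall, and F1 for an attack-query detector, derived from
# confusion-matrix counts. The example counts are synthetic.

def detection_metrics(tp, fp, fn):
    """tp: attacks caught, fp: benign queries over-flagged, fn: attacks missed."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 90 attacks caught, 5 benign queries over-flagged, 10 attacks missed
p, r, f1 = detection_metrics(tp=90, fp=5, fn=10)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# precision=0.947 recall=0.900 f1=0.923
```

Reporting these per benchmark is what lets a reader separate "accurate selective defense" from "detection oracle": a high end-to-end result with low detector recall would indicate the gains come from somewhere else.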
- Referee: Evaluation section (implicit in the abstract's quantitative statements): no attack implementations, baseline defense configurations, error bars, statistical significance tests, or hyper-parameter details for the five orchestration models are described. The absence of these elements renders the central quantitative outcomes unverifiable from the provided text.
  Authors: We acknowledge that the current manuscript omits several elements required for full reproducibility and verification. We will expand the evaluation section to include: complete descriptions and references for the MBA-style membership-inference and data-poisoning attack implementations; explicit specifications of all baseline defense configurations; error bars together with statistical significance tests (e.g., paired t-tests or Wilcoxon signed-rank tests with p-values); and detailed hyper-parameter settings, training procedures, and model-selection criteria for the five orchestration models. These additions will make the quantitative claims directly verifiable. Revision: yes.
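A paired test is the right choice here because defended and undefended recall are measured on the same queries. A minimal standard-library sketch of the paired t statistic, on synthetic per-query recall values (not data from the paper):

```python
# Paired t statistic for per-query recall, defended vs. undefended, using only
# the standard library. The recall values below are synthetic.
import math
from statistics import mean, stdev

def paired_t(a, b):
    """t statistic for paired samples a and b (same queries, same order)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

undefended = [0.81, 0.79, 0.84, 0.77, 0.82]
defended   = [0.74, 0.70, 0.78, 0.69, 0.73]
t = paired_t(undefended, defended)
print(f"t = {t:.2f}  (df = {len(undefended) - 1})")
```

In practice one would use a library routine (e.g. a paired t-test or Wilcoxon signed-rank test from scipy.stats) to also obtain p-values; the sketch only shows what "paired" means for this comparison.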
Circularity Check
No circularity; purely empirical evaluation
Full rationale
The manuscript proposes the Sentinel-Strategist architecture and reports end-to-end experimental results on three benchmarks against baselines. No equations, derivations, fitted parameters, or self-citations appear as load-bearing steps in the provided text. All claims reduce to observed performance differences rather than any definitional loop, renamed ansatz, or prediction that is forced by construction from the inputs.
Axiom & Free-Parameter Ledger
invented entities (1)
- Sentinel-Strategist architecture: no independent evidence
Reference graph
Works this paper leans on
- [1] Hongbin Ye, Tong Liu, Aijia Zhang, Wei Hua, and Weiqiang Jia. Cognitive mirage: A review of hallucinations in large language models. In Ningyu Zhang, Tianxing Wu, Meng Wang, Guilin Qi, Haofen Wang, and Huajun Chen, editors, Proceedings of the First International OpenKG Workshop: Large Knowledge-Enhanced Models, co-located with the International Joint Conference on Artificial Intelligence, 2024.
- [2] Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, and Shuming Shi. Siren's song in the AI ocean: A survey on hallucination in large language models. Computational Linguistics, 51(4):1373–1418, December 2025. doi: 10.1162/COLI.a.16.
- [3] Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, and Qing Li. A survey on RAG meeting LLMs: Towards retrieval-augmented large language models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD '24, pages 6491–6501, New York, NY, USA, 2024. Association for Computing Machinery.
- [4] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.
- [5] Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. Improving language models by retrieving from trillions of tokens. In International Conference on Machine Learning, pages 2206–2240. PMLR, 2022.
- [6] Atousa Arzanipour, Rouzbeh Behnia, Reza Ebrahimi, and Kaushik Dutta. RAG security and privacy: Formalizing the threat model and attack surface, 2025. URL https://arxiv.org/abs/2509.20324.
- [7] Maya Anderson, Guy Amit, and Abigail Goldsteen. Is my data in your retrieval database? Membership inference attacks against retrieval augmented generation. In Proceedings of the 11th International Conference on Information Systems Security and Privacy, pages 474–485. SCITEPRESS - Science and Technology Publications, 2025.
- [8] Mingrui Liu, Sixiao Zhang, and Cheng Long. Mask-based membership inference attacks for retrieval-augmented generation. In Proceedings of the ACM Web Conference 2025, WWW '25, pages 2894–2907, New York, NY, USA, 2025. Association for Computing Machinery. doi: 10.1145/3696410.3714771.
- [9] Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. PoisonedRAG: Knowledge corruption attacks to retrieval-augmented generation of large language models. In 34th USENIX Security Symposium (USENIX Security 25), pages 3827–3844, 2025.
- [10] Zhenting Qi, Hanlin Zhang, Eric P. Xing, Sham M. Kakade, and Himabindu Lakkaraju. Follow my instruction and spill the beans: Scalable data extraction from retrieval-augmented generation systems. In ICLR 2024 Workshop on Navigating and Addressing Data Problems for Foundation Models, 2024. URL https://openreview.net/forum?id=el5wbHYKeS.
- [11] Kaiyue Feng, Guangsheng Zhang, Huan Tian, Heng Xu, Yanjun Zhang, Tianqing Zhu, Ming Ding, and Bo Liu. RAGLeak: Membership inference attacks on RAG-based large language models. In Willy Susilo and Josef Pieprzyk, editors, Information Security and Privacy, pages 147–166, Singapore, 2025. Springer Nature Singapore.
- [12] Shixuan Sun, Siyuan Liang, Ruoyu Chen, Jianjie Huang, Jingzhi Li, and Xiaochun Cao. SMA: Who said that? Auditing membership leakage in semi-black-box RAG controlling, 2025. URL https://arxiv.org/abs/2508.09105.
- [13] Zeyu Shen, Basileal Imana, Tong Wu, Chong Xiang, Prateek Mittal, and Aleksandra Korolova. ReliabilityRAG: Effective and provably robust defense for RAG-based web-search. arXiv preprint arXiv:2509.23519, September 2025. doi: 10.48550/arXiv.2509.23519.
- [14] Xun Liang, Simin Niu, Zhiyu Li, Sensen Zhang, Hanyu Wang, Feiyu Xiong, Zhaoxin Fan, Bo Tang, Jihao Zhao, Jiawei Yang, Shichao Song, and Mengwei Wang. SafeRAG: Benchmarking security in retrieval-augmented generation of large language model. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, 2025.
- [15] Nicolas Grislain. RAG with differential privacy. In 2025 IEEE Conference on Artificial Intelligence (CAI), pages 847–852, 2025. doi: 10.1109/CAI64502.2025.00150.
- [16] Huichi Zhou, Kin-Hei Lee, Zhonghao Zhan, Yue Chen, Zhenhao Li, Zhaoyang Wang, Hamed Haddadi, and Emine Yilmaz. TrustRAG: Enhancing robustness and trustworthiness in retrieval-augmented generation, 2025. URL https://arxiv.org/abs/2501.00879.
- [17] Sarthak Choudhary, Nils Palumbo, Ashish Hooda, Krishnamurthy Dj Dvijotham, and Somesh Jha. Through the stealth lens: Rethinking attacks and defenses in RAG, 2025. URL https://arxiv.org/abs/2506.04390.
- [18] Scott Rose, Oliver Borchert, Stu Mitchell, and Sean Connelly. Zero trust architecture. NIST Special Publication 800-207, National Institute of Standards and Technology, 2020. URL https://doi.org/10.6028/NIST.SP.800-207.
- [19] Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, and Jason Weston. Retrieval augmentation reduces hallucination in conversation. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih, editors, Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3784–3803, Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics.
- [20] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yixin Dai, Jiawei Sun, Haofen Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997, 2(1), 2023.
- [21] Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. Generalization through memorization: Nearest neighbor language models. arXiv preprint arXiv:1911.00172, 2019.
- [22] Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, and Yoav Shoham. In-context retrieval-augmented language models. Transactions of the Association for Computational Linguistics, 11:1316–1331, 2023. doi: 10.1162/tacl_a_00605. URL https://aclanthology.org/2023.tacl-1.75/.
- [23] Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. From local to global: A graph RAG approach to query-focused summarization, 2025. URL https://arxiv.org/abs/2404.16130.
- [24] Shijie Wang, Wenqi Fan, Yue Feng, Lin Shanru, Xinyu Ma, Shuaiqiang Wang, and Dawei Yin. Knowledge graph retrieval-augmented generation for LLM-based recommendation. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025.
- [25] Hongsheng Hu, Zoran Salcic, Lichao Sun, Gillian Dobbie, Philip S. Yu, and Xuyun Zhang. Membership inference attacks on machine learning: A survey. ACM Computing Surveys, 54(11s), September 2022. doi: 10.1145/3523273. URL https://doi.org/10.1145/3523273.
- [26] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18, 2017. doi: 10.1109/SP.2017.41.
- [27] Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pages 268–282. IEEE, 2018.
- [28] Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, et al. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pages 2633–2650, 2021.
- [29] Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer. Detecting pretraining data from large language models. arXiv preprint arXiv:2310.16789, 2023.
- [30] Baolei Zhang, Yuxi Chen, Zhuqing Liu, Lihai Nie, Tong Li, Zheli Liu, and Minghong Fang. Practical poisoning attacks against retrieval-augmented generation. arXiv preprint arXiv:2504.03957, 2025.
- [31] Cynthia Dwork. Differential privacy. In Michele Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener, editors, Automata, Languages and Programming, pages 1–12, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
- [32] Hongwei Yao, Haoran Shi, Yidou Chen, Yixin Jiang, Cong Wang, and Zhan Qin. ControlNet: A firewall for RAG-based LLM system. arXiv preprint arXiv:2504.09593, 2025. URL https://arxiv.org/abs/2504.09593.
- [33] Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36:46595–46623, 2023.
- [34] Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. Natural questions: A benchmark for question answering research. Transactions of the Association for Computational Linguistics, 2019. URL https://aclanthology.org/Q19-1026/.
- [35] Sean MacAvaney, Andrew Yates, Sergey Feldman, Doug Downey, Arman Cohan, and Nazli Goharian. Simplified data wrangling with ir_datasets. In SIGIR, 2021.
- [36] Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781, Online, November 2020. Association for Computational Linguistics.
- [37] Qiao Jin, Bhuwan Dhingra, Zhengping Liu, William Cohen, and Xinghua Lu. PubMedQA: A dataset for biomedical research question answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2567–2577, 2019.
- [38] Mandar Joshi, Eunsol Choi, Daniel Weld, and Luke Zettlemoyer. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. In Regina Barzilay and Min-Yen Kan, editors, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1601–1611, Vancouver, Canada, July 2017. Association for Computational Linguistics.
- [39] Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, and Ming Zhou. MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS '20, Red Hook, NY, USA, 2020. Curran Associates Inc.
- [40] Confident AI. deepeval: The open-source LLM evaluation framework. https://github.com/confident-ai/deepeval, 2026. Version 3.7.8, accessed March 2026.
discussion (0)