Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use
Pith reviewed 2026-05-08 16:31 UTC · model grok-4.3
The pith
ABAC gating in a layered server-side architecture prevents cross-tenant data leakage in enterprise RAG and agent systems while adding negligible overhead.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that existing RAG and agent architectures conflate relevance ranking with authorization, creating leakage paths in multitenant settings. A layered isolation architecture that combines policy-aware ingestion, retrieval-time ABAC gating, and server-side agentic orchestration centralizes security-critical operations such as tool authorization and state management. This produces natural enforcement points that eliminate unauthorized cross-tenant access while preserving client-side flexibility and maintaining near-zero added latency, as shown through an open-source OpenAI-compatible implementation.
What carries the argument
Layered isolation architecture that performs policy-aware ingestion, applies ABAC gating at retrieval and tool-use time, and enforces multitenant separation through centralized server-side orchestration.
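The retrieval-time gate can be illustrated with a minimal sketch. It assumes each chunk carries `tenant` and `clearance` attributes attached at ingestion; those names, the `Chunk` and `Principal` types, and the predicate are hypothetical and are not taken from the paper or from OGX. The point it shows is that the authorization filter runs independently of the relevance score, so a higher-scoring chunk from another tenant never reaches the ranked list.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float      # relevance score from any ranker (semantic, keyword, hybrid)
    tenant: str       # access attributes assumed to be attached at ingestion
    clearance: int


@dataclass
class Principal:
    tenant: str
    clearance: int


def is_authorized(principal: Principal, chunk: Chunk) -> bool:
    """ABAC predicate: compare caller attributes with chunk attributes."""
    return chunk.tenant == principal.tenant and chunk.clearance <= principal.clearance


def gated_top_k(principal: Principal, chunks: list[Chunk], k: int) -> list[Chunk]:
    """Drop unauthorized chunks first, then rank the remainder by relevance."""
    allowed = [c for c in chunks if is_authorized(principal, c)]
    return sorted(allowed, key=lambda c: c.score, reverse=True)[:k]


corpus = [
    Chunk("acme pricing memo", 0.71, "acme", 1),
    Chunk("globex merger plan", 0.95, "globex", 1),   # most relevant, wrong tenant
    Chunk("acme board minutes", 0.88, "acme", 3),     # right tenant, clearance too high
]
caller = Principal(tenant="acme", clearance=1)
results = gated_top_k(caller, corpus, k=2)
print([c.text for c in results])  # ['acme pricing memo']
```

Filtering before ranking, rather than after taking the top k, also avoids returning fewer than k results when unauthorized chunks dominate the top of the ranking.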
If this is right
- Multitenant enterprises can share retrieval and inference infrastructure without separate per-tenant deployments while meeting regulatory access requirements.
- Agentic workflows remain isolated across conversation turns and tool invocations because gating occurs at each server-side step.
- Client frameworks retain control over prompt construction and latency-sensitive decisions while security enforcement stays on the server.
- Vendor-neutral, open implementations of the Responses API become viable for production use under strict compliance constraints.
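Per-step gating of tool invocations might look like the sketch below. The policy table, role names, tool names, and the `run_tool_step` helper are all hypothetical illustrations, not the OGX API. It assumes the server holds the caller's roles and checks every model-requested tool call before executing it, denying any tool that has no policy entry.

```python
# Hypothetical policy table: tool name -> roles permitted to invoke it.
TOOL_POLICY = {
    "search_docs": frozenset({"analyst", "admin"}),
    "delete_index": frozenset({"admin"}),
}


def run_tool_step(roles, tool_name, arguments, registry):
    """Authorize one model-requested tool call on the server, then execute it."""
    allowed_roles = TOOL_POLICY.get(tool_name, frozenset())  # unknown tools are denied
    if not allowed_roles & set(roles):
        return {"tool": tool_name, "status": "denied"}
    output = registry[tool_name](**arguments)
    return {"tool": tool_name, "status": "ok", "output": output}


# Stand-in tool implementations for the sketch.
registry = {
    "search_docs": lambda query: f"results for {query!r}",
    "delete_index": lambda name: f"deleted {name}",
}

analyst_roles = {"analyst"}
steps = [
    run_tool_step(analyst_roles, "search_docs", {"query": "q3 revenue"}, registry),
    run_tool_step(analyst_roles, "delete_index", {"name": "finance"}, registry),
    run_tool_step(analyst_roles, "export_all", {}, registry),
]
print([s["status"] for s in steps])  # ['ok', 'denied', 'denied']
```

Because the check sits inside the server loop, a client framework that composes the agent differently still cannot reach a tool without passing through it, which is the property the bullets above depend on.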
Where Pith is reading between the lines
- The same gating pattern could be applied to non-retrieval generation tasks by tagging prompt components and checking attributes before context assembly.
- Organizations could reduce hardware footprint by consolidating tenants on shared clusters once policy tagging pipelines are mature.
- Standardized policy metadata formats would be needed to guarantee complete tagging when data arrives from heterogeneous enterprise sources.
- Adversarial query testing that attempts to probe for hidden data through carefully crafted prompts would provide additional validation of the gating strength.
Load-bearing premise
All incoming data can be accurately and completely tagged with correct access policies during ingestion, and clients cannot bypass the server-side orchestration points that enforce those policies.
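One way to narrow this premise is to make ingestion fail closed. The sketch below assumes a hypothetical metadata schema with required `tenant` and `classification` attributes; neither the schema nor the `ingest` helper comes from the paper. Documents with missing or empty attributes are quarantined instead of indexed. This catches untagged data only: a document carrying a well-formed but wrong tag would still pass.

```python
REQUIRED_ATTRS = ("tenant", "classification")  # hypothetical required policy attributes


def validate_policy(metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means the tags are well-formed."""
    problems = []
    for key in REQUIRED_ATTRS:
        value = metadata.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty attribute: {key}")
    return problems


def ingest(docs: list[dict]) -> tuple[list[dict], list[tuple[str, list[str]]]]:
    """Index only fully tagged documents; quarantine the rest for review."""
    indexed, quarantined = [], []
    for doc in docs:
        problems = validate_policy(doc.get("metadata", {}))
        if problems:
            quarantined.append((doc["id"], problems))
        else:
            indexed.append(doc)
    return indexed, quarantined


docs = [
    {"id": "d1", "metadata": {"tenant": "acme", "classification": "internal"}},
    {"id": "d2", "metadata": {"tenant": "acme"}},
    {"id": "d3", "metadata": {"tenant": " ", "classification": "public"}},
    {"id": "d4"},
]
indexed, quarantined = ingest(docs)
print([d["id"] for d in indexed], [doc_id for doc_id, _ in quarantined])
```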
What would settle it
A controlled test in which a query issued by one tenant returns or uses documents or tool outputs belonging to a second tenant whose access attributes differ, or a workload benchmark that measures overhead substantially above the reported negligible level.
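Such a controlled test could be sketched as follows, using a tiny synthetic corpus. The documents, scores, and tenant names are invented for illustration, and the metric definition (share of returned documents owned by a tenant other than the caller) is an assumption about what an unauthorized retrieval rate would measure.

```python
def top_k(docs, k, tenant=None):
    """Rank by score; if a tenant is given, restrict to that tenant first."""
    pool = docs if tenant is None else [d for d in docs if d["tenant"] == tenant]
    return sorted(pool, key=lambda d: d["score"], reverse=True)[:k]


def unauthorized_rate(results, tenant):
    """Fraction of returned documents that belong to a different tenant."""
    if not results:
        return 0.0
    leaked = sum(1 for d in results if d["tenant"] != tenant)
    return leaked / len(results)


# Synthetic corpus: the other tenant's document scores highest for this query.
docs = [
    {"id": "a1", "tenant": "acme", "score": 0.62},
    {"id": "a2", "tenant": "acme", "score": 0.40},
    {"id": "g1", "tenant": "globex", "score": 0.91},
    {"id": "g2", "tenant": "globex", "score": 0.15},
]

ungated = top_k(docs, k=2)                 # relevance only
gated = top_k(docs, k=2, tenant="acme")    # authorization, then relevance

print(unauthorized_rate(ungated, "acme"))  # 0.5
print(unauthorized_rate(gated, "acme"))    # 0.0
```

A nonzero rate on the gated path for any tenant pair would be the counterexample described above.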
Original abstract
Retrieval-Augmented Generation (RAG) and agentic AI systems are increasingly prevalent in enterprise AI deployments. However, real enterprise environments introduce challenges largely absent from academic treatments and consumer-facing APIs: multiple tenants with heterogeneous data, strict access-control requirements, regulatory compliance, and cost pressures that demand shared infrastructure. A fundamental problem underlies existing RAG architectures in these settings: retrieval systems rank documents by relevance--whether through semantic similarity, keyword matching, or hybrid approaches--not by authorization, so a query from one tenant can surface another tenant's confidential data simply because it scores highest. We formalize this gap and analyze additional shortcomings--including tool-mediated disclosure, context accumulation across turns, and client-side orchestration bypass--that arise when agentic systems conflate relevance with authorization. To address these challenges, we introduce a layered isolation architecture combining policy-aware ingestion, retrieval-time gating, and shared inference, enforced through server-side agentic orchestration. This approach centralizes security-critical operations--tool execution authorization, state isolation, and policy enforcement--on the server, creating natural enforcement points for multitenant isolation while allowing client-side frameworks to retain control over agent composition and latency-sensitive operations. We validate the proposed architecture through an open-source implementation in OGX, a vendor-neutral framework that implements an OpenAI-compatible, open-source Responses API with server-side multi-turn orchestration. We evaluate it empirically and show that ABAC gating eliminates cross-tenant leakage while introducing negligible overhead.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper identifies a fundamental mismatch in enterprise RAG and agentic systems where relevance-based retrieval can leak cross-tenant data, along with related issues such as tool-mediated disclosure and context accumulation. It proposes a layered isolation architecture with policy-aware ingestion, retrieval-time ABAC gating, and server-side orchestration to enforce multitenant security while preserving client-side flexibility. The architecture is realized in the open-source OGX framework implementing an OpenAI-compatible Responses API with server-side multi-turn orchestration, and the authors claim empirical validation that ABAC gating eliminates leakage with negligible overhead.
Significance. If the central claims hold, the work offers a practical, vendor-neutral approach to a real deployment gap in secure enterprise AI that academic RAG literature largely overlooks. The open-source OGX implementation and emphasis on server-side enforcement points constitute reproducible artifacts that could aid adoption and further research in multitenant agent security.
major comments (2)
- [Architecture description and evaluation] The claim that ABAC gating eliminates cross-tenant leakage (Abstract and architecture description) is load-bearing on the prerequisite that policy-aware ingestion correctly and completely tags every incoming document with accurate access policies. The manuscript provides no mechanism, detection procedure, or correction process for untagged or mis-tagged data from heterogeneous sources; if ingestion fails for any subset, relevance-based retrieval can still surface unauthorized content before gating is applied.
- [Evaluation section] The empirical validation (Abstract and evaluation section) asserts that ABAC gating introduces negligible overhead and eliminates leakage, yet the provided text supplies no concrete experimental setup, metrics, datasets, baseline comparisons, or quantitative results. Without these details the overhead and leakage-elimination claims cannot be assessed.
minor comments (1)
- [Abstract] The abstract states that the architecture is 'validated through an open-source implementation' but does not briefly summarize the key evaluation metrics or threat model; adding one sentence would improve readability.
Simulated Author's Rebuttal
Thank you for reviewing our manuscript. We appreciate the points raised regarding the architecture's dependencies and the evaluation details. We respond to each major comment below.
- Referee: The claim that ABAC gating eliminates cross-tenant leakage (Abstract and architecture description) is load-bearing on the prerequisite that policy-aware ingestion correctly and completely tags every incoming document with accurate access policies. The manuscript provides no mechanism, detection procedure, or correction process for untagged or mis-tagged data from heterogeneous sources; if ingestion fails for any subset, relevance-based retrieval can still surface unauthorized content before gating is applied.
  Authors: We agree that the effectiveness of ABAC gating presupposes accurate policy tagging during ingestion. The manuscript describes policy-aware ingestion as a core layer but does not detail error-handling for mis-tagging. This is a valid observation. In the revised manuscript, we will expand the architecture description to include a mechanism for detecting and correcting mis-tagged data, such as periodic policy audits and flagging of documents with inconsistent or missing access policies for administrator review. This addition will clarify that while the core retrieval gating prevents leakage assuming correct tags, we recognize the need for robust ingestion safeguards.
  Revision: yes
- Referee: The empirical validation (Abstract and evaluation section) asserts that ABAC gating introduces negligible overhead and eliminates leakage, yet the provided text supplies no concrete experimental setup, metrics, datasets, baseline comparisons, or quantitative results. Without these details the overhead and leakage-elimination claims cannot be assessed.
  Authors: The referee correctly notes that the evaluation section in the current submission lacks the necessary specifics to fully substantiate the claims. We will revise the evaluation section to provide a complete description of the experimental setup, including the use of synthetic multitenant datasets with predefined access policies, metrics such as unauthorized retrieval rate (set to zero with gating) and latency overhead (measured as additional milliseconds per query), baseline comparisons against ungated vector retrieval, and quantitative results demonstrating negligible overhead (under 3% increase in query time) and complete elimination of cross-tenant leakage.
  Revision: yes
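The latency-overhead metric named in the rebuttal could be computed along these lines. The sample latencies below are invented for illustration and are not results from the paper; the nearest-rank percentile and the `overhead_pct` helper are assumptions about how such a comparison might be set up.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def overhead_pct(baseline_ms, gated_ms):
    """Relative increase of the gated latency over the ungated baseline."""
    return (gated_ms - baseline_ms) / baseline_ms * 100.0


# Illustrative per-query latencies in milliseconds (not measured data).
ungated_ms = [100, 110, 120, 130, 1000]
gated_ms = [102, 111, 123, 131, 1005]

for q in (50, 99):
    base = percentile(ungated_ms, q)
    gated = percentile(gated_ms, q)
    print(f"p{q}: ungated={base}ms gated={gated}ms overhead={overhead_pct(base, gated):.2f}%")
```

Reporting tail percentiles alongside the median matters here because an authorization check that is cheap on average can still add latency on the slowest queries.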
Circularity Check
No significant circularity in architectural proposal
full rationale
The paper presents a layered isolation architecture for multitenant enterprise RAG and agentic systems, relying on policy-aware ingestion, retrieval-time ABAC gating, and server-side orchestration. No mathematical derivations, equations, fitted parameters, or prediction steps exist that could reduce to inputs by construction. Validation occurs via open-source implementation (OGX) and empirical evaluation of overhead, which is independent of any self-referential loop. The central claim does not invoke uniqueness theorems, self-citations as load-bearing premises, or ansatzes smuggled from prior work. The noted dependency on complete policy tagging during ingestion is a correctness assumption, not a circular reduction.
Axiom & Free-Parameter Ledger
axioms (2)
- [domain assumption] Server-side orchestration points cannot be bypassed by client frameworks
- [domain assumption] All data can be accurately and exhaustively tagged with access policies during ingestion
Detailed Evaluation Tables (paper Appendix A)
Source: ACM CAIS ’26, May 26–29, 2026, San Jose, CA, USA. Authors: Francisco Javier Arceo and Varsha Prasad Narsing.
Config  Orch. / Retr.     p50      p99       Mean
A       Client / Ungated  3,600ms  10,818ms  4,208ms
B       Client / Gated    3,4...