pith. machine review for the scientific record.

arXiv: 2605.05287 · v1 · submitted 2026-05-06 · 💻 cs.CR · cs.AI · cs.IR · cs.SE

Recognition: unknown

Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use

Pith reviewed 2026-05-08 16:31 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.IR · cs.SE
keywords multitenant security · RAG access control · ABAC gating · enterprise agent isolation · policy-aware retrieval · server-side orchestration · cross-tenant leakage · tool authorization

The pith

ABAC gating in a layered server-side architecture prevents cross-tenant data leakage in enterprise RAG and agent systems while adding negligible overhead.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Enterprise deployments of retrieval-augmented generation and agentic AI must handle multiple tenants with private data, strict access rules, and shared infrastructure costs. Current systems rank retrieved documents by relevance rather than by authorization, which allows a query from one tenant to surface another tenant's confidential information. The paper formalizes this authorization gap along with related risks from tool calls, multi-turn context, and client-side bypasses. It introduces a server-centered design that tags data with policies on ingestion, applies attribute-based checks at retrieval and tool execution time, and centralizes orchestration to create enforceable isolation points. This keeps client frameworks in control of composition while moving security decisions to the shared backend, enabling compliant multitenant operation without dedicated instances per tenant.

Core claim

The paper claims that existing RAG and agent architectures conflate relevance ranking with authorization, creating leakage paths in multitenant settings. A layered isolation architecture that combines policy-aware ingestion, retrieval-time ABAC gating, and server-side agentic orchestration centralizes security-critical operations such as tool authorization and state management. This produces natural enforcement points that eliminate unauthorized cross-tenant access while preserving client-side flexibility and maintaining near-zero added latency, as shown through an open-source OpenAI-compatible implementation.
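
The separation between relevance and authorization can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the OGX implementation: the names `Principal`, `Chunk`, `is_authorized`, and `gated_retrieve` are hypothetical, and the policy model (tenant match plus a set of required attributes) is a simplified stand-in for whatever ABAC rules a real deployment would use. The point it shows is that the authorization filter runs before the top-k cut, so a high-scoring chunk from another tenant never reaches the model context.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Principal:
    """Hypothetical caller identity resolved on the server."""
    tenant_id: str
    attributes: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Chunk:
    """Hypothetical indexed chunk carrying policy tags written at ingestion."""
    chunk_id: str
    tenant_id: str
    required_attributes: frozenset  # attributes a caller must hold
    score: float                    # relevance score from the retriever


def is_authorized(principal: Principal, chunk: Chunk) -> bool:
    """ABAC check: same tenant, and the caller holds every required attribute."""
    return (
        chunk.tenant_id == principal.tenant_id
        and chunk.required_attributes <= principal.attributes
    )


def gated_retrieve(principal: Principal, candidates, k: int = 3):
    """Filter by authorization first, then rank the survivors by relevance."""
    allowed = [c for c in candidates if is_authorized(principal, c)]
    return sorted(allowed, key=lambda c: c.score, reverse=True)[:k]


# Toy candidates: the most relevant chunk belongs to another tenant.
candidates = [
    Chunk("b-salary", "tenant-b", frozenset(), 0.97),
    Chunk("a-handbook", "tenant-a", frozenset(), 0.81),
    Chunk("a-finance", "tenant-a", frozenset({"finance"}), 0.90),
]
alice = Principal("tenant-a", frozenset({"employee"}))
top = gated_retrieve(alice, candidates)
print([c.chunk_id for c in top])
```

In this toy run the tenant-b chunk is dropped despite having the highest score, and the finance chunk is dropped because the caller lacks the `finance` attribute. Filtering before ranking is a design choice in this sketch; a post-filter applied after the top-k cut would be equally safe but could return fewer than k results.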

What carries the argument

Layered isolation architecture that performs policy-aware ingestion, applies ABAC gating at retrieval and tool-use time, and enforces multitenant separation through centralized server-side orchestration.

If this is right

  • Multitenant enterprises can share retrieval and inference infrastructure without separate per-tenant deployments while meeting regulatory access requirements.
  • Agentic workflows remain isolated across conversation turns and tool invocations because gating occurs at each server-side step.
  • Client frameworks retain control over prompt construction and latency-sensitive decisions while security enforcement stays on the server.
  • Vendor-neutral, open implementations of the Responses API become viable for production use under strict compliance constraints.
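
The second bullet depends on an authorization check running on the server before every tool invocation, not only at the start of a conversation. A minimal sketch of that loop follows. The registry `TOOLS`, the grant table `TOOL_GRANTS`, and the function `run_turn` are hypothetical names chosen for illustration, and the in-memory dictionaries stand in for whatever policy store a real server would consult.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical tool registry: tool name -> callable taking one string argument.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search_tickets": lambda arg: f"tickets matching {arg!r}",
    "export_payroll": lambda arg: f"payroll export for {arg!r}",
}

# Hypothetical per-tenant grants, held on the server and never sent to clients.
TOOL_GRANTS: Dict[str, Set[str]] = {
    "tenant-a": {"search_tickets"},
    "tenant-b": {"search_tickets", "export_payroll"},
}


def run_turn(tenant_id: str, tool_calls: List[Tuple[str, str]]) -> List[str]:
    """Execute model-requested tool calls, authorizing each one individually."""
    outputs = []
    for name, arg in tool_calls:
        # Unknown tenants get an empty grant set, so the check fails closed.
        if name not in TOOL_GRANTS.get(tenant_id, set()):
            outputs.append(f"denied: {name}")
            continue
        outputs.append(TOOLS[name](arg))
    return outputs


result = run_turn(
    "tenant-a",
    [("search_tickets", "vpn"), ("export_payroll", "2026-04")],
)
print(result)
```

Because the check sits inside the loop, a tool call requested on a later step is evaluated against the same grants as the first one, which is the property the bullet describes.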

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same gating pattern could be applied to non-retrieval generation tasks by tagging prompt components and checking attributes before context assembly.
  • Organizations could reduce hardware footprint by consolidating tenants on shared clusters once policy tagging pipelines are mature.
  • Standardized policy metadata formats would be needed to guarantee complete tagging when data arrives from heterogeneous enterprise sources.
  • Adversarial query testing that attempts to probe for hidden data through carefully crafted prompts would provide additional validation of the gating strength.

Load-bearing premise

All incoming data can be accurately and completely tagged with correct access policies during ingestion, and clients cannot bypass the server-side orchestration points that enforce those policies.

What would settle it

A controlled test in which a query issued by one tenant returns or uses documents or tool outputs belonging to a second tenant whose access attributes differ, or a workload benchmark that measures overhead substantially above the reported negligible level.
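
One way to score such a controlled test is a leakage rate: the fraction of returned items owned by a tenant other than the caller. The sketch below assumes that definition; the Figure 4 caption abbreviates a metric as CTLR, but this page does not give its formula, so the function name and definition here are assumptions. The trial data are toy values for illustration, not results from the paper.

```python
def cross_tenant_leakage_rate(trials):
    """Fraction of returned items owned by a tenant other than the caller.

    `trials` is an iterable of (querying_tenant, [owner_tenant, ...]) pairs,
    one pair per query, listing the owner of every item that was returned.
    """
    returned = 0
    leaked = 0
    for querying_tenant, owners in trials:
        for owner in owners:
            returned += 1
            if owner != querying_tenant:
                leaked += 1
    # An empty test set has no leakage by convention in this sketch.
    return leaked / returned if returned else 0.0


# Toy trials: without gating each query returns one foreign item.
ungated = [
    ("tenant-a", ["tenant-a", "tenant-b"]),
    ("tenant-b", ["tenant-a", "tenant-b"]),
]
gated = [
    ("tenant-a", ["tenant-a"]),
    ("tenant-b", ["tenant-b"]),
]
print(cross_tenant_leakage_rate(ungated), cross_tenant_leakage_rate(gated))
```

A single non-zero value on the gated configuration would count against the paper's leakage claim under this definition.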

Figures

Figures reproduced from arXiv: 2605.05287 by Francisco Javier Arceo, Varsha Prasad Narsing.

Figure 1: Layered isolation architecture with server-side or...
Figure 2: Server-side orchestration flow: every step runs inside...
Figure 3: OGX architecture for multitenant enterprise agentic...
Figure 4: Empirical evaluation results across five dimensions. (a) Security: ABAC gating eliminates cross-tenant leakage (CTLR...
Figure 5: Responses API surface area and its relation to other APIs, providers, and tools. The Responses API serves as the...

Original abstract

Retrieval-Augmented Generation (RAG) and agentic AI systems are increasingly prevalent in enterprise AI deployments. However, real enterprise environments introduce challenges largely absent from academic treatments and consumer-facing APIs: multiple tenants with heterogeneous data, strict access-control requirements, regulatory compliance, and cost pressures that demand shared infrastructure. A fundamental problem underlies existing RAG architectures in these settings: retrieval systems rank documents by relevance--whether through semantic similarity, keyword matching, or hybrid approaches--not by authorization, so a query from one tenant can surface another tenant's confidential data simply because it scores highest. We formalize this gap and analyze additional shortcomings--including tool-mediated disclosure, context accumulation across turns, and client-side orchestration bypass--that arise when agentic systems conflate relevance with authorization. To address these challenges, we introduce a layered isolation architecture combining policy-aware ingestion, retrieval-time gating, and shared inference, enforced through server-side agentic orchestration. This approach centralizes security-critical operations--tool execution authorization, state isolation, and policy enforcement--on the server, creating natural enforcement points for multitenant isolation while allowing client-side frameworks to retain control over agent composition and latency-sensitive operations. We validate the proposed architecture through an open-source implementation in OGX, a vendor-neutral framework that implements an OpenAI-compatible, open-source Responses API with server-side multi-turn orchestration. We evaluate it empirically and show that ABAC gating eliminates cross-tenant leakage while introducing negligible overhead.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper identifies a fundamental mismatch in enterprise RAG and agentic systems where relevance-based retrieval can leak cross-tenant data, along with related issues such as tool-mediated disclosure and context accumulation. It proposes a layered isolation architecture with policy-aware ingestion, retrieval-time ABAC gating, and server-side orchestration to enforce multitenant security while preserving client-side flexibility. The architecture is realized in the open-source OGX framework implementing an OpenAI-compatible Responses API with server-side multi-turn orchestration, and the authors claim empirical validation that ABAC gating eliminates leakage with negligible overhead.

Significance. If the central claims hold, the work offers a practical, vendor-neutral approach to a real deployment gap in secure enterprise AI that academic RAG literature largely overlooks. The open-source OGX implementation and emphasis on server-side enforcement points constitute reproducible artifacts that could aid adoption and further research in multitenant agent security.

major comments (2)
  1. [Architecture description and evaluation] The claim that ABAC gating eliminates cross-tenant leakage (Abstract and architecture description) is load-bearing on the prerequisite that policy-aware ingestion correctly and completely tags every incoming document with accurate access policies. The manuscript provides no mechanism, detection procedure, or correction process for untagged or mis-tagged data from heterogeneous sources; if ingestion fails for any subset, relevance-based retrieval can still surface unauthorized content before gating is applied.
  2. [Evaluation section] The empirical validation (Abstract and evaluation section) asserts that ABAC gating introduces negligible overhead and eliminates leakage, yet the provided text supplies no concrete experimental setup, metrics, datasets, baseline comparisons, or quantitative results. Without these details the overhead and leakage-elimination claims cannot be assessed.
minor comments (1)
  1. [Abstract] The abstract states that the architecture is 'validated through an open-source implementation' but does not briefly summarize the key evaluation metrics or threat model; adding one sentence would improve readability.
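
Major comment 1 turns on what happens to documents that arrive without complete policy tags. One fail-closed option, sketched here as an assumption and not as anything the paper or OGX specifies, is to quarantine such documents before they are indexed so that retrieval can never see them. The field names in `REQUIRED_POLICY_FIELDS` and the function `triage_for_ingestion` are hypothetical.

```python
# Hypothetical policy schema: every document must carry these tags.
REQUIRED_POLICY_FIELDS = ("tenant_id", "classification")


def triage_for_ingestion(documents):
    """Split documents into indexable and quarantined sets, failing closed.

    A document is indexable only if every required policy field is present
    and non-empty. Everything else is returned with its missing fields so an
    administrator can review it.
    """
    indexable, quarantined = [], []
    for doc in documents:
        policy = doc.get("policy") or {}
        missing = [f for f in REQUIRED_POLICY_FIELDS if not policy.get(f)]
        if missing:
            quarantined.append((doc["id"], missing))
        else:
            indexable.append(doc)
    return indexable, quarantined


docs = [
    {"id": "d1", "policy": {"tenant_id": "tenant-a", "classification": "internal"}},
    {"id": "d2", "policy": {"tenant_id": "tenant-b"}},
    {"id": "d3"},
]
indexable, quarantined = triage_for_ingestion(docs)
print([d["id"] for d in indexable], quarantined)
```

This catches missing tags only. A document tagged with the wrong tenant would pass the check, which is the harder half of the referee's concern.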

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for reviewing our manuscript. We appreciate the points raised regarding the architecture's dependencies and the evaluation details. We respond to each major comment below.

  1. Referee: The claim that ABAC gating eliminates cross-tenant leakage (Abstract and architecture description) is load-bearing on the prerequisite that policy-aware ingestion correctly and completely tags every incoming document with accurate access policies. The manuscript provides no mechanism, detection procedure, or correction process for untagged or mis-tagged data from heterogeneous sources; if ingestion fails for any subset, relevance-based retrieval can still surface unauthorized content before gating is applied.

    Authors: We agree that the effectiveness of ABAC gating presupposes accurate policy tagging during ingestion. The manuscript describes policy-aware ingestion as a core layer but does not detail error-handling for mis-tagging. This is a valid observation. In the revised manuscript, we will expand the architecture description to include a mechanism for detecting and correcting mis-tagged data, such as periodic policy audits and flagging of documents with inconsistent or missing access policies for administrator review. This addition will clarify that while the core retrieval gating prevents leakage assuming correct tags, we recognize the need for robust ingestion safeguards. revision: yes

  2. Referee: The empirical validation (Abstract and evaluation section) asserts that ABAC gating introduces negligible overhead and eliminates leakage, yet the provided text supplies no concrete experimental setup, metrics, datasets, baseline comparisons, or quantitative results. Without these details the overhead and leakage-elimination claims cannot be assessed.

    Authors: The referee correctly notes that the evaluation section in the current submission lacks the necessary specifics to fully substantiate the claims. We will revise the evaluation section to provide a complete description of the experimental setup, including the use of synthetic multitenant datasets with predefined access policies, metrics such as unauthorized retrieval rate (set to zero with gating) and latency overhead (measured as additional milliseconds per query), baseline comparisons against ungated vector retrieval, and quantitative results demonstrating negligible overhead (under 3% increase in query time) and complete elimination of cross-tenant leakage. revision: yes
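
The overhead figure in the second response is a relative increase in query time, so it helps to pin down how such a number would be computed. The sketch below assumes median latency as the summary statistic; the function name `relative_overhead` is hypothetical, and the sample values are toy numbers chosen for easy arithmetic, not measurements from the paper.

```python
from statistics import median


def relative_overhead(baseline_ms, gated_ms):
    """Relative increase in median latency of the gated path over the baseline."""
    base = median(baseline_ms)
    return (median(gated_ms) - base) / base


# Toy latency samples in milliseconds (illustrative only).
baseline_ms = [100, 102, 98, 101, 99]
gated_ms = [102, 103, 101, 104, 100]
overhead = relative_overhead(baseline_ms, gated_ms)
print(f"{overhead:.1%}")
```

Medians resist the long-tail outliers that are common in LLM serving. A fuller comparison would also report tail percentiles such as p99, since gating cost may show up there before it moves the median.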

Circularity Check

0 steps flagged

No significant circularity in architectural proposal

The paper presents a layered isolation architecture for multitenant enterprise RAG and agentic systems, relying on policy-aware ingestion, retrieval-time ABAC gating, and server-side orchestration. No mathematical derivations, equations, fitted parameters, or prediction steps exist that could reduce to inputs by construction. Validation occurs via open-source implementation (OGX) and empirical evaluation of overhead, which is independent of any self-referential loop. The central claim does not invoke uniqueness theorems, self-citations as load-bearing premises, or ansatzes smuggled from prior work. The noted dependency on complete policy tagging during ingestion is a correctness assumption, not a circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The architecture depends on two domain assumptions that are not independently evidenced in the abstract: trusted server enforcement and complete policy tagging at ingestion time.

axioms (2)
  • Domain assumption: Server-side orchestration points cannot be bypassed by client frameworks.
    Central claim requires that security-critical operations remain under server control.
  • Domain assumption: All data can be accurately and exhaustively tagged with access policies during ingestion.
    Policy-aware ingestion is a prerequisite for correct gating.


Appendix A: Detailed Evaluation Tables (excerpt from the paper, ACM CAIS ’26, May 26–29, 2026, San Jose, CA, USA; truncated in the source)

Config   Orch. / Retr.      p50       p99        Mean
A        Client / Ungated   3,600ms   10,818ms   4,208ms
B        Client / Gated     3,4...