An Alternate Agentic AI Architecture (It's About the Data)
Pith reviewed 2026-05-08 13:16 UTC · model grok-4.3
The pith
Enterprise AI should solve data integration with explicit query algebras and source wrappers rather than relying on LLM agents.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Enterprises do not suffer from a reasoning deficit but from a data integration problem. By reintroducing explicit query structure, wrapper-based mediation, and cost-based optimization, the architecture obtains the breadth of agentic search while preserving traceability, determinism, and trust in enterprise environments.
What carries the argument
AQL (Agentic Query Language): a small, explicit query algebra built from Find, From, and Where operators. It decomposes questions into structured plans that execute through source-specific wrappers, which enforce access control, schema alignment, and result normalization.
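The paper does not publish a concrete AQL syntax, so the following is a hypothetical sketch of how the three operators might be encoded as plain data; every name and field here is an illustrative assumption, not the authors' design.

```python
from dataclasses import dataclass, field

# Hypothetical encoding of the three AQL operators (Find, From, Where).
# The paper gives no concrete syntax; these shapes are illustrative only.
@dataclass
class From:
    source: str   # logical source name, resolved by a wrapper
    entity: str   # schema entity exposed by that wrapper

@dataclass
class Where:
    predicate: str  # filter, pushed down to the wrapper when possible

@dataclass
class Find:
    fields: list[str]                                 # projected attributes
    frm: list[From] = field(default_factory=list)
    where: list[Where] = field(default_factory=list)

    def describe(self) -> str:
        """Render the plan so every step is visible and inspectable."""
        srcs = ", ".join(f"{f.source}.{f.entity}" for f in self.frm)
        preds = " AND ".join(w.predicate for w in self.where)
        return f"FIND {', '.join(self.fields)} FROM {srcs} WHERE {preds}"

plan = Find(
    fields=["customer_id", "invoice_total"],
    frm=[From("crm", "customers"), From("erp", "invoices")],
    where=[Where("customers.region = 'EMEA'")],
)
print(plan.describe())
```

Because the plan is an explicit value rather than a hidden chain of LLM calls, it can be logged, diffed, and audited before execution, which is the property the architecture emphasizes.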
If this is right
- Complex questions decompose into structured, auditable query plans instead of hidden chains of LLM calls.
- All intermediate results stay visible and inspectable at every step.
- Cost-based optimization techniques from databases can select efficient execution paths across sources.
- Access controls and governance policies are enforced uniformly by the wrappers.
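The last point, uniform enforcement at the wrapper layer, can be sketched minimally. The paper names the wrapper's responsibilities (access control, schema alignment, result normalization) but not an API, so the interface, role names, and field mappings below are assumptions for illustration.

```python
# Illustrative source wrapper: checks a governance policy before any data
# flows, then renames source-specific fields to canonical names.
class SourceWrapper:
    def __init__(self, name, allowed_roles, schema_map):
        self.name = name
        self.allowed_roles = allowed_roles  # governance policy (assumed)
        self.schema_map = schema_map        # source field -> canonical field

    def execute(self, rows, role):
        # Access control is enforced here, uniformly, for every query.
        if role not in self.allowed_roles:
            raise PermissionError(f"role {role!r} may not query {self.name}")
        # Schema alignment: normalize each row to the canonical schema.
        return [{self.schema_map.get(k, k): v for k, v in row.items()}
                for row in rows]

crm = SourceWrapper("crm", {"analyst"}, {"cust_id": "customer_id"})
out = crm.execute([{"cust_id": 7, "region": "EMEA"}], role="analyst")
print(out)  # [{'customer_id': 7, 'region': 'EMEA'}]
```

Placing the check inside the wrapper, rather than in an agent's prompt, is what makes the policy uniform: no query plan can reach a source except through this gate.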
Where Pith is reading between the lines
- This design could let enterprises reuse existing database query optimizers and monitoring tools for AI workloads.
- It suggests that many reliability issues in current agentic systems trace to data handling rather than model scale.
- Direct performance comparisons on standardized enterprise query sets would test whether the algebra loses expressiveness relative to unrestricted tool calling.
Load-bearing premise
A small explicit query algebra executed through source-specific wrappers can decompose and resolve complex enterprise questions at the scale and flexibility of current LLM agent systems, without losing capability or introducing new integration overhead.
What would settle it
A benchmark test in which RUBICON cannot express or resolve a multi-source enterprise question that requires dynamic source selection beyond its fixed algebra, while an LLM-based agent succeeds.
Original abstract
For the last several years, the dominant narrative in "agentic AI" has been that large language models should orchestrate information access by dynamically selecting tools, issuing sub-queries, and synthesizing results. We argue this approach is misguided: enterprises do not suffer from a reasoning deficit, but from a data integration problem. Enterprises are data-centric: critical information is scattered across heterogeneous systems (e.g., databases, documents, and external services), each with its own query language, schema, access controls, and performance constraints. In contrast, contemporary LLM-based architectures are optimized for reasoning over unstructured text and treat enterprise systems as either corpora or external tools invoked by a black-box component. This creates a mismatch between schema-rich, governed, performance-critical data systems and text-centric, probabilistic LLM architectures, leading to limited transparency, weak correctness guarantees, and unpredictable performance. In this paper, we present RUBICON, an alternative architecture grounded in data management principles. Instead of delegating orchestration to an opaque agent, we introduce AQL (Agentic Query Language), a small, explicit query algebra - Find, From, and Where - executed through source-specific wrappers that enforce access control, schema alignment, and result normalization. All intermediate results are visible and inspectable. Complex questions are decomposed into structured, auditable query plans rather than hidden chains of LLM calls. Our thesis is simple: enterprise AI is not a prompt engineering problem; it is a systems problem. By reintroducing explicit query structure, wrapper-based mediation, and cost-based optimization, we obtain the breadth of agentic search while preserving traceability, determinism, and trust in enterprise environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that enterprise AI challenges arise from data integration problems across heterogeneous systems rather than LLM reasoning deficits. It proposes the RUBICON architecture, which uses AQL—a minimal explicit query algebra consisting of Find, From, and Where operators—executed via source-specific wrappers to enforce access controls, schema alignment, and result normalization, thereby achieving agentic breadth with improved transparency, determinism, and auditability.
Significance. If the proposed architecture can be shown to scale without capability loss, it would represent a meaningful contribution to database systems and enterprise AI by reapplying established principles of query mediation and optimization to agentic workflows. The emphasis on inspectable query plans over opaque LLM chains aligns with core DB strengths in traceability and cost-based planning, but the paper's conceptual nature, with no empirical grounding or worked examples, limits its current significance to that of a position statement.
major comments (2)
- [Abstract] The central thesis that AQL plus wrappers can decompose complex enterprise questions while preserving the breadth of agentic search is load-bearing, yet the manuscript provides no worked examples of such decomposition, no argument for the algebra's completeness with respect to typical tasks (e.g., multi-source joins, conditional aggregation, or dynamic access control), and no description of how query planning itself is performed without reintroducing hidden orchestration. This appears in the description of RUBICON and AQL.
- The claim of 'cost-based optimization' and 'determinism' is asserted without any formalization of the optimizer, cost model, or execution semantics for the wrapper layer, making it impossible to evaluate whether the architecture actually avoids the unpredictability it criticizes in LLM agents.
minor comments (2)
- [Abstract] The abstract and description introduce RUBICON and AQL without defining their scope or relation to existing mediator/wrapper architectures in the database literature (e.g., no citations to classic data integration work).
- Terminology such as 'agentic search' and 'breadth of agentic search' is used without precise definition, which reduces clarity when contrasting with LLM-based approaches.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. We agree that the manuscript is conceptual in nature and would benefit from additional examples and formal details to better support its claims. We address each major comment below and indicate the revisions we will make.
Point-by-point responses
-
Referee: [Abstract] The central thesis that AQL plus wrappers can decompose complex enterprise questions while preserving the breadth of agentic search is load-bearing, yet the manuscript provides no worked examples of such decomposition, no argument for the algebra's completeness with respect to typical tasks (e.g., multi-source joins, conditional aggregation, or dynamic access control), and no description of how query planning itself is performed without reintroducing hidden orchestration. This appears in the description of RUBICON and AQL.
Authors: We acknowledge the absence of concrete examples and formal arguments in the current draft. In the revised manuscript we will add a dedicated section with worked examples of query decomposition, including a multi-source compliance query that combines relational joins, document retrieval, and conditional filtering. We will also provide an argument for AQL completeness by showing how the three operators can be composed to express multi-source joins (via multiple From clauses with correlation), conditional aggregation (via Where predicates that invoke source-native aggregate functions), and dynamic access control (enforced inside each wrapper before result normalization). Regarding query planning, the architecture explicitly separates a deterministic planner—which uses source metadata and a cost model to produce an AQL expression—from any LLM involvement; the LLM is used only for initial natural-language to AQL translation when needed, after which all orchestration is explicit and inspectable. We will clarify this separation in the revision. revision: yes
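The rebuttal's claim that multi-source joins fall out of "multiple From clauses with correlation" can be made concrete with a minimal sketch. Nothing here comes from the paper: the nested-loop executor, the key name, and the sample rows are all invented for illustration of what such a correlated composition might look like.

```python
# Sketch of correlating two From clauses: a nested-loop join of two
# in-memory sources on a shared key. Executor and data are assumptions.
def correlated_join(left_rows, right_rows, key):
    for l in left_rows:
        for r in right_rows:
            if l[key] == r[key]:
                yield {**l, **r}

customers = [{"customer_id": 1, "region": "EMEA"},
             {"customer_id": 2, "region": "APAC"}]
invoices = [{"customer_id": 1, "total": 120.0}]

# Conditional filtering: a Where predicate applied to the joined rows.
rows = [row for row in correlated_join(customers, invoices, "customer_id")
        if row["region"] == "EMEA"]
print(rows)  # [{'customer_id': 1, 'region': 'EMEA', 'total': 120.0}]
```

In the proposed architecture the two row streams would arrive through wrappers (one relational, one perhaps a document store), with the join and filter remaining explicit, inspectable steps of the plan.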
-
Referee: The claim of 'cost-based optimization' and 'determinism' is asserted without any formalization of the optimizer, cost model, or execution semantics for the wrapper layer, making it impossible to evaluate whether the architecture actually avoids the unpredictability it criticizes in LLM agents.
Authors: This observation is correct for the present position paper. The manuscript currently states these properties at a high level without formal definitions. In the revision we will add a preliminary formalization of wrapper execution semantics (including result normalization and access-control enforcement) and sketch a basic cost model that accounts for source-specific latency, data volume, and transfer costs. A full optimizer implementation and empirical demonstration of determinism, however, lie beyond the scope of this conceptual work and are noted as future research. revision: partial
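The "basic cost model" the authors promise (source-specific latency, data volume, and transfer costs) is easy to sketch; the weights, source statistics, and plan shapes below are invented, and a real optimizer would estimate row counts rather than receive them.

```python
# Minimal additive cost model of the kind the rebuttal sketches.
# All statistics and weights are illustrative assumptions.
def plan_cost(steps, stats):
    """Cost of a plan: per-step fixed latency + scan cost + transfer cost."""
    total = 0.0
    for source, rows in steps:
        s = stats[source]
        total += (s["latency_ms"]
                  + rows * s["scan_ms_per_row"]
                  + rows * s["transfer_ms_per_row"])
    return total

stats = {
    "crm": {"latency_ms": 5, "scan_ms_per_row": 0.01, "transfer_ms_per_row": 0.002},
    "erp": {"latency_ms": 40, "scan_ms_per_row": 0.05, "transfer_ms_per_row": 0.01},
}

# Two candidate orderings; filtering in the cheap source first wins.
plan_a = [("crm", 1000), ("erp", 100)]
plan_b = [("erp", 10000), ("crm", 1000)]
best = min([plan_a, plan_b], key=lambda p: plan_cost(p, stats))
print(best is plan_a)  # True
```

Because the model is a pure function of the plan and source metadata, planning is deterministic: the same question against the same statistics always yields the same execution path, which is the property the referee asks to see formalized.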
Circularity Check
No circularity; conceptual proposal without derivations or self-referential reductions
Full rationale
The paper presents RUBICON as an architectural alternative grounded in data management principles, introducing AQL (Find/From/Where) executed via wrappers to address integration issues in enterprise settings. No equations, fitted parameters, predictions, or derivation chains exist that could reduce to inputs by construction. The argument contrasts LLM agent practices with explicit query structure and mediation without invoking self-citations in a load-bearing manner or smuggling ansatzes. The central thesis is a design proposal relying on established database concepts (e.g., wrappers, cost-based optimization) rather than any self-referential loop, rendering the presentation self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Enterprises primarily face data integration challenges rather than deficits in AI reasoning capabilities.
invented entities (2)
- RUBICON: no independent evidence
- AQL: no independent evidence
Reference graph
Works this paper leans on
- [1] Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, et al. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. arXiv preprint arXiv:1809.08887, 2018.
- [2] Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, et al. Can LLM already serve as a database interface? A big bench for large-scale database grounded text-to-SQLs. Advances in Neural Information Processing Systems, 36, 2024.
- [3] Fabian Wenz, Omar Bouattour, Devin Yang, Justin Choi, Cecil Gregg, Nesime Tatbul, and Çagatay Demiralp. BenchPress: A human-in-the-loop annotation system for rapid text-to-SQL benchmark curation. In 16th Conference on Innovative Data Systems Research, CIDR 2026, Chaminade, CA, USA, January 18-21, 2026. www.cidrdb.org, 2026.
- [4] Peter Baile Chen, Fabian Wenz, Yi Zhang, Moe Kayali, Nesime Tatbul, Michael J. Cafarella, Çagatay Demiralp, and Michael Stonebraker. BEAVER: An enterprise benchmark for text-to-SQL. CoRR, abs/2409.02038, 2024.
- [5] Microsoft. Microsoft 365 Copilot: Reinventing productivity with AI. https://www.microsoft.com/en-us/microsoft-365/copilot, 2023. Accessed: 2026-02-25.
- [6] Glean Technologies. Glean: Work AI for the enterprise. https://www.glean.com, 2023. Accessed: 2026-02-25.
- [7] OpenCLAW Contributors. OpenCLAW: Open source agent framework. https://github.com/openclaw, 2024. Project documentation.
- [8] Maurizio Lenzerini. Data integration: A theoretical perspective. In Lucian Popa, Serge Abiteboul, and Phokion G. Kolaitis, editors, Proceedings of the Twenty-first ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 3-5, Madison, Wisconsin, USA, pages 233-246. ACM, 2002.
- [9] Magesh Jayapandian and H. V. Jagadish. Automated creation of a forms-based database query interface. Proc. VLDB Endow., 1(1):695-709, 2008.
- [10] Moshé M. Zloof. Query-by-example: The invocation and definition of tables and forms. In Douglas S. Kerr, editor, Proceedings of the International Conference on Very Large Data Bases, September 22-24, 1975, Framingham, Massachusetts, USA, pages 1-24. ACM, 1975.
- [11] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection, 2016.
- [12] Gerard Salton and Christopher Buckley. Term-weighting approaches in automatic text retrieval. Inf. Process. Manage., 24(5):513-523, August 1988.
- [13] Tomás Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In Yoshua Bengio and Yann LeCun, editors, 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings, 2013.
- [14] Google. Gmail. https://workspace.google.com/products/gmail/, 2024. Accessed: 2026-02-27.
- [15] OpenAI. GPT-5 mini. https://platform.openai.com/docs/models, 2025. Model documentation. Accessed: 2026-02-27.
- [16] Google DeepMind. Gemini 3 Flash (preview). https://ai.google.dev/, 2025. Preview model documentation. Accessed: 2026-02-27.
- [17] Anthropic. Claude Sonnet 4.6. https://www.anthropic.com/claude, 2025. Model documentation. Accessed: 2026-02-27.
- [18] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023.
- [19] Harrison Chase. LangChain. https://github.com/langchain-ai/langchain, 2022. Accessed: 2026-02-27.
- [20] LangChain. ReAct agent documentation. https://python.langchain.com/docs/modules/agents/agent_types/react, 2024. Accessed: 2026-02-27.
- [21] Aditya Challapally, Chris Pease, Ramesh Raskar, Pradyumna Chari, and MIT NANDA. The GenAI divide: State of AI in business 2025. Technical report, Project NANDA, Massachusetts Institute of Technology, July 2025.