arxiv: 2604.15233 · v1 · submitted 2026-04-16 · 💻 cs.AI · cs.DB


Blue Data Intelligence Layer: Streaming Data and Agents for Multi-source Multi-modal Data-Centric Applications

Moin Aminnaseri, Farima Fatahi Bayat, Nikita Bhutani, Jean-Flavien Bussotti, Kevin Chan, Rafael Li Chen, Yanlin Feng, Jackson Hassell, Estevam Hruschka, Eser Kandogan, Hannah Kim, James Levine, Seiji Maekawa, Jalal Mahmud, Kushan Mitra, Naoki Otani, Pouya Pezeshkpour, Nima Shahbazi, Chen Shen, Dan Zhang


Pith reviewed 2026-05-10 10:47 UTC · model grok-4.3


keywords: data intelligence layer · multi-source data · multi-modal queries · NL2SQL · agentic processing · data planners · compound AI · query decomposition


The pith

DIL unifies enterprise data, LLM knowledge, and user context to answer natural language queries that span multiple sources and modalities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the Data Intelligence Layer as a way to handle real-world data queries that go beyond what single databases and NL2SQL can do. Users often need information from enterprise records, external knowledge, and personal interactions, and queries can be iterative or involve different data types. DIL addresses this by using agents to plan and execute retrieval and reasoning across these sources. A sympathetic reader would care because it could make data access more natural and complete for complex enterprise needs.

Core claim

At the core of DIL is a data registry that stores metadata for diverse data sources and modalities. DIL treats LLMs, the Web, and the User as source 'databases' each with their own query interface. Data planners transform user queries into executable query plans that unify relational operators with other operators spanning multiple modalities, supporting decomposition, retrieval, reasoning, and integration.
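
A minimal sketch of the registry idea, assuming a design the paper does not specify: every source, whether a database, an LLM, the Web, or the User, is registered with metadata and a callable query interface. The names `SourceEntry`, `DataRegistry`, and `make_stub`, and the `modality` labels, are hypothetical and are not taken from Blue or DIL.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Rows = List[dict]


@dataclass
class SourceEntry:
    """Registry metadata for one source. All field names are hypothetical."""
    name: str
    modality: str                   # e.g. "relational", "text", "dialogue"
    query: Callable[[str], Rows]    # the source's own query interface


@dataclass
class DataRegistry:
    """Maps source names to their metadata and query interface."""
    sources: Dict[str, SourceEntry] = field(default_factory=dict)

    def register(self, entry: SourceEntry) -> None:
        self.sources[entry.name] = entry

    def by_modality(self, modality: str) -> List[str]:
        return sorted(n for n, e in self.sources.items() if e.modality == modality)

    def query(self, name: str, request: str) -> Rows:
        return self.sources[name].query(request)


def make_stub(name: str) -> Callable[[str], Rows]:
    """Stand-in back end: echoes the request instead of calling a real system."""
    def run(request: str) -> Rows:
        return [{"source": name, "request": request}]
    return run


registry = DataRegistry()
registry.register(SourceEntry("enterprise_db", "relational", make_stub("enterprise_db")))
registry.register(SourceEntry("llm", "text", make_stub("llm")))
registry.register(SourceEntry("web", "text", make_stub("web")))
registry.register(SourceEntry("user", "dialogue", make_stub("user")))

print(registry.by_modality("text"))
print(registry.query("user", "Which region do you care about?"))
```

The point of the sketch is only that a planner can look up sources by metadata and call them through one uniform signature; a real registry would presumably also carry schemas, access details, and cost information.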

What carries the argument

Data planners that convert natural language queries into declarative plans for multi-source, multi-modal execution.

Load-bearing premise

Data planners can reliably decompose complex requests, retrieve from heterogeneous sources, and integrate results across modalities without substantial errors or additional human intervention.

What would settle it

A test case involving an iterative query that requires combining enterprise database results, LLM commonsense knowledge, and user-specific context where the system fails to produce accurate integrated output.

Figures

Figures reproduced from arXiv: 2604.15233 by Chen Shen, Dan Zhang, Eser Kandogan, Estevam Hruschka, Farima Fatahi Bayat, Hannah Kim, Jackson Hassell, Jalal Mahmud, James Levine, Jean-Flavien Bussotti, Kevin Chan, Kushan Mitra, Moin Aminnaseri, Naoki Otani, Nikita Bhutani, Nima Shahbazi, Pouya Pezeshkpour, Rafael Li Chen, Seiji Maekawa, Yanlin Feng.

**Figure 2.** An example query over multiple data sources

**Figure 3.** Intuitiveness of data abstractions

A.2.4 Developer Experience. The development experience was generally rated as moderate, with a mean score of approximately 3.1 on a 1–5 scale (1: "very poor", 5: "excellent"). Debugging was the most challenging aspect, with an average score of 2.3, reflecting frequent issues such as poor error messages, complex setups, and difficulties tracing errors across multi-agent …

Original abstract

NL2SQL systems aim to address the growing need for natural language interaction with data. However, real-world information rarely maps to a single SQL query because (1) users express queries iteratively, (2) questions often span multiple data sources beyond the closed-world assumption of a single database, and (3) queries frequently rely on commonsense or external knowledge. Consequently, satisfying realistic data needs requires integrating heterogeneous sources, modalities, and contextual data. In this paper, we present Blue's Data Intelligence Layer (DIL), designed to support multi-source, multi-modal, and data-centric applications. Blue is a compound AI system that orchestrates agents and data for enterprise settings. DIL serves as the data intelligence layer for agentic data processing, bridging the semantic gap between user intent and available information by unifying structured enterprise data, world knowledge accessible through LLMs, and personal context obtained through interaction. At the core of DIL is a data registry that stores metadata for diverse data sources and modalities to enable both native and natural language queries. DIL treats LLMs, the Web, and the User as source 'databases', each with their own query interface, elevating them to first-class data sources. DIL relies on data planners to transform user queries into executable query plans. These plans are declarative abstractions that unify relational operators with other operators spanning multiple modalities. DIL planners support decomposition of complex requests into subqueries, retrieval from diverse sources, and finally reasoning and integration to produce final results. We demonstrate DIL through two interactive scenarios in which user queries dynamically trigger multi-source retrieval, cross-modal reasoning, and result synthesis, illustrating how compound AI systems can move beyond single-database NL2SQL.
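
The abstract's notion of a declarative plan that mixes relational operators with other sources can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Step` structure, the `execute` loop, the stubbed `sql_customers` and `llm_capitals` subqueries, and the example question ("which customers are based in a capital city?") are hypothetical and do not describe the paper's planner or operator set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Rows = List[dict]


@dataclass(frozen=True)
class Step:
    """One node of a plan: an operator plus the names of the steps it reads."""
    name: str
    op: Callable[..., Rows]
    inputs: Tuple[str, ...] = ()


def execute(plan: List[Step]) -> Dict[str, Rows]:
    """Run steps in order; the plan is assumed to list dependencies first."""
    results: Dict[str, Rows] = {}
    for step in plan:
        args = [results[name] for name in step.inputs]
        results[step.name] = step.op(*args)
    return results


def sql_customers() -> Rows:
    # Stub for a relational subquery against an enterprise table.
    return [{"customer": "Acme", "city": "Paris"},
            {"customer": "Globex", "city": "Lyon"}]


def llm_capitals() -> Rows:
    # Stub for a world-knowledge subquery answered by an LLM.
    return [{"city": "Paris"}, {"city": "Berlin"}]


def semi_join(key: str) -> Callable[[Rows, Rows], Rows]:
    """Relational operator: keep left rows whose key appears on the right."""
    def op(left: Rows, right: Rows) -> Rows:
        keys = {row[key] for row in right}
        return [row for row in left if row[key] in keys]
    return op


# Decomposition into two subqueries, then integration with a relational operator.
plan = [
    Step("customers", sql_customers),
    Step("capitals", llm_capitals),
    Step("answer", semi_join("city"), inputs=("customers", "capitals")),
]

results = execute(plan)
print(results["answer"])
```

In this toy version the plan is written by hand; the hard part the paper attributes to data planners, producing such a plan from a natural language request, is exactly what the sketch leaves out.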

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DIL gives a clean architecture for treating LLMs and the web as queryable sources alongside enterprise data, but the planners' reliability is asserted rather than shown.


The paper describes Blue's Data Intelligence Layer (DIL) for handling natural language queries over multiple data sources, including structured enterprise data, LLMs for world knowledge, and user context. It treats these as first-class 'databases' with a registry and planners that generate declarative plans mixing relational and other operators. The framing is new in how it unifies these sources for decomposition, retrieval, and integration in agentic workflows. The two scenarios show practical flows for complex, iterative questions that standard NL2SQL cannot handle.

The work is clear on the motivation and the high-level design. It points to a real barrier in current systems and offers a structured way to move past single-database assumptions. The limitation is that the planners' ability to reliably break down requests and synthesize results across modalities is not demonstrated. The paper gives no metrics, error analysis, or implementation specifics, so effectiveness remains an open question.

This paper is for researchers and practitioners focused on compound AI systems and on extending data query interfaces to multi-modal, multi-source settings. Readers working on agent architectures or enterprise data tools could find the registry and planner concepts worth considering. The problem statement and architecture are solid enough to merit peer review, even though the paper is mostly descriptive at this stage. I would recommend sending it for review so the authors can add the evaluation needed to back up the claims.

Referee Report

2 major / 2 minor

Summary. The paper presents Blue's Data Intelligence Layer (DIL) as a component of a compound AI system for enterprise data-centric applications. It argues that NL2SQL systems fall short for iterative, multi-source queries requiring external knowledge, and proposes DIL to bridge the semantic gap by unifying structured enterprise data with LLM-accessible world knowledge and user context. Key elements include a data registry storing metadata for heterogeneous sources and modalities, treating LLMs, the Web, and users as first-class queryable sources, and data planners that convert natural language queries into declarative plans unifying relational and multi-modal operators. These planners enable decomposition into subqueries, retrieval, reasoning, and result integration. The approach is illustrated with two interactive scenarios demonstrating dynamic multi-source retrieval and cross-modal synthesis.

Significance. If the data planners prove reliable, DIL could advance compound AI systems by providing an architectural framework for agentic data processing that integrates diverse sources beyond closed-world databases. The elevation of LLMs, web, and user interactions to queryable entities offers a conceptual contribution to multi-modal data access, potentially enabling more flexible natural language interactions in enterprise settings. The manuscript supplies a coherent high-level vision and illustrative examples, which clarify the intended unification of operators, though empirical validation would be needed to realize this significance.

major comments (2)

[Abstract and data planners section] The central claim that DIL bridges the semantic gap by unifying sources and enabling reliable decomposition/retrieval/integration rests on the data planners' capabilities. However, the manuscript gives only a high-level overview of transforming queries into declarative plans without specifying the planning algorithm, how relational and multi-modal operators are unified, or mechanisms for error handling in cross-source reasoning. This directly impacts the weakest assumption (planner reliability across heterogeneous sources) and leaves the claim unevaluable.
[Interactive scenarios section] The two scenarios are offered as demonstrations of multi-source retrieval, cross-modal reasoning, and result synthesis. Yet they contain no quantitative metrics (e.g., success rates, accuracy, latency), error analysis, ablation studies, or baselines, making it impossible to assess whether the planners support realistic queries without substantial human intervention. This is load-bearing for the paper's positioning of DIL as a practical solution beyond NL2SQL.
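
To make the second comment concrete, the following is a rough sketch of the smallest evaluation that would report the metrics the referee names. It is not drawn from the paper: `evaluate`, `stub_planner`, and the two test cases are hypothetical placeholders, and exact-match scoring is assumed only for simplicity.

```python
import time
from statistics import mean
from typing import Callable, List, Tuple


def evaluate(planner: Callable[[str], str], cases: List[Tuple[str, str]]) -> dict:
    """Score a planner on (query, expected answer) pairs by exact match."""
    correct = 0
    latencies = []
    for query, expected in cases:
        start = time.perf_counter()
        answer = planner(query)
        latencies.append(time.perf_counter() - start)
        correct += int(answer == expected)
    return {
        "n": len(cases),
        "success_rate": correct / len(cases),
        "mean_latency_s": mean(latencies),
    }


def stub_planner(query: str) -> str:
    # Placeholder for an end-to-end system; answers one case right, one wrong.
    return {"q1": "a1", "q2": "wrong"}.get(query, "")


report = evaluate(stub_planner, [("q1", "a1"), ("q2", "a2")])
print(report["n"], report["success_rate"])
```

A real study would replace exact match with a task-appropriate correctness measure and add baselines such as a single-database NL2SQL system, as the referee suggests.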

minor comments (2)

The title references streaming data and agents, but the abstract and core description emphasize query planning and scenarios without detailing streaming mechanisms or agent orchestration; clarify this aspect for consistency.
The manuscript would benefit from additional references to prior work on multi-modal agents, data integration frameworks, and NL2SQL extensions to better contextualize the novelty of treating LLMs/Web/User as first-class sources.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps clarify the intended scope of our work as a high-level architectural vision for the Data Intelligence Layer (DIL) within compound AI systems. We address each major comment below, focusing on the conceptual contributions while acknowledging areas where additional clarification can strengthen the manuscript.


Referee: [Abstract and data planners section] The central claim that DIL bridges the semantic gap by unifying sources and enabling reliable decomposition/retrieval/integration rests on the data planners' capabilities. However, the manuscript gives only a high-level overview of transforming queries into declarative plans without specifying the planning algorithm, how relational and multi-modal operators are unified, or mechanisms for error handling in cross-source reasoning. This directly impacts the weakest assumption (planner reliability across heterogeneous sources) and leaves the claim unevaluable.

Authors: We agree that the manuscript presents the data planners at a conceptual level without a specific algorithm, detailed unification mechanics, or error-handling protocols. This reflects the paper's focus on an architectural framework that elevates LLMs, the Web, and users as first-class sources via the data registry, with declarative plans serving as abstractions to unify relational and multi-modal operators. The planners enable decomposition, retrieval, and integration as described, but concrete planning algorithms and reliability mechanisms remain implementation details for future work. We will revise the data planners section to include expanded examples of declarative plan structures, operator unification through the registry metadata, and a discussion of high-level error-handling strategies (e.g., via agentic reasoning loops), making the conceptual claims more concrete without claiming empirical reliability.

Revision: partial

Referee: [Interactive scenarios section] The two scenarios are offered as demonstrations of multi-source retrieval, cross-modal reasoning, and result synthesis. Yet they contain no quantitative metrics (e.g., success rates, accuracy, latency), error analysis, ablation studies, or baselines, making it impossible to assess whether the planners support realistic queries without substantial human intervention. This is load-bearing for the paper's positioning of DIL as a practical solution beyond NL2SQL.

Authors: The scenarios function as illustrative demonstrations of DIL's dynamic capabilities in multi-source, multi-modal settings, consistent with the paper's positioning as a vision for agentic data processing rather than a performance evaluation. No quantitative metrics, error analyses, or baselines are included because the work does not claim or evaluate a deployed planner implementation. We acknowledge this limits direct assessment of practicality and human intervention needs. In revision, we will add an explicit limitations subsection and a forward-looking discussion of planned empirical studies (including metrics and baselines) to better contextualize the examples and address the concern about positioning DIL as a practical advance over NL2SQL.

Revision: partial

Circularity Check

0 steps flagged

No circularity: purely descriptive system architecture with no derivations or self-referential claims


The paper provides an architectural description of the Data Intelligence Layer (DIL), its data registry, treatment of LLMs/Web/User as first-class sources, and data planners for query decomposition and multi-modal integration. It contains no equations, fitted parameters, predictions, uniqueness theorems, or derivation chains of any kind. Claims about bridging semantic gaps and supporting agentic processing are presented as design goals illustrated by two scenarios, without any reduction to inputs by construction, self-citation load-bearing premises, or renamed empirical patterns. The system is self-contained as a high-level overview and does not invoke external results that collapse back into its own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 3 invented entities

The proposal introduces several new system entities and relies on domain assumptions about LLM capabilities without independent evidence or formal verification.

axioms (2)

domain assumption: LLMs can serve as reliable sources of world knowledge and commonsense for query answering
The system elevates LLMs to first-class data sources for external knowledge.
domain assumption: User interaction history provides usable personal context for query personalization
Personal context is obtained through interaction and treated as a data source.

invented entities (3)

Data Intelligence Layer (DIL) · no independent evidence
purpose: Orchestrates agents and data sources for multi-source multi-modal queries
Core new system introduced to bridge semantic gaps.
Data registry · no independent evidence
purpose: Stores metadata for diverse data sources and modalities to enable native and natural language queries
Central component enabling unified access.
Data planners · no independent evidence
purpose: Transform user queries into executable plans that unify relational and multi-modal operators
Key mechanism for decomposition, retrieval, and integration.

pith-pipeline@v0.9.0 · 5697 in / 1500 out tokens · 67594 ms · 2026-05-10T10:47:11.678195+00:00 · methodology

