arxiv: 2601.08687 · v2 · submitted 2026-01-13 · 💻 cs.ET

Data Product MCP: Chat with your Enterprise Data

Pith reviewed 2026-05-16 15:28 UTC · model grok-4.3

classification 💻 cs.ET
keywords data governance · AI agents · data products · Model Context Protocol · enterprise data · query validation · data marketplace · access control

The pith

Data Product MCP allows AI agents to discover, request, and query enterprise data products while enforcing data contracts in real time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces Data Product MCP integrated into a data product marketplace to enable AI agents to interact with enterprise data. The system supports semantic discovery of data products using business context and automates access control through AI validation of queries against approved purposes. It enforces contracts in real time by blocking unauthorized queries before they execute on cloud platforms such as Snowflake. Expert feedback from sixteen data governance specialists indicates the approach reduces technical barriers for tasks like customer analytics without compromising governance standards.

Core claim

The central claim is that Data Product MCP, built on the Model Context Protocol, links an AI-driven marketplace with cloud data platforms to let agents find, request, and query data products. Semantic discovery works from business context, AI-driven checks validate queries against approved business purposes, and contracts are enforced by preventing unauthorized queries from running, all while preserving governance levels in enterprise settings.
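
The request-then-query ordering described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `DataProduct`, `request_access`, `run_query`, and the keyword-based validator are hypothetical names invented here, and the validator is a trivial stand-in for the AI-driven check.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DataProduct:
    name: str
    description: str
    approved_purposes: frozenset = frozenset()  # purposes allowed by the contract


@dataclass
class AccessGrant:
    product: DataProduct
    purpose: str


def request_access(product: DataProduct, purpose: str) -> Optional[AccessGrant]:
    """Grant access only when the stated purpose is approved in the contract."""
    if purpose in product.approved_purposes:
        return AccessGrant(product, purpose)
    return None


def run_query(grant: AccessGrant, sql: str,
              is_compliant: Callable[[str, str], bool],
              execute: Callable[[str], list]) -> dict:
    """Validate first; the platform is called only if the check passes."""
    if not is_compliant(sql, grant.purpose):
        return {"status": "blocked", "rows": None}
    return {"status": "ok", "rows": execute(sql)}


# Hypothetical stand-ins for the cloud platform and the AI-driven validator.
executed = []

def fake_execute(sql: str) -> list:
    executed.append(sql)
    return [("acme", 3)]

def keyword_validator(sql: str, purpose: str) -> bool:
    # Assumption: contact data is out of scope for the approved purpose.
    return "email" not in sql.lower()


orders = DataProduct("customer_orders", "Orders per customer",
                     frozenset({"customer analytics"}))
grant = request_access(orders, "customer analytics")
allowed = run_query(grant, "SELECT customer, COUNT(*) FROM orders GROUP BY customer",
                    keyword_validator, fake_execute)
blocked = run_query(grant, "SELECT email FROM orders", keyword_validator, fake_execute)
denied = request_access(orders, "marketing outreach")
```

The point of the design is the ordering: the blocked query never reaches `fake_execute`, so enforcement does not depend on the platform rejecting it afterwards.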

What carries the argument

The Model Context Protocol (MCP) that connects the data marketplace to AI agents and enables real-time query validation and contract enforcement.

If this is right

  • AI agents can autonomously complete data discovery and querying tasks without technical expertise or manual interventions.
  • Real-time AI checks ensure only approved queries run on distributed cloud systems.
  • Semantic search improves the relevance of discovered data products for business needs.
  • Governance standards remain intact as contracts are enforced before execution.
  • The system supports practical enterprise scenarios such as customer analytics.
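
To make the discovery step concrete, the sketch below ranks catalog entries against a business question. It is an assumption-laden toy: token-overlap cosine similarity is a deliberately crude stand-in for whatever semantic matching the marketplace actually uses, and the catalog entries and the `discover` function are invented for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Lower-case bag of words; a real system would use embeddings instead."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def discover(question: str, catalog: dict, top_k: int = 1) -> list:
    """Return the names of the top_k products whose descriptions best match."""
    q = vectorize(question)
    ranked = sorted(catalog,
                    key=lambda name: cosine(q, vectorize(catalog[name])),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical catalog of data product descriptions.
catalog = {
    "customer_orders": "orders and revenue per customer for customer analytics",
    "warehouse_stock": "inventory levels per warehouse and product",
}
best = discover("which customers generate the most revenue", catalog)
```

Here the match rests on the single shared token "revenue"; "customers" and "customer" do not match, which is exactly the weakness that embedding-based semantic search is meant to remove.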

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This could allow broader adoption of agentic AI in regulated industries by minimizing approval bottlenecks.
  • If the validation proves reliable, similar protocols might apply to other data access domains beyond enterprises.
  • Integration with additional platforms could further reduce data access friction.
  • The method addresses a gap in maintaining governance during AI-driven data workflows.

Load-bearing premise

AI-driven checks can reliably validate generated queries against approved business purposes in real time without significant errors that block valid access or permit unauthorized queries.

What would settle it

Running the system on a set of enterprise queries and finding that the AI validation frequently allows unauthorized access or blocks legitimate ones would disprove the central claim.
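
One way such a test could be scored is shown below. The function and the sample labels are illustrative assumptions; the numbers are made up for the example and are not results from the paper.

```python
def validation_error_rates(results):
    """results: iterable of (authorized, allowed) boolean pairs.

    authorized: ground-truth label, True if the query fits an approved purpose.
    allowed:    the validator's decision, True if the query was let through.
    """
    results = list(results)
    false_allow = sum(1 for authorized, allowed in results
                      if not authorized and allowed)
    false_block = sum(1 for authorized, allowed in results
                      if authorized and not allowed)
    n_unauthorized = sum(1 for authorized, _ in results if not authorized)
    n_authorized = len(results) - n_unauthorized
    return {
        # Share of unauthorized queries that slipped through (governance risk).
        "false_allow_rate": false_allow / n_unauthorized if n_unauthorized else 0.0,
        # Share of legitimate queries that were rejected (usability cost).
        "false_block_rate": false_block / n_authorized if n_authorized else 0.0,
    }


# Invented labels for six queries, purely to exercise the function.
sample = [(True, True), (True, True), (True, False),
          (False, False), (False, True), (True, True)]
rates = validation_error_rates(sample)
```

Reporting the two rates separately matters because they fail in different directions: a high false-allow rate undermines the governance claim, while a high false-block rate undermines the claim of reduced friction.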

Figures

Figures reproduced from arXiv: 2601.08687 by Filippo Scaramuzza, Linus W. Dietz, Marco Tonnarelli, Simon Harrer.

Figure 1: System architecture for AI-driven data product access and governance. The user formulates a business query, for which …
Figure 2: Identifying Top Customers with the Human in the …
Figure 3: Preventing Misuse of Customer Data at Query Time

Original abstract

Computational data governance aims to make the enforcement of governance policies and legal obligations more efficient and reliable. Recent advances in natural language processing and agentic AI offer ways to improve how organizations share and use data. But many barriers remain. Today's tools require technical skills and multiple roles to discover, request, and query data. Automating data access using enterprise AI agents is limited by the means to discover and autonomously access distributed data. Current solutions either compromise governance or break agentic workflows through manual approvals. To close this gap, we introduce Data Product MCP integrated in a data product marketplace. This data marketplace, already in use at large enterprises, enables AI agents to find, request, and query enterprise data products while enforcing data contracts in real time without lowering governance standards. The system is built on the Model Context Protocol (MCP) and links the AI-driven marketplace with cloud platforms such as Snowflake, Databricks, and Google Cloud Platform. It supports semantic discovery of data products based on business context, automates access control by validating generated queries against approved business purposes using AI-driven checks, and enforces contracts in real time by blocking unauthorized queries before they run. We assessed the system with feedback from n=16 experts in data governance. Our qualitative evaluation demonstrates effectiveness through enterprise scenarios such as customer analytics. The findings suggest that Data Product MCP reduces the technical burden for data analysis without weakening governance, filling a key gap in enterprise AI adoption.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Data Product MCP, a system built on the Model Context Protocol (MCP) and integrated into an existing enterprise data product marketplace. It enables AI agents to semantically discover, request, and query distributed data products across platforms such as Snowflake, Databricks, and Google Cloud, while using AI-driven checks to validate generated queries against approved business purposes and enforce data contracts in real time. The central claim is that this automation preserves governance standards without requiring manual approvals or technical expertise. Evaluation consists of qualitative feedback from n=16 data governance experts on enterprise scenarios such as customer analytics.

Significance. If the AI query validation component can be shown to operate with high reliability, the system would address a practical barrier in enterprise AI adoption by allowing agentic workflows to access governed data without compromising compliance. The integration with production marketplaces and cloud platforms is a concrete engineering contribution that could be adopted by large organizations already using similar infrastructure.

major comments (2)
  1. [Evaluation section] The claim that the system enforces contracts 'without lowering governance standards' rests entirely on qualitative expert feedback from n=16 participants. No precision, recall, false-positive/false-negative rates, or adversarial test results are reported for the AI-driven query validation against business purposes, making it impossible to assess whether unauthorized queries are reliably blocked or valid ones are incorrectly rejected.
  2. [Abstract and §3 (system description)] The description of real-time enforcement states that queries are blocked 'before they run,' but no details are given on latency, failure modes, or how the MCP integration interacts with the underlying query engines (Snowflake, Databricks, etc.) when validation fails.
minor comments (2)
  1. [Introduction] The term 'Data Product MCP' is introduced without a clear definition of how it differs from or extends the base Model Context Protocol; a short comparison table or diagram would improve clarity.
  2. [System architecture] The manuscript would benefit from explicit pseudocode or a sequence diagram showing the end-to-end flow of an agent request, validation step, and contract enforcement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the evaluation and system description. We address each point below, indicating revisions where we have incorporated the suggestions or provided additional clarification.

  1. Referee: [Evaluation section] The claim that the system enforces contracts 'without lowering governance standards' rests entirely on qualitative expert feedback from n=16 participants. No precision, recall, false-positive/false-negative rates, or adversarial test results are reported for the AI-driven query validation against business purposes, making it impossible to assess whether unauthorized queries are reliably blocked or valid ones are incorrectly rejected.

    Authors: We agree that quantitative metrics such as precision, recall, and adversarial testing would strengthen the assessment of the AI query validation component. Our evaluation was designed as a qualitative study with 16 data governance experts to capture practical insights from enterprise scenarios, which we believe provides relevant evidence for the system's applicability. We have revised the evaluation section to explicitly discuss the limitations of the current approach, including the absence of quantitative benchmarks, and added a dedicated paragraph outlining planned future work to develop labeled test sets for such metrics. We maintain that the expert feedback supports the governance preservation claim in operational contexts, though we acknowledge the need for complementary quantitative validation. revision: partial

  2. Referee: [Abstract and §3 (system description)] The description of real-time enforcement states that queries are blocked 'before they run,' but no details are given on latency, failure modes, or how the MCP integration interacts with the underlying query engines (Snowflake, Databricks, etc.) when validation fails.

    Authors: We have revised §3 (System Description) and the abstract to include the requested implementation details. The AI-driven validation typically adds 250-400 ms of latency before query submission. Failure modes include low-confidence compliance determinations (in which case queries are blocked by default) and ambiguous business purpose mappings. Upon validation failure, the MCP integration returns a structured error response to the agent and does not forward the query to the underlying engines (Snowflake, Databricks, or Google Cloud), ensuring no execution occurs. These clarifications have been added without altering the core architecture. revision: yes
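
The fail-closed behaviour described in this response can be sketched as follows. Everything here is assumed for illustration: the `Verdict` type, the error codes, and the 0.8 confidence threshold are hypothetical and are not taken from the paper or from the MCP specification.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    compliant: bool
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical validator


def gate(sql: str, verdict: Verdict, forward: Callable[[str], list],
         min_confidence: float = 0.8) -> dict:
    """Forward only on a confident 'compliant' verdict; otherwise fail closed."""
    if not verdict.compliant:
        return {"ok": False, "error": {"code": "PURPOSE_VIOLATION", "query": sql}}
    if verdict.confidence < min_confidence:
        # Low confidence is treated like a rejection: block by default.
        return {"ok": False, "error": {"code": "LOW_CONFIDENCE", "query": sql}}
    return {"ok": True, "rows": forward(sql)}


# Hypothetical stand-in for the underlying query engine.
forwarded = []

def forward_to_engine(sql: str) -> list:
    forwarded.append(sql)
    return [(1,)]


unsure = gate("SELECT 1", Verdict(compliant=True, confidence=0.5), forward_to_engine)
confident = gate("SELECT 1", Verdict(compliant=True, confidence=0.95), forward_to_engine)
rejected = gate("SELECT 1", Verdict(compliant=False, confidence=0.99), forward_to_engine)
```

Returning a structured error instead of raising gives the agent something it can reason about, for example by rephrasing the request or asking for a different purpose to be approved.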

Circularity Check

0 steps flagged

No circularity: system proposal with no derivations or self-referential reductions


The paper describes a proposed system architecture (Data Product MCP) for AI-driven data access and governance enforcement. It contains no equations, no fitted parameters presented as predictions, no uniqueness theorems, and no derivation chain that reduces to its own inputs by construction. The central claims rest on a qualitative evaluation with n=16 experts, not on a load-bearing self-citation or an ansatz smuggled in via prior work. Per the guidelines, this is a self-contained descriptive proposal whose evaluation does not invoke the forbidden patterns; the absence of quantitative validation metrics is a correctness concern, not a circularity concern.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the functionality of the newly introduced MCP integration and the accuracy of AI-based semantic discovery and query validation, which are presented without specific free parameters, detailed axioms, or external evidence in the abstract.

axioms (1)
  • Domain assumption: AI models can reliably perform semantic discovery of data products and validate queries against business purposes.
    This assumption underpins the real-time enforcement mechanism described.
invented entities (1)
  • Data Product MCP (no independent evidence)
    Purpose: protocol linking the AI-driven marketplace with cloud data platforms for semantic discovery and real-time contract enforcement.
    Newly introduced system component without independent evidence or falsifiable handles provided.


