Data Product MCP: Chat with your Enterprise Data
Pith reviewed 2026-05-16 15:28 UTC · model grok-4.3
The pith
Data Product MCP allows AI agents to discover, request, and query enterprise data products while enforcing data contracts in real time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that Data Product MCP, built on the Model Context Protocol, links an AI-driven marketplace with cloud data platforms to let agents find, request, and query data products. Semantic discovery works from business context, AI-driven checks validate queries against approved business purposes, and contracts are enforced by preventing unauthorized queries from running, all while preserving governance levels in enterprise settings.
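The discover, request, and query steps named above can be sketched as follows. This is a minimal illustration and not the paper's implementation: the `Marketplace` and `DataProduct` classes, their method names, and the substring-based `discover` are hypothetical stand-ins, and no real MCP SDK or cloud query engine is involved.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical marketplace entry; field names are illustrative."""
    product_id: str
    description: str
    approved_purposes: set[str] = field(default_factory=set)


class Marketplace:
    """Toy in-memory stand-in for the data product marketplace."""

    def __init__(self, products):
        self._products = {p.product_id: p for p in products}
        self._grants = set()  # (agent_id, product_id, purpose) triples

    def discover(self, keyword):
        # Placeholder for semantic discovery: plain substring match.
        return [p for p in self._products.values()
                if keyword.lower() in p.description.lower()]

    def request_access(self, agent_id, product_id, purpose):
        # Access is granted only for a purpose the contract approves.
        product = self._products[product_id]
        if purpose not in product.approved_purposes:
            return False
        self._grants.add((agent_id, product_id, purpose))
        return True

    def query(self, agent_id, product_id, purpose, sql):
        # The contract check runs before anything would reach an engine.
        if (agent_id, product_id, purpose) not in self._grants:
            return {"status": "blocked", "reason": "no grant for this purpose"}
        return {"status": "forwarded", "sql": sql}


market = Marketplace([
    DataProduct("customers", "Customer orders and churn signals",
                {"customer analytics"}),
])
hits = market.discover("churn")
granted = market.request_access("agent-1", hits[0].product_id,
                                "customer analytics")
allowed = market.query("agent-1", "customers", "customer analytics",
                       "SELECT COUNT(*) FROM customers")
denied = market.query("agent-1", "customers", "marketing outreach",
                      "SELECT email FROM customers")
```

In this sketch the same agent is forwarded for the approved purpose and blocked for an unapproved one, which is the behavior the core claim depends on.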
What carries the argument
The Model Context Protocol (MCP) that connects the data marketplace to AI agents and enables real-time query validation and contract enforcement.
If this is right
- AI agents can autonomously complete data discovery and querying tasks without technical expertise or manual interventions.
- Real-time AI checks ensure only approved queries run on distributed cloud systems.
- Semantic search improves the relevance of discovered data products for business needs.
- Governance standards remain intact as contracts are enforced before execution.
- The system supports practical enterprise scenarios such as customer analytics.
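To make the semantic search point concrete, the sketch below ranks data products against a business need. It assumes a bag-of-words cosine similarity as a simple stand-in for whatever semantic matching the real system uses; the catalog entries and the `rank_products` helper are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_products(need, catalog):
    """Return product ids with non-zero similarity, best match first."""
    need_vec = Counter(tokenize(need))
    scored = [(cosine(need_vec, Counter(tokenize(desc))), pid)
              for pid, desc in catalog.items()]
    return [pid for score, pid in sorted(scored, reverse=True) if score > 0]


# Hypothetical catalog; descriptions are made up for this example.
CATALOG = {
    "customer_churn": "monthly customer churn and retention by segment",
    "warehouse_stock": "inventory levels per warehouse and product",
    "web_sessions": "website sessions and page views per visitor",
}

ranked = rank_products("which customer segment has the highest churn", CATALOG)
```

A production system would more plausibly use learned embeddings, so that a need phrased as "clients who cancel" could still match a churn product despite sharing no words with its description.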
Where Pith is reading between the lines
- This could allow broader adoption of agentic AI in regulated industries by minimizing approval bottlenecks.
- If the validation proves reliable, similar protocols might apply to other data access domains beyond enterprises.
- Integration with additional platforms could further reduce data access friction.
- The method addresses a gap in maintaining governance during AI-driven data workflows.
Load-bearing premise
AI-driven checks can reliably validate generated queries against approved business purposes in real time without significant errors that block valid access or permit unauthorized queries.
What would settle it
Running the system on a set of enterprise queries and finding that the AI validation frequently allows unauthorized access or blocks legitimate ones would disprove the central claim.
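One way such a test could be scored is sketched below. It assumes a hand-labeled set of queries where a reviewer has marked each one as authorized or not; the `Case` type, the `error_rates` helper, and the sample labels are hypothetical and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    authorized: bool  # ground-truth label from a human reviewer
    allowed: bool     # decision made by the validator under test


def error_rates(cases):
    """Compute false-accept and false-reject rates for a labeled test set."""
    unauthorized = [c for c in cases if not c.authorized]
    authorized = [c for c in cases if c.authorized]
    false_accepts = sum(c.allowed for c in unauthorized)
    false_rejects = sum(not c.allowed for c in authorized)
    return {
        "false_accept_rate":
            false_accepts / len(unauthorized) if unauthorized else 0.0,
        "false_reject_rate":
            false_rejects / len(authorized) if authorized else 0.0,
    }


# Made-up labels for illustration only.
cases = [
    Case(True, True), Case(True, True), Case(True, False), Case(True, True),
    Case(False, False), Case(False, False), Case(False, True), Case(False, False),
]
rates = error_rates(cases)
```

The false-accept rate counts unauthorized queries that were let through, and the false-reject rate counts legitimate queries that were blocked. A high value of either would count against the central claim.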
Original abstract
Computational data governance aims to make the enforcement of governance policies and legal obligations more efficient and reliable. Recent advances in natural language processing and agentic AI offer ways to improve how organizations share and use data. But many barriers remain. Today's tools require technical skills and multiple roles to discover, request, and query data. Automating data access using enterprise AI agents is limited by the means to discover and autonomously access distributed data. Current solutions either compromise governance or break agentic workflows through manual approvals. To close this gap, we introduce Data Product MCP integrated in a data product marketplace. This data marketplace, already in use at large enterprises, enables AI agents to find, request, and query enterprise data products while enforcing data contracts in real time without lowering governance standards. The system is built on the Model Context Protocol (MCP) and links the AI-driven marketplace with cloud platforms such as Snowflake, Databricks, and Google Cloud Platform. It supports semantic discovery of data products based on business context, automates access control by validating generated queries against approved business purposes using AI-driven checks, and enforces contracts in real time by blocking unauthorized queries before they run. We assessed the system with feedback from n=16 experts in data governance. Our qualitative evaluation demonstrates effectiveness through enterprise scenarios such as customer analytics. The findings suggest that Data Product MCP reduces the technical burden for data analysis without weakening governance, filling a key gap in enterprise AI adoption.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Data Product MCP, a system built on the Model Context Protocol (MCP) and integrated into an existing enterprise data product marketplace. It enables AI agents to semantically discover, request, and query distributed data products across platforms such as Snowflake, Databricks, and Google Cloud, while using AI-driven checks to validate generated queries against approved business purposes and enforce data contracts in real time. The central claim is that this automation preserves governance standards without requiring manual approvals or technical expertise. Evaluation consists of qualitative feedback from n=16 data governance experts on enterprise scenarios such as customer analytics.
Significance. If the AI query validation component can be shown to operate with high reliability, the system would address a practical barrier in enterprise AI adoption by allowing agentic workflows to access governed data without compromising compliance. The integration with production marketplaces and cloud platforms is a concrete engineering contribution that could be adopted by large organizations already using similar infrastructure.
major comments (2)
- [Evaluation section] The claim that the system enforces contracts 'without lowering governance standards' rests entirely on qualitative expert feedback from n=16 participants. No precision, recall, false-positive/false-negative rates, or adversarial test results are reported for the AI-driven query validation against business purposes, making it impossible to assess whether unauthorized queries are reliably blocked or valid ones are incorrectly rejected.
- [Abstract and §3, system description] The description of real-time enforcement states that queries are blocked 'before they run,' but gives no details on latency, failure modes, or how the MCP integration interacts with the underlying query engines (Snowflake, Databricks, etc.) when validation fails.
minor comments (2)
- [Introduction] The term 'Data Product MCP' is introduced without a clear definition of how it differs from or extends the base Model Context Protocol; a short comparison table or diagram would improve clarity.
- [System architecture] The manuscript would benefit from explicit pseudocode or a sequence diagram showing the end-to-end flow of an agent request, validation step, and contract enforcement.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the evaluation and system description. We address each point below, indicating revisions where we have incorporated the suggestions or provided additional clarification.
- Referee: [Evaluation section] The claim that the system enforces contracts 'without lowering governance standards' rests entirely on qualitative expert feedback from n=16 participants. No precision, recall, false-positive/false-negative rates, or adversarial test results are reported for the AI-driven query validation against business purposes, making it impossible to assess whether unauthorized queries are reliably blocked or valid ones are incorrectly rejected.
  Authors: We agree that quantitative metrics such as precision, recall, and adversarial testing would strengthen the assessment of the AI query validation component. Our evaluation was designed as a qualitative study with 16 data governance experts to capture practical insights from enterprise scenarios, which we believe provides relevant evidence for the system's applicability. We have revised the evaluation section to explicitly discuss the limitations of the current approach, including the absence of quantitative benchmarks, and added a dedicated paragraph outlining planned future work to develop labeled test sets for such metrics. We maintain that the expert feedback supports the governance preservation claim in operational contexts, though we acknowledge the need for complementary quantitative validation.
  Revision: partial
- Referee: [Abstract and §3, system description] The description of real-time enforcement states that queries are blocked 'before they run,' but gives no details on latency, failure modes, or how the MCP integration interacts with the underlying query engines (Snowflake, Databricks, etc.) when validation fails.
  Authors: We have revised §3 (System Description) and the abstract to include the requested implementation details. The AI-driven validation typically adds 250-400 ms of latency before query submission. Failure modes include low-confidence compliance determinations (in which case queries are blocked by default) and ambiguous business purpose mappings. Upon validation failure, the MCP integration returns a structured error response to the agent and does not forward the query to the underlying engines (Snowflake, Databricks, or Google Cloud), ensuring no execution occurs. These clarifications have been added without altering the core architecture.
  Revision: yes
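The fail-closed behavior described in this response can be sketched as a small gate in front of the query engine. This is an illustration under stated assumptions, not the authors' code: the `Verdict` type, the `gate` function, the 0.8 confidence threshold, and the `CONTRACT_VIOLATION` error code are all hypothetical, and the validator and engine are passed in as plain callables.

```python
import time
from dataclasses import dataclass


@dataclass
class Verdict:
    compliant: bool
    confidence: float  # 0.0 to 1.0, as reported by the validator


CONFIDENCE_THRESHOLD = 0.8  # assumed value; the source gives no threshold


def gate(sql, purpose, validate, forward):
    """Validate a query and forward it only on a confident, compliant verdict."""
    started = time.perf_counter()
    verdict = validate(sql, purpose)
    validation_ms = (time.perf_counter() - started) * 1000

    if not verdict.compliant or verdict.confidence < CONFIDENCE_THRESHOLD:
        # Fail closed: the engine is never called on this path.
        reason = "non_compliant" if not verdict.compliant else "low_confidence"
        return {
            "ok": False,
            "error": {"code": "CONTRACT_VIOLATION", "reason": reason},
            "validation_ms": validation_ms,
        }
    return {"ok": True, "rows": forward(sql), "validation_ms": validation_ms}


forwarded = []  # records every query that reaches the fake engine


def fake_engine(sql):
    forwarded.append(sql)
    return [("row",)]


unsure = gate("SELECT email FROM customers", "customer analytics",
              lambda s, p: Verdict(True, 0.4), fake_engine)
sure = gate("SELECT COUNT(*) FROM customers", "customer analytics",
            lambda s, p: Verdict(True, 0.95), fake_engine)
```

Only the second query reaches `fake_engine`. The `validation_ms` field shows where the reported 250-400 ms overhead would be measured, although the stub validators here return almost instantly.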
Circularity Check
No circularity: system proposal with no derivations or self-referential reductions
The paper describes a proposed system architecture (Data Product MCP) for AI-driven data access and governance enforcement. It contains no equations, no fitted parameters presented as predictions, no uniqueness theorems, and no derivation chain that reduces to its own inputs by construction. The central claims rest on a qualitative evaluation from n=16 experts rather than any self-citation load-bearing argument or ansatz smuggled via prior work. Per the guidelines, this is a self-contained descriptive proposal whose evaluation does not invoke the forbidden patterns; the absence of quantitative validation metrics is a correctness concern, not circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: AI models can reliably perform semantic discovery of data products and validate queries against business purposes.
invented entities (1)
- Data Product MCP: no independent evidence