pith. sign in

arxiv: 2606.08811 · v1 · pith:YGZMM3UYnew · submitted 2026-06-07 · 💻 cs.DB

Data Architectures and their Technical Requirements (DATER)

Pith reviewed 2026-06-27 17:23 UTC · model grok-4.3

classification 💻 cs.DB
keywords data architecturesdata warehousedata lakedata lakehousedata fabricdata meshtechnical requirementsevaluation framework
0
0 comments X

The pith

The DATER framework supplies a shared set of technical requirement dimensions for describing and comparing data architectures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents DATER as a conceptual framework that breaks data architectures down by the technical requirements they must satisfy. It applies the framework to six established approaches, tracing each one's history, core features, and degree of fit to the defined dimensions. A reader would care because organizations face ongoing choices among these architectures when managing large, fast-moving data for analytics and decisions. The work shows how the dimensions reveal both overlaps and distinct strengths across the options. In this way the framework aims to make architecture selection more systematic rather than ad hoc.

Core claim

DATER is introduced as a conceptual framework consisting of dimensions that capture essential technical requirements; the six architectures—data warehouse, semantic data lake, data lake, data lakehouse, data fabric, and data mesh—are then described by historical context and defining features before their conformance to each DATER dimension is assessed.

What carries the argument

The DATER framework, a set of technical-requirement dimensions used to evaluate conformance of data architectures.

If this is right

  • Organizations gain a common language for matching data architectures to specific technical needs.
  • Overlaps between architectures such as data lakes and data meshes become visible through shared dimension scores.
  • Strengths and gaps of each architecture are highlighted for particular use cases.
  • Researchers obtain a structured basis for extending or refining architecture comparisons.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same dimensions could be tested on hybrid or future architectures not covered in the original six.
  • DATER might serve as a starting point for adding non-technical criteria such as organizational governance.
  • Standardized scoring templates derived from the dimensions could support repeatable architecture audits.

Load-bearing premise

The chosen DATER dimensions capture the essential technical requirements completely and without subjective weighting that would alter the conformance results for the six architectures.

What would settle it

Independent teams applying the DATER dimensions to the same architecture and arriving at materially different conformance scores would show the framework fails to produce consistent evaluations.

Figures

Figures reproduced from arXiv: 2606.08811 by Christoph Quix, Sayed Hoseini, Stefan Decker.

Figure 1
Figure 1. Figure 1: Overview of terminology, abstractions and components for data architectures [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Evolution of the various discussed data architectures [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Data warehouse architecture based on [ [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Semantic data lake architecture [38] by storing diverse raw data according to its original format: re￾lational (e.g., MySQL), document-based (e.g., MongoDB), and graph databases (e.g., Neo4j). A key feature is the use of metadata to manage, organize, and analyze the stored data, because otherwise the data is hardly usable as their structure and semantics are not known. Hence, metadata extraction or metadat… view at source ↗
Figure 6
Figure 6. Figure 6: Data Lakehouse architecture inspired by [ [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Data fabric architecture inspired by [103, 97] knowledge catalog section. Essentially, we view the data fab￾ric as knowledge catalog that connects various underlying data storages through comprehensive metadata management capable of data access and transformation via predefined standardized data pipelines. Following the illustration for active metadata by Hechler et al. [97], for the data fabric we first d… view at source ↗
Figure 8
Figure 8. Figure 8: Data mesh architecture inspired by [127, 130, 97, 111] Definition 6. [Data Mesh] “Data mesh is a socio-technical, decentralized, distributed concept for enterprise data manage￾ment [99]. It is characterized by the four principles of data￾as-a-product, domain orientation, self-service platforms, and federated computational governance [131].” [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗
read the original abstract

Modern organizations generate and consume massive volumes of heterogeneous data at high speed. This requires a continuous development of new techniques for more efficient and reliable data management. Designing appropriate data architectures has therefore become a strategic necessity, as they shape how data is integrated, governed, and made available for analytics and decisionmaking. This paper introduces a conceptual framework - Data Architectures and their Technical Requirements (DATER) - to systematically describe and evaluate data architectures based on technical requirements. Six modern architectures are examined: data warehouse, (semantic) data lake, data lakehouse, data fabric, and data mesh. Each is analyzed by historical context, defining features, and conformance to DATER dimensions. The study supports researchers and practitioners in navigating architectural paradigms, clarifying overlaps, and highlighting strengths, limitations, and use-case suitability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the Data Architectures and their Technical Requirements (DATER) conceptual framework to systematically describe and evaluate data architectures based on technical requirements. It examines six architectures—data warehouse, (semantic) data lake, data lakehouse, data fabric, and data mesh—by their historical context, defining features, and conformance to DATER dimensions, with the goal of clarifying overlaps and use-case suitability.

Significance. If the DATER dimensions form a complete and unbiased basis, the framework offers a structured conceptual tool for researchers and practitioners to compare data architecture paradigms without reliance on fitted parameters or mathematical derivations. The absence of invented entities or ad-hoc axioms in the presentation supports its role as an independent contribution.

major comments (2)
  1. [DATER dimensions definition section] The section defining the DATER dimensions provides no derivation method (e.g., from requirements engineering literature, ISO 8000, or NIST guidelines), no empirical survey basis, and no inter-rater validation. This is load-bearing for the central claim, as subjective selection could introduce omissions (such as fine-grained consistency models, multi-tenancy isolation, or energy efficiency) or uneven weighting that directly alters the conformance assessments for the six architectures.
  2. [Conformance analysis for the six architectures] In the conformance analysis sections for each architecture, the qualitative scoring lacks explicit criteria, weighting scheme, or cross-check against external references. This undermines the reported strengths, limitations, and use-case suitability, as the evaluations rest on unvalidated dimension application.
minor comments (2)
  1. [Abstract] The abstract lists five items under 'six modern architectures' (data warehouse, (semantic) data lake, data lakehouse, data fabric, and data mesh); clarify whether semantic data lake is treated as distinct from data lake.
  2. [References] Add references to standard data management guidelines (e.g., ISO 8000, NIST) to ground the framework choice.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps clarify the presentation of our conceptual framework. We address each major comment below.

read point-by-point responses
  1. Referee: [DATER dimensions definition section] The section defining the DATER dimensions provides no derivation method (e.g., from requirements engineering literature, ISO 8000, or NIST guidelines), no empirical survey basis, and no inter-rater validation. This is load-bearing for the central claim, as subjective selection could introduce omissions (such as fine-grained consistency models, multi-tenancy isolation, or energy efficiency) or uneven weighting that directly alters the conformance assessments for the six architectures.

    Authors: We agree that an explicit derivation strengthens the framework. The dimensions were synthesized from recurring technical requirements in data management literature (e.g., scalability, consistency, governance). We will revise the section to include a derivation subsection referencing requirements engineering practices and ISO 8000 for data quality. We will also discuss why certain aspects (e.g., energy efficiency) were treated as secondary rather than core dimensions. Inter-rater validation will be noted as a limitation for future empirical work, as the current contribution is conceptual. revision: yes

  2. Referee: [Conformance analysis for the six architectures] In the conformance analysis sections for each architecture, the qualitative scoring lacks explicit criteria, weighting scheme, or cross-check against external references. This undermines the reported strengths, limitations, and use-case suitability, as the evaluations rest on unvalidated dimension application.

    Authors: We acknowledge the value of explicit criteria. We will add a methods subsection defining conformance levels (full/partial/none) based on each architecture's documented features, with equal weighting applied across dimensions. Cross-references to existing literature surveys and case studies will be incorporated to support the assessments. These additions will improve reproducibility without changing the qualitative conclusions. revision: yes

Circularity Check

0 steps flagged

DATER is a self-contained conceptual framework with no derivation chain or reductions to inputs

full rationale

The paper presents DATER as an original conceptual framework for evaluating data architectures. No equations, fitted parameters, predictions, or first-principles derivations are described. The dimensions are introduced as part of the contribution itself rather than derived from prior results by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are referenced in the provided text. The analysis of the six architectures proceeds by direct conformance assessment to the defined dimensions, without reducing to any fitted or self-referential inputs. This meets the criteria for an independent contribution; no circular steps exist.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the introduction of the DATER framework and its dimensions as a useful evaluation lens; no free parameters, standard axioms, or invented physical entities are specified in the abstract, with the framework itself constituting the primary postulated contribution.

pith-pipeline@v0.9.1-grok · 5658 in / 1084 out tokens · 17676 ms · 2026-06-27T17:23:32.793752+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

142 extracted references · 20 canonical work pages

  1. [1]

    data in management and decision engineering

    A. Tzanetos, G. Dounias, Introduction to the special issue “data in management and decision engineering”, Data in Brief 55 (2024) 110711

  2. [2]

    Muller, A reference architecture primer, Eindhoven Univ

    G. Muller, A reference architecture primer, Eindhoven Univ. of Techn., Eindhoven, White paper (2008) 24

  3. [3]

    Gröger, There is no ai without data, Communications of the ACM 64 (11) (2021) 98–108

    C. Gröger, There is no ai without data, Communications of the ACM 64 (11) (2021) 98–108

  4. [4]

    A. P. Sheth, J. A. Larson, Federated database systems for managing distributed, heterogeneous, and autonomous databases, ACM Comput. Surv. 22 (3) (1990) 183–236. doi:10.1145/96602.96604. URLhttps://doi.org/10.1145/96602.96604

  5. [5]

    G. M. Sang, L. Xu, P. de Vrieze, A reference architec- ture for big data systems, in: 2016 10th International Conference on Software, Knowledge, Information Man- agement and Applications (SKIMA), 2016, pp. 370–375. doi:10.1109/SKIMA.2016.7916249

  6. [6]

    G. M. Sang, L. Xu, P. De Vrieze, Simplifying big data analytics systems with a reference architecture, in: Col- laboration in a Data-Rich World: 18th IFIP WG 5.5 Working Conference on Virtual Enterprises, PRO-VE 2017, Vicenza, Italy, September 18-20, 2017, Proceed- ings 18, Springer, 2017, pp. 242–249

  7. [7]

    Van den Hoven, Data architecture: Blueprints for data., Information systems management 20 (1) (2003)

    J. Van den Hoven, Data architecture: Blueprints for data., Information systems management 20 (1) (2003)

  8. [8]

    Badampudi, C

    D. Badampudi, C. Wohlin, K. Petersen, Experiences from using snowballing and database searches in sys- tematic literature studies, in: Proceedings of the 19th in- ternational conference on evaluation and assessment in software engineering, 2015, pp. 1–10

  9. [9]

    Otto, The evolution of data spaces, in: Designing data spaces: The ecosystem approach to competitive advan- tage, Springer International Publishing Cham, 2022, pp

    B. Otto, The evolution of data spaces, in: Designing data spaces: The ecosystem approach to competitive advan- tage, Springer International Publishing Cham, 2022, pp. 3–15

  10. [10]

    Gessler, M

    J. Gessler, M. R. Cencic, C. Metzner, H. Wieker, K. Lin- dow, W. H. Schulz, Business models and organizational roles of data spaces: A framework for sustainable value creation, Data in Brief (2025) 111795

  11. [11]

    Lin, The lambda and the kappa, IEEE Internet Com- puting 21 (05) (2017) 60–66

    J. Lin, The lambda and the kappa, IEEE Internet Com- puting 21 (05) (2017) 60–66

  12. [12]

    Linstedt, M

    D. Linstedt, M. Olschimke, Building a scalable data warehouse with data vault 2.0, Morgan Kaufmann, 2015

  13. [13]

    W. H. Inmon, D. Linstedt, Data architecture: a primer for the data scientist: big data, data warehouse and data vault, Morgan Kaufmann, 2014

  14. [14]

    Kiran, P

    M. Kiran, P. Murphy, I. Monga, J. Dugan, S. S. Baveja, Lambda architecture for cost-effective batch and speed big data processing, in: 2015 IEEE International Con- ference on Big Data (Big Data), 2015, pp. 2785–2792. doi:10.1109/BigData.2015.7364082

  15. [15]

    Hoseini, A

    S. Hoseini, A. Ali, H. Shaker, C. Quix, Sedar: A seman- tic data reservoir for heterogeneous datasets, in: Pro- ceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023, pp. 5056–5060

  16. [16]

    rep., Federal Ministry for Economic Affairs and Energy (BMWi), D-11019 Berlin, Germany (Oct

    Plattform Industrie 4.0, Project gaia-x: A federated data infrastructure as the cradle of a vibrant european ecosys- tem, Tech. rep., Federal Ministry for Economic Affairs and Energy (BMWi), D-11019 Berlin, Germany (Oct. 2019)

  17. [17]

    Venugopal, R

    S. Venugopal, R. Buyya, K. Ramamohanarao, A tax- onomy of data grids for distributed data sharing, man- agement, and processing, ACM Computing Surveys (CSUR) 38 (1) (2006) 3–es

  18. [18]

    Ataei, A

    P. Ataei, A. Litchfield, The state of big data reference ar- chitectures: A systematic literature review, IEEE Access 10 (2022) 113789–113807

  19. [19]

    Giebler, C

    C. Giebler, C. Gröger, E. Hoos, R. Eichler, H. Schwarz, B. Mitschang, The data lake architecture framework, in: BTW 2021, Gesellschaft für Informatik, Bonn, 2021, pp. 351–370

  20. [20]

    Geisler, M.-E

    S. Geisler, M.-E. Vidal, C. Cappiello, B. F. Lóscio, A. Gal, M. Jarke, M. Lenzerini, P. Missier, B. Otto, E. Paja, et al., Knowledge-driven data ecosystems toward 19/25 Preprint submitted to arXiv data transparency, ACM Journal of Data and Information Quality (JDIQ) 14 (1) (2021) 1–12

  21. [21]

    K. M. Endris, P. D. Rohde, M. Vidal, S. Auer, Ontario: Federated query processing against a semantic data lake, in: Proc. DEXA, V ol. 11706 of LNCS, Springer, 2019, pp. 379–395.doi:10.1007/978-3-030-27615-7\ _{2}{9}

  22. [22]

    R. Hai, S. Geisler, C. Quix, Constance: An intelligent data lake system, in: Proc. SIGMOD, 2016

  23. [23]

    D. A. M. Association, et al., DAMA-DMBOK: Data management body of knowledge, Technics Publications, LLC, 2017

  24. [24]

    M. I. S. Oliveira, B. F. Lóscio, What is a data ecosys- tem?, in: Proceedings of the 19th Annual International Conference on Digital Government Research: Gover- nance in the Data Age, dg.o ’18, ACM, New York, NY , USA, 2018.doi:10.1145/3209281.3209335. URLhttps://doi.org/10.1145/3209281. 3209335

  25. [25]

    C. J. Date, An introduction to database systems, Pearson Education India, 1977

  26. [26]

    Fathy, W

    N. Fathy, W. Gad, N. Badr, A unified access to hetero- geneous big data through ontology-based semantic inte- gration, in: 2019 Ninth International Conference on In- telligent Computing and Information Systems (ICICIS), IEEE, 2019, pp. 387–392

  27. [27]

    W. H. Inmon, Building the data warehouse, John wiley & sons, 2005

  28. [28]

    Decker, S

    S. Decker, S. Melnik, F. van Harmelen, D. Fensel, M. C. A. Klein, J. Broekstra, M. Erdmann, I. Horrocks, The semantic web: The roles of XML and RDF, IEEE Internet Comput. 4 (5) (2000) 63–74.doi:10.1109/ 4236.877487. URLhttps://doi.org/10.1109/4236.877487

  29. [29]

    Laney, et al., 3d data management: Controlling data volume, velocity and variety, META group research note 6 (70) (2001) 1

    D. Laney, et al., 3d data management: Controlling data volume, velocity and variety, META group research note 6 (70) (2001) 1

  30. [30]

    Moniruzzaman, S

    A. Moniruzzaman, S. A. Hossain, Nosql database: New era of databases for big data analytics-classification, characteristics and comparison, arXiv preprint arXiv:1307.0191 (2013)

  31. [31]

    Armbrust, A

    M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, et al., A view of cloud computing, Communications of the ACM 53 (4) (2010) 50–58

  32. [32]

    Golab, M

    L. Golab, M. T. Ozsu, Data stream management, Springer Nature, 2022

  33. [33]

    Krizhevsky, I

    A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in neural information processing systems 25 (2012)

  34. [34]

    Simonyan, A

    K. Simonyan, A. Zisserman, Very deep convolu- tional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556 (2014)

  35. [35]

    K. He, X. Zhang, S. Ren, J. Sun, Deep residual learn- ing for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778

  36. [36]

    I. M. Putrama, P. Martinek, Heterogeneous data inte- gration: Challenges and opportunities, Data in Brief 56 (2024) 110853

  37. [37]

    Dixon, Pentaho, hadoop, and data lakes|james dixon’s blog, [Online; accessed 22-Oct-2024] (2010)

    J. Dixon, Pentaho, hadoop, and data lakes|james dixon’s blog, [Online; accessed 22-Oct-2024] (2010). URLhttps://jamesdixon.wordpress.com/2010/ 10/14/pentaho-hadoop-and-data-lakes/

  38. [38]

    tcs.2020.02.014,doi:10.1016/J.TCS.2020.02.014

    S. Hoseini, J. Theissen-Lipp, C. Quix, A sur- vey on semantic data management as intersection of ontology-based data access, semantic modeling and data lakes, Journal of Web Semantics 81 (2024) 100819.doi:https://doi.org/10.1016/j. websem.2024.100819

  39. [39]

    Dehghani, Data Mesh: Delivering Data-Driven Value at Scale, O’Reilly Media, Sebastopol, CA, 2022

    Z. Dehghani, Data Mesh: Delivering Data-Driven Value at Scale, O’Reilly Media, Sebastopol, CA, 2022. URLhttps://www.oreilly.com/library/view/ data-mesh/9781492092346/

  40. [40]

    Zaidi, E

    E. Zaidi, E. Thoo, G. De Simoni, M. Beyer, Data fab- rics add augmented intelligence to modernize your data integration, With Eric Thoo. Gartner Grou 17 (2019)

  41. [41]

    Armbrust, A

    M. Armbrust, A. Ghodsi, R. Xin, M. Zaharia, Lake- house: a new generation of open platforms that unify data warehousing and advanced analytics, in: Proceed- ings of CIDR, V ol. 8, 2021, p. 28

  42. [42]

    Y . Huai, A. Chauhan, A. Gates, G. Hagleitner, E. N. Han- son, O. O’Malley, J. Pandey, Y . Yuan, R. Lee, X. Zhang, Major technical advancements in apache hive, in: Pro- ceedings of the 2014 ACM SIGMOD international con- ference on Management of data, 2014, pp. 1235–1246

  43. [43]

    Hentschel, J

    M. Hentschel, J. Dees, F. Funke, M. Heimel, I. Oukid, Building a data management system for the cloud: Lessons learned and future direc- tions, Datenbank-Spektrum 25 (1) (2025) 17–28. doi:10.1007/S13222-025-00494-9. URLhttps://doi.org/10.1007/ s13222-025-00494-9

  44. [44]

    Chaudhuri, U

    S. Chaudhuri, U. Dayal, An overview of data warehous- ing and olap technology, ACM Sigmod record 26 (1) (1997) 65–74. 20/25 Preprint submitted to arXiv

  45. [45]

    Bose, Advanced analytics: opportunities and chal- lenges, Industrial Management & Data Systems 109 (2) (2009) 155–172

    R. Bose, Advanced analytics: opportunities and chal- lenges, Industrial Management & Data Systems 109 (2) (2009) 155–172

  46. [46]

    A. A. Harby, F. Zulkernine, Data lakehouse: A sur- vey and experimental study, Information Systems (2024) 102460

  47. [47]

    Jarke, M

    M. Jarke, M. A. Jeusfeld, C. Quix, P. Vassiliadis, Architecture and Quality in Data Warehouses: An Extended Repository Approach, Information Systems 24 (3) (1999) 229–253

  48. [48]

    Kimball, M

    R. Kimball, M. Ross, The data warehouse toolkit: The definitive guide to dimensional modeling, John Wiley & Sons, 2013

  49. [49]

    Davoudian, L

    A. Davoudian, L. Chen, M. Liu, A survey on nosql stores, ACM Computing Surveys (CSUR) 51 (2) (2018) 1–43

  50. [50]

    Jarke, M

    M. Jarke, M. Lenzerini, Y . Vassiliou, P. Vassiliadis (Eds.), Fundamentals of Data Warehouses, 2nd Edition, Springer-Verlag, 2003

  51. [51]

    C. Quix, R. Hai, Data lake, in: Encyclopedia of Big Data Technologies, Springer, 2019.doi:10.1007/ 978-3-319-63962-8\_{7}{-}{1}

  52. [52]

    Sawadogo, J

    P. Sawadogo, J. Darmont, On data lake architectures and metadata management, JJIS (2021)

  53. [53]

    Scholly, P

    E. Scholly, P. Sawadogo, P. Liu, J. A. Espinosa-Oviedo, C. Favre, S. Loudcher, J. Darmont, C. Noûs, Coining goldmedal: a new contribution to data lake generic meta- data modeling, arXiv preprint arXiv:2103.13155 (2021)

  54. [54]

    Y . Zhao, I. Megdiche, F. Ravat, Data lake ingestion man- agement, arXiv preprint arXiv:2107.02885 (2021)

  55. [55]

    Farid, A

    M. Farid, A. Roatis, I. F. Ilyas, H.-F. Hoffmann, X. Chu, Clams: Bringing quality to data lakes, in: Proc. SIG- MOD, 2016

  56. [56]

    Galhotra, U

    S. Galhotra, U. Khurana, Semantic search over struc- tured data, in: Proceedings of the 29th ACM Interna- tional Conference on Information & Knowledge Man- agement, Association for Computing Machinery, New York, NY , USA, 2020, p. 3381–3384.doi:10.1145/ 3340531.3417426

  57. [57]

    Dibowski, S

    H. Dibowski, S. Schmid, Y . Svetashova, C. A. Henson, T. Tran, Using semantic technologies to manage a data lake: Data catalog, provenance and access control, in: SSWS@ISWC, 2020. URLhttps://api.semanticscholar.org/ CorpusID:229357020

  58. [58]

    Eichler, C

    R. Eichler, C. Giebler, C. Gröger, H. Schwarz, B. Mitschang, Modeling metadata in data lakes - A generic model, Data Knowl. Eng. 136 (2021) 101931. doi:10.1016/J.DATAK.2021.101931. URLhttps://doi.org/10.1016/j.datak.2021. 101931

  59. [59]

    Nambiar, D

    A. Nambiar, D. Mundra, An overview of data warehouse and data lake in modern enterprise data management, Big data and cognitive computing 6 (4) (2022) 132

  60. [60]

    Ramakrishnan, B

    R. Ramakrishnan, B. Sridharan, J. R. Douceur, P. Kas- turi, B. Krishnamachari-Sampath, K. Krishnamoorthy, P. Li, M. Manu, S. Michaylov, R. Ramos, et al., Azure data lake store: a hyperscale distributed file service for big data analytics, in: Proceedings of the 2017 ACM In- ternational Conference on Management of Data, 2017, pp. 51–63

  61. [61]

    A. Y . Halevy, F. Korn, N. F. Noy, C. Olston, N. Polyzotis, S. Roy, S. E. Whang, Managing google’s data lake: an overview of the goods system, IEEE Data Eng. Bull. (2016). URLhttp://sites.computer.org/debull/ A16sept/p5.pdf

  62. [62]

    Azzabi, Z

    S. Azzabi, Z. Alfughi, A. Ouda, Data lakes: A survey of concepts and architectures, Computers 13 (7) (2024) 183

  63. [63]

    R. Hai, C. Koutras, C. Quix, M. Jarke, Data lakes: A survey of functions and systems, IEEE TKDE 35 (12) (2023) 12571–12590

  64. [64]

    Nargesian, E

    F. Nargesian, E. Zhu, R. J. Miller, K. Q. Pu, P. C. Aro- cena, Data lake management: challenges and opportuni- ties, Proc. VLDB Endow. 12 (12) (2019) 1986–1989

  65. [65]

    Gartner, Gartner says beware of the data lake fallacy, [Online; accessed 29-Jun-2025] (2014)

    I. Gartner, Gartner says beware of the data lake fallacy, [Online; accessed 29-Jun-2025] (2014). URLhttps://www.gartner.com/ en/newsroom/press-releases/ 2014-07-28-gartner-says-beware-of-the-data-lake-fallacy

  66. [66]

    Dixon, Data lakes revisited, [Online; accessed 22-Oct- 2024] (2014)

    J. Dixon, Data lakes revisited, [Online; accessed 22-Oct- 2024] (2014). URLhttps://jamesdixon.wordpress.com/2014/ 09/25/data-lakes-revisited/

  67. [67]

    Jarke, C

    M. Jarke, C. Quix, On warehouses, lakes, and spaces: The changing role of conceptual modeling for data integration, in: J. Cabot, C. Gómez, O. Pastor, M. Sancho, E. Teniente (Eds.), Conceptual Mod- eling Perspectives, Springer, 2017, pp. 231–245. doi:10.1007/978-3-319-67271-7\_16. URLhttps://doi.org/10.1007/ 978-3-319-67271-7_16

  68. [68]

    Hoseini, M

    S. Hoseini, M. Ibbels, C. Quix, Enhancing machine learning capabilities in data lakes with automl and llms, in: European Conference on Advances in Databases and Information Systems, Springer, 2024, pp. 184–198. 21/25 Preprint submitted to arXiv

  69. [69]

    Bacco, A

    M. Bacco, A. Kocian, S. Chessa, A. Crivello, P. Barsoc- chi, What are data spaces? systematic survey and future outlook, Data in Brief 57 (2024) 110969

  70. [70]

    C. Quix, R. Hai, I. Vatov, Metadata extraction and man- agement in data lakes with GEMMS, Complex Syst. In- formatics Model. Q. (2016)

  71. [71]

    P. N. Sawadogo, E. Scholly, C. Favre, E. Ferey, S. Loud- cher, J. Darmont, Metadata systems for data lakes: mod- els and features, in: European conference on advances in databases and information systems, Springer, 2019, pp. 440–451

  72. [72]

    Eichler, C

    R. Eichler, C. Giebler, C. Gröger, H. Schwarz, B. Mitschang, Handle-a generic metadata model for data lakes, in: International Conference on Big Data Analyt- ics and Knowledge Discovery, Springer, 2020, pp. 73– 88

  73. [73]

    Ouellette, A

    P. Ouellette, A. Sciortino, F. Nargesian, B. G. Bashardoost, E. Zhu, K. Q. Pu, R. J. Miller, Ronin: data lake exploration, Proceedings of the VLDB Endowment 14 (12) (2021)

  74. [74]

    N. W. Paton, J. Chen, Z. Wu, Dataset discovery and ex- ploration: A survey, ACM Computing Surveys 56 (4) (2023) 1–37

  75. [75]

    M. N. Mami, D. Graux, S. Scerri, H. Jabeen, S. Auer, J. Lehmann, Squerall: Virtual ontology-based access to heterogeneous and large data sources, in: C. Ghi- dini, O. Hartig, M. Maleshkova, V . Svátek, I. F. Cruz, A. Hogan, J. Song, M. Lefrançois, F. Gandon (Eds.), The Semantic Web - ISWC 2019 - 18th International Seman- tic Web Conference, Auckland, New ...

  76. [76]

    Bagozi, D

    A. Bagozi, D. Bianchini, V . D. Antonellis, M. Garda, M. Melchiori, Personalised exploration graphs on se- mantic data lakes, in: Proc. OTM Conf., V ol. 11877 of LNCS, Springer, 2019, pp. 22–39.doi:10.1007/ 978-3-030-33246-4\_2

  77. [77]

    Sharma, Architecting Data Lakes—Data Manage- ment Architectures for Advanced Business Use Cases, 2nd Edition, O’Reilly Media, Sebastopol, CA, USA, 2018

    B. Sharma, Architecting Data Lakes—Data Manage- ment Architectures for Advanced Business Use Cases, 2nd Edition, O’Reilly Media, Sebastopol, CA, USA, 2018

  78. [78]

    Patel, G

    P. Patel, G. Wood, A. Diaz, Data lake governance best practices, Dzone Guide Big Data—Data Sci. Adv. Anal. 4 (2017) 6–7

  79. [79]

    Ravat, Y

    F. Ravat, Y . Zhao, Data lakes: Trends and perspectives, in: Database and Expert Systems Applications: 30th In- ternational Conference, DEXA 2019, Linz, Austria, Au- gust 26–29, 2019, Proceedings, Part I 30, Springer, 2019, pp. 304–313

  80. [80]

    Madsen, How to Build an Enterprise Data Lake: Important Considerations before Jumping, Third Nature Inc., San Mateo, CA, USA, 2015

    M. Madsen, How to Build an Enterprise Data Lake: Important Considerations before Jumping, Third Nature Inc., San Mateo, CA, USA, 2015

Showing first 80 references.