
arxiv: 2604.09163 · v1 · submitted 2026-04-10 · 💻 cs.DB

Evaluating Data Quality Tools: Measurement Capabilities and LLM Integration

Pith reviewed 2026-05-10 16:45 UTC · model grok-4.3

classification 💻 cs.DB
keywords data quality · tool evaluation · large language models · open source · proprietary software · rule creation · data validation · metric aggregation

The pith

Proprietary data quality tools offer broader measurement features than open-source options and are beginning to add LLM assistance, but in every tool evaluated, LLM use is limited to rule creation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares three open-source data quality tools against three proprietary ones on rule definition, duplicate detection, metric aggregation, and uncertainty handling. These criteria were derived from real-world use cases supplied by the authors' company partners. The paper also assesses how deeply large language models (LLMs) are integrated into each tool. The key result is that proprietary platforms cover more capabilities with less user effort and are beginning to use LLMs to help create rules, while open-source tools are more adaptable but cost more to set up. None of the tools currently lets an LLM validate data directly.
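Rule definition, the first criterion, is about expressing expectations on data as checks that a tool can evaluate. The sketch below is a hypothetical, library-independent illustration in plain Python; it does not reproduce the API of Great Expectations, Deequ, or any other evaluated tool, and the `Rule` class, rule names, and sample records are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


# Hypothetical rule type: a name plus a predicate applied to each record.
@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict[str, Any]], bool]


def evaluate(rules: list[Rule], records: list[dict[str, Any]]) -> dict[str, float]:
    """Return the fraction of records that pass each rule (1.0 for no records)."""
    results: dict[str, float] = {}
    for rule in rules:
        passed = sum(1 for record in records if rule.check(record))
        results[rule.name] = passed / len(records) if records else 1.0
    return results


# Illustrative rules: a completeness check and a range check.
rules = [
    Rule("email_not_null", lambda r: r.get("email") is not None),
    Rule("age_in_range", lambda r: r.get("age") is not None and 0 <= r["age"] <= 120),
]

records = [
    {"email": "a@example.com", "age": 34},
    {"email": None, "age": 29},
    {"email": "c@example.com", "age": 150},
    {"email": "d@example.com", "age": None},
]

print(evaluate(rules, records))  # {'email_not_null': 0.75, 'age_in_range': 0.5}
```

Declaring rules as data plus predicates keeps definition separate from execution, which is roughly the split that declarative data quality libraries aim for.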

Core claim

Our findings show that proprietary tools offer more comprehensive measurement features and emerging LLM-based assistance, while open-source tools provide flexibility at the cost of higher implementation effort. Across all tools, LLM integration remains limited to rule creation workflows. Direct data validation through LLMs is not yet supported by any of the evaluated tools.
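The difference between LLM-assisted rule creation and direct LLM validation can be sketched as follows. In the first pattern, the only one the paper reports finding, a model translates a natural-language requirement into a structured rule, and a deterministic engine then executes that rule on the data. The `fake_llm` function is a hypothetical stand-in for a model call (it returns a fixed JSON string), and the rule schema and allowed operators are assumptions for illustration, not any vendor's format.

```python
import json
import operator


# Hypothetical stand-in for an LLM call: maps a natural-language request
# to a JSON rule specification. A real system would query a model here.
def fake_llm(prompt: str) -> str:
    return json.dumps({"column": "age", "op": "<=", "value": 120})


# Only whitelisted comparison operators are accepted, so a generated rule
# is checked before the deterministic engine ever runs it.
ALLOWED_OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def parse_rule(spec_json: str):
    spec = json.loads(spec_json)
    if spec.get("op") not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {spec.get('op')!r}")
    column, op, value = spec["column"], ALLOWED_OPS[spec["op"]], spec["value"]
    # The model's role ends here: the returned predicate is ordinary code,
    # and validation itself never calls the LLM.
    return lambda record: record.get(column) is not None and op(record[column], value)


rule = parse_rule(fake_llm("Ages must not exceed 120 years."))
records = [{"age": 34}, {"age": 150}, {"age": None}]
print([rule(r) for r in records])  # [True, False, False]
```

Direct validation, which none of the evaluated tools supports, would instead pass the records themselves to the model and trust its verdict on each one.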

What carries the argument

Structured evaluation of rule definition, duplicate detection, metric aggregation, uncertainty handling, and LLM integration capabilities across six specific data quality tools.
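Metric aggregation means rolling individual check results up into dimension-level or dataset-level scores, such as dashboard-style scorecards. A minimal sketch, assuming a weighted mean of per-rule pass rates grouped by quality dimension; the dimension names, weights, and pass rates are hypothetical, and the evaluated tools may aggregate differently.

```python
from collections import defaultdict

# Hypothetical per-rule results: (quality dimension, weight, pass rate).
rule_results = [
    ("completeness", 2.0, 0.75),
    ("completeness", 1.0, 0.90),
    ("validity", 1.0, 0.50),
]


def aggregate(results):
    """Return the weighted mean pass rate per dimension and an overall score."""
    weighted = defaultdict(float)  # sum of weight * pass_rate per dimension
    totals = defaultdict(float)    # sum of weights per dimension
    for dimension, weight, pass_rate in results:
        weighted[dimension] += weight * pass_rate
        totals[dimension] += weight
    per_dimension = {d: weighted[d] / totals[d] for d in weighted}
    overall = sum(weighted.values()) / sum(totals.values())
    return per_dimension, overall


per_dimension, overall = aggregate(rule_results)
print({d: round(s, 3) for d, s in per_dimension.items()}, round(overall, 3))
```

A weighted mean lets one heavily weighted rule dominate a dimension; a tool might instead report the minimum or keep per-rule scores visible, and that design choice is part of what a comparison of aggregation features has to capture.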

Load-bearing premise

The evaluation criteria, derived from real-world use cases of company partners, are representative and unbiased for assessing general data quality tool capabilities across different contexts and industries.

What would settle it

An independent test where open-source tools achieve equivalent or superior measurement coverage with similar or lower effort than proprietary ones, or the release of a tool that enables LLMs to directly validate data without predefined rules.

Original abstract

High data quality is critical for reliable analytics and operational efficiency. A growing ecosystem of tools has emerged to support data quality management, ranging from lightweight open-source libraries to comprehensive enterprise platforms. This paper evaluates six data quality tools: Great Expectations, Deequ, Evidently, Informatica, Experian, and Ataccama. The evaluation criteria cover rule definition, duplicate detection, metric aggregation, and uncertainty handling, and were derived from real-world use cases of company partners. We further examine to what extent these tools integrate Large Language Models (LLMs). Our findings show that proprietary tools offer more comprehensive measurement features and emerging LLM-based assistance, while open-source tools provide flexibility at the cost of higher implementation effort. Across all tools, LLM integration remains limited to rule creation workflows. Direct data validation through LLMs is not yet supported by any of the evaluated tools.
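Duplicate detection and uncertainty handling interact: fuzzy matching yields similarity scores rather than yes/no answers, and a tool has to decide how to expose that uncertainty. The sketch below uses the standard-library `difflib.SequenceMatcher` as a stand-in similarity measure and maps scores to hypothetical confidence bands; the thresholds and band labels are assumptions, not the matching logic of any evaluated tool.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical confidence bands, checked from strictest to loosest.
BANDS = [(1.0, "exact"), (0.9, "likely"), (0.75, "possible")]


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace before comparing."""
    return " ".join(name.lower().split())


def confidence(a: str, b: str) -> tuple[float, str]:
    score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    for threshold, label in BANDS:
        if score >= threshold:
            return score, label
    return score, "no match"


def candidate_duplicates(names: list[str]):
    """Yield pairs whose similarity falls into one of the bands."""
    for a, b in combinations(names, 2):
        score, label = confidence(a, b)
        if label != "no match":
            yield a, b, round(score, 2), label


names = ["Anna Schmidt", "anna  schmidt", "Anna Schmitt", "Peter Maier"]
for pair in candidate_duplicates(names):
    print(pair)
```

Pairs in the weaker bands are the ones a tool would typically route to human review rather than merge automatically. Whether and how a tool surfaces such confidence information to users is one plausible reading of the uncertainty-handling criterion.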

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript evaluates six data quality tools (Great Expectations, Deequ, Evidently, Informatica, Experian, and Ataccama) using criteria derived from real-world use cases of company partners. The criteria include rule definition, duplicate detection, metric aggregation, and uncertainty handling. The paper additionally examines LLM integration in these tools and reports that proprietary tools provide more comprehensive measurement features with emerging LLM assistance, while open-source tools offer flexibility at higher implementation cost. LLM support across all tools is limited to rule creation workflows, with no direct data validation capabilities observed.

Significance. If the evaluation methodology and supporting evidence are provided in detail, the work would offer practical value to data engineers and researchers by comparing commercial and open-source options and identifying gaps in LLM adoption for data quality tasks. It could guide tool selection and highlight areas for future LLM integration beyond rule generation. The absence of quantitative results and methodological transparency in the current form reduces its immediate utility as a reference.

major comments (2)
  1. [Abstract and Evaluation section] The abstract and evaluation section state comparative findings (proprietary tools more comprehensive; LLM integration limited to rule creation) but supply no specific methodology details, test cases, quantitative metrics, scoring rubrics, or error analysis. This leaves the central claims unverifiable and load-bearing for the reported conclusions.
  2. [Evaluation Criteria] The evaluation criteria are derived from partner use cases, yet the manuscript provides no explicit mapping of criteria to use cases, no diversity audit (industries, data scales, modalities), and no comparison against external benchmarks. This directly affects the generalizability of claims that the selected criteria fairly represent tool capabilities across contexts.
minor comments (1)
  1. A summary table comparing the six tools across the four criteria plus LLM features would improve readability and allow readers to quickly assess the evidence for the stated differences.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving methodological transparency and the presentation of our evaluation criteria. We address each point below and will revise the manuscript to incorporate additional details where feasible.

Point-by-point responses
  1. Referee: [Abstract and Evaluation section] The abstract and evaluation section state comparative findings (proprietary tools more comprehensive; LLM integration limited to rule creation) but supply no specific methodology details, test cases, quantitative metrics, scoring rubrics, or error analysis. This leaves the central claims unverifiable and load-bearing for the reported conclusions.

    Authors: We agree that the current version of the manuscript lacks sufficient detail on the evaluation methodology, which reduces verifiability. In the revised manuscript, we will expand the Evaluation section to include: (1) a description of the specific test cases and datasets used for each criterion, (2) the scoring rubric applied for qualitative comparisons, (3) any quantitative metrics collected during tool assessments (e.g., rule coverage counts or execution times where measurable), and (4) a brief error analysis or limitations discussion. This will allow readers to better assess the basis for our comparative findings on proprietary vs. open-source tools and LLM capabilities. revision: yes

  2. Referee: [Evaluation Criteria] The evaluation criteria are derived from partner use cases, yet the manuscript provides no explicit mapping of criteria to use cases, no diversity audit (industries, data scales, modalities), and no comparison against external benchmarks. This directly affects the generalizability of claims that the selected criteria fairly represent tool capabilities across contexts.

    Authors: We acknowledge the value of explicit mapping and a diversity discussion. The revised manuscript will include a new subsection or table that maps each evaluation criterion (rule definition, duplicate detection, metric aggregation, uncertainty handling) directly to the partner use cases from which it was derived. We will also add a paragraph discussing the diversity of the underlying use cases, including the industries represented, typical data scales, and data modalities involved. While a formal external benchmark comparison was outside the scope of this work, we will reference related data quality evaluation frameworks from the literature and note the generalizability limitations of partner-derived criteria. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation based on external criteria and direct tool inspection

Full rationale

The paper performs a comparative evaluation of six data quality tools using criteria (rule definition, duplicate detection, metric aggregation, uncertainty handling) explicitly derived from company partners' real-world use cases, followed by direct feature assessment and LLM integration checks. No equations, fitted parameters, predictions, or self-citations appear in the derivation chain. The central claims rest on observable tool capabilities rather than any reduction to the paper's own inputs or prior self-referential results, rendering the analysis self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on an empirical comparison framework whose main unverified premise is the representativeness of partner-derived criteria; no free parameters, new entities, or additional axioms are introduced.

axioms (1)
  • Domain assumption: The evaluation criteria derived from real-world use cases of company partners are representative of broader data quality management needs.
    Explicitly stated as the source of criteria in the abstract, but no validation of generality or cross-industry applicability is provided.

pith-pipeline@v0.9.0 · 5446 in / 1280 out tokens · 56771 ms · 2026-05-10T16:45:49.256440+00:00 · methodology

