Evaluating Data Quality Tools: Measurement Capabilities and LLM Integration
Pith reviewed 2026-05-10 16:45 UTC · model grok-4.3
The pith
Proprietary data quality tools offer broader measurement features than open-source options, along with emerging LLM assistance. In every tool evaluated, LLM use is limited to rule creation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our findings show that proprietary tools offer more comprehensive measurement features and emerging LLM-based assistance, while open-source tools provide flexibility at the cost of higher implementation effort. Across all tools, LLM integration remains limited to rule creation workflows. Direct data validation through LLMs is not yet supported by any of the evaluated tools.
What carries the argument
Structured evaluation of rule definition, duplicate detection, metric aggregation, uncertainty handling, and LLM integration capabilities across six specific data quality tools.
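To make three of these criteria concrete, here is a minimal sketch in plain Python of what rule definition, duplicate detection, and metric aggregation could look like. It does not use the API of any of the six tools, and it is not the paper's evaluation procedure. The records, rule names, and helper functions are hypothetical and chosen only for illustration. Uncertainty handling and LLM integration are left out.

```python
from collections import Counter

# Hypothetical toy records; not data from the paper.
records = [
    {"id": 1, "email": "ann@example.com", "age": 34},
    {"id": 2, "email": "bob@example.com", "age": -5},
    {"id": 3, "email": "ann@example.com", "age": 41},
    {"id": 4, "email": None, "age": 29},
]

# Rule definition: each rule is a named predicate over a single record.
rules = {
    "email_present": lambda r: r["email"] is not None,
    "age_in_range": lambda r: r["age"] is not None and 0 <= r["age"] <= 120,
}


def pass_rates(rows, rule_set):
    """Return the fraction of rows that satisfy each rule."""
    return {
        name: sum(1 for r in rows if check(r)) / len(rows)
        for name, check in rule_set.items()
    }


def duplicate_keys(rows, key):
    """Return values of `key` that occur more than once (exact matches only)."""
    counts = Counter(r[key] for r in rows if r[key] is not None)
    return sorted(value for value, n in counts.items() if n > 1)


def aggregate(rates):
    """Metric aggregation: unweighted mean of the per-rule pass rates."""
    return sum(rates.values()) / len(rates)


rates = pass_rates(records, rules)
duplicates = duplicate_keys(records, "email")
overall = aggregate(rates)

print(rates)
print(duplicates)
print(overall)
```

The unweighted mean is only one possible aggregation. A real tool may weight rules, aggregate per column or per dataset, or use fuzzy matching for duplicates, none of which this sketch attempts.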
Load-bearing premise
The evaluation criteria, derived from real-world use cases of company partners, are representative and unbiased for assessing general data quality tool capabilities across different contexts and industries.
What would settle it
An independent test where open-source tools achieve equivalent or superior measurement coverage with similar or lower effort than proprietary ones, or the release of a tool that enables LLMs to directly validate data without predefined rules.
Original abstract
High data quality is critical for reliable analytics and operational efficiency. A growing ecosystem of tools has emerged to support data quality management, ranging from lightweight open-source libraries to comprehensive enterprise platforms. This paper evaluates six data quality tools: Great Expectations, Deequ, Evidently, Informatica, Experian, and Ataccama. The evaluation criteria cover rule definition, duplicate detection, metric aggregation, and uncertainty handling, and were derived from real-world use cases of company partners. We further examine to what extent these tools integrate Large Language Models (LLMs). Our findings show that proprietary tools offer more comprehensive measurement features and emerging LLM-based assistance, while open-source tools provide flexibility at the cost of higher implementation effort. Across all tools, LLM integration remains limited to rule creation workflows. Direct data validation through LLMs is not yet supported by any of the evaluated tools.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates six data quality tools (Great Expectations, Deequ, Evidently, Informatica, Experian, and Ataccama) using criteria derived from real-world use cases of company partners. The criteria include rule definition, duplicate detection, metric aggregation, and uncertainty handling. The paper additionally examines LLM integration in these tools and reports that proprietary tools provide more comprehensive measurement features with emerging LLM assistance, while open-source tools offer flexibility at higher implementation cost. LLM support across all tools is limited to rule creation workflows, with no direct data validation capabilities observed.
Significance. If the evaluation methodology and supporting evidence are provided in detail, the work would offer practical value to data engineers and researchers by comparing commercial and open-source options and identifying gaps in LLM adoption for data quality tasks. It could guide tool selection and highlight areas for future LLM integration beyond rule generation. The absence of quantitative results and methodological transparency in the current form reduces its immediate utility as a reference.
major comments (2)
- [Abstract and Evaluation section] The abstract and evaluation section state comparative findings (proprietary tools more comprehensive; LLM integration limited to rule creation) but supply no specific methodology details, test cases, quantitative metrics, scoring rubrics, or error analysis. The central claims, on which the reported conclusions rest, are therefore unverifiable.
- [Evaluation Criteria] The evaluation criteria are derived from partner use cases, yet the manuscript provides no explicit mapping of criteria to use cases, no diversity audit (industries, data scales, modalities), and no comparison against external benchmarks. This directly affects the generalizability of claims that the selected criteria fairly represent tool capabilities across contexts.
minor comments (1)
- A summary table comparing the six tools across the four criteria plus LLM features would improve readability and allow readers to quickly assess the evidence for the stated differences.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving methodological transparency and the presentation of our evaluation criteria. We address each point below and will revise the manuscript to incorporate additional details where feasible.
- Referee: [Abstract and Evaluation section] The abstract and evaluation section state comparative findings (proprietary tools more comprehensive; LLM integration limited to rule creation) but supply no specific methodology details, test cases, quantitative metrics, scoring rubrics, or error analysis. The central claims, on which the reported conclusions rest, are therefore unverifiable.
  Authors: We agree that the current version of the manuscript lacks sufficient detail on the evaluation methodology, which reduces verifiability. In the revised manuscript, we will expand the Evaluation section to include: (1) a description of the specific test cases and datasets used for each criterion, (2) the scoring rubric applied for qualitative comparisons, (3) any quantitative metrics collected during tool assessments (e.g., rule coverage counts or execution times where measurable), and (4) a brief error analysis or limitations discussion. This will allow readers to better assess the basis for our comparative findings on proprietary vs. open-source tools and LLM capabilities.
  Revision: yes
- Referee: [Evaluation Criteria] The evaluation criteria are derived from partner use cases, yet the manuscript provides no explicit mapping of criteria to use cases, no diversity audit (industries, data scales, modalities), and no comparison against external benchmarks. This directly affects the generalizability of claims that the selected criteria fairly represent tool capabilities across contexts.
  Authors: We acknowledge the value of explicit mapping and a diversity discussion. The revised manuscript will include a new subsection or table that maps each evaluation criterion (rule definition, duplicate detection, metric aggregation, uncertainty handling) directly to the partner use cases from which it was derived. We will also add a paragraph discussing the diversity of the underlying use cases, including the industries represented, typical data scales, and data modalities involved. While a formal external benchmark comparison was outside the scope of this work, we will reference related data quality evaluation frameworks from the literature and note the generalizability limitations of partner-derived criteria.
  Revision: yes
Circularity Check
No circularity: empirical evaluation based on external criteria and direct tool inspection
The paper performs a comparative evaluation of six data quality tools using criteria (rule definition, duplicate detection, metric aggregation, uncertainty handling) explicitly derived from company partners' real-world use cases, followed by direct feature assessment and LLM integration checks. No equations, fitted parameters, predictions, or self-citations appear in the derivation chain. The central claims rest on observable tool capabilities rather than any reduction to the paper's own inputs or prior self-referential results, rendering the analysis self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The evaluation criteria derived from real-world use cases of company partners are representative of broader data quality management needs.