arxiv: 1907.08138 · v1 · pith:77PE3HBVnew · submitted 2019-07-18 · 💻 cs.DB · cs.LG

A Survey of Data Quality Measurement and Monitoring Tools

Pith reviewed 2026-05-24 19:10 UTC · model grok-4.3

classification 💻 cs.DB cs.LG
keywords: data quality, data profiling, data quality metrics, data quality monitoring, software tools, survey, data preprocessing

The pith

A survey of data quality tools finds that generally applicable metrics are rarely implemented despite wide research acceptance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper performs a systematic search to identify hundreds of data quality tools and then evaluates 13 domain-independent ones on their support for profiling, metric-based measurement, and continuous monitoring. It establishes that many tools cover basic profiling but fall short on the measurement and monitoring functions that research has emphasized. The evaluation highlights opportunities to expand tool capabilities and enables discussion of why research concepts like broadly usable metrics see little practical adoption. This matters for anyone relying on data analytics, since unmeasured quality issues undermine downstream decisions. The work positions current tools as incomplete relative to established data quality ideas.

Core claim

After identifying 667 tools and evaluating 13 that meet criteria for domain independence and free evaluability, the survey shows common support for data profiling but limited implementation of generally applicable data quality metrics and continuous monitoring. This gap between research literature and tool functionality allows a critical discussion of concepts that are widely accepted in theory yet absent from observed practice.

What carries the argument

Systematic search followed by evaluation of 13 tools across the three areas of data profiling, metric-based data quality measurement, and continuous monitoring.
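
To make the first two areas concrete, here is a minimal Python sketch. It is an illustration only, not code from the paper or from any evaluated tool. The function names `profile_column` and `completeness` are hypothetical, and completeness (the share of non-missing values) is assumed here as one example of a metric that does not depend on the domain of the data.

```python
from collections import Counter
from typing import Any, Optional, Sequence


def profile_column(values: Sequence[Optional[Any]]) -> dict:
    """Data profiling: collect descriptive statistics about one column."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


def completeness(values: Sequence[Optional[Any]]) -> float:
    """Metric-based measurement: map the column to a score in [0, 1]."""
    if not values:
        return 1.0  # assumption: an empty column has nothing missing
    return sum(v is not None for v in values) / len(values)


# Hypothetical sample column; None marks a missing value.
ages = [34, None, 27, 34, None, 51]
print(profile_column(ages))
print(completeness(ages))
```

The profile describes the column, while the metric condenses it into a single comparable score. That distinction is what separates the first functionality area from the second.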

If this is right

  • Data quality tools can be extended to include more of the measurement functions described in research.
  • Continuous monitoring remains an underdeveloped capability across the evaluated tools.
  • Generally applicable metrics that work across domains are missing from most current implementations.
  • The survey results support targeted enhancements to close the gap between research concepts and tool features.
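
As a rough illustration of what continuous monitoring adds on top of a one-off measurement, the sketch below records a metric score per data snapshot and flags scores that fall below a threshold. Everything in it is an assumption for illustration: the `MetricMonitor` class, the threshold of 0.7, and the sample snapshots are hypothetical and are not taken from the paper or from any surveyed tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class MetricMonitor:
    """Track one quality score over time and flag drops (hypothetical helper)."""
    threshold: float  # assumed minimum acceptable score
    history: List[float] = field(default_factory=list)

    def record(self, score: float) -> bool:
        """Store the score; return True if it falls below the threshold."""
        self.history.append(score)
        return score < self.threshold


def completeness(values: Sequence[Optional[object]]) -> float:
    """Share of non-missing values, a score in [0, 1]."""
    return sum(v is not None for v in values) / len(values) if values else 1.0


# Three hypothetical snapshots of the same column, taken at different times.
snapshots = [
    [1, 2, 3, 4],
    [1, None, 3, 4],
    [None, None, 3, None],
]
monitor = MetricMonitor(threshold=0.7)
alerts = [monitor.record(completeness(s)) for s in snapshots]
print(monitor.history)
print(alerts)
```

Keeping the history alongside the alert is the point of the design: a single score says little, whereas a series shows whether quality is stable or degrading.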

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Tool builders could test whether adopting research-defined metric sets increases user adoption for analytics pipelines.
  • A follow-up inventory could track whether new tools since the survey have narrowed the implementation gap for general metrics.
  • Standardization efforts might focus on metrics that map directly to profiling outputs already common in tools.

Load-bearing premise

The 13 tools chosen after exclusion criteria represent the functional range of current domain-independent data quality tools and the search captured the relevant population without major bias.

What would settle it

Release or discovery of multiple additional domain-independent tools that each provide a suite of generally applicable metrics and continuous monitoring functions would contradict the observed rarity of those features.

Figures

Figures reproduced from arXiv: 1907.08138 by Elisa Rusz, Lisa Ehrlinger, Wolfram Wöß.

Figure 1: Systematic Search
Original abstract

High-quality data is key to interpretable and trustworthy data analytics and the basis for meaningful data-driven decisions. In practical scenarios, data quality is typically associated with data preprocessing, profiling, and cleansing for subsequent tasks like data integration or data analytics. However, from a scientific perspective, a lot of research has been published about the measurement (i.e., the detection) of data quality issues and different generally applicable data quality dimensions and metrics have been discussed. In this work, we close the gap between research into data quality measurement and practical implementations by investigating the functional scope of current data quality tools. With a systematic search, we identified 667 software tools dedicated to "data quality", from which we evaluated 13 tools with respect to three functionality areas: (1) data profiling, (2) data quality measurement in terms of metrics, and (3) continuous data quality monitoring. We selected the evaluated tools with regard to pre-defined exclusion criteria to ensure that they are domain-independent, provide the investigated functions, and are evaluable freely or as trial. This survey aims at a comprehensive overview on state-of-the-art data quality tools and reveals potential for their functional enhancement. Additionally, the results allow a critical discussion on concepts, which are widely accepted in research, but hardly implemented in any tool observed, for example, generally applicable data quality metrics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper reports a systematic search that identified 667 data quality tools, from which 13 domain-independent tools were selected and evaluated on three functionality areas: data profiling, metric-based data quality measurement, and continuous monitoring. It concludes that widely accepted research concepts such as generally applicable data quality metrics are hardly implemented in the observed tools and identifies potential for functional enhancement.

Significance. A methodologically transparent survey of this kind could usefully document the gap between data-quality research and deployed tools, providing a reference point for both practitioners and researchers seeking to implement more advanced metrics or monitoring.

major comments (2)
  1. [Abstract / Methods] The description of the systematic search (abstract and corresponding methods section) states that 667 candidates were obtained but supplies no search strings, list of queried sources or repositories, date range, or quantitative record of how the exclusion criteria were applied. Without these details the claim that the final 13 tools support the generalization that research concepts are “hardly implemented in any tool observed” cannot be assessed for selection bias or reproducibility.
  2. [Evaluation / Results] The evaluation of the 13 tools on the three functionality areas is presented without an explicit protocol (e.g., test data sets used, criteria for determining whether a metric is “generally applicable,” or how continuous monitoring was verified). This absence directly affects the reliability of the functional-scope comparison that underpins the headline observation.
minor comments (1)
  1. [Abstract] The abstract and introduction should briefly summarize the search and selection numbers so readers can immediately gauge the scope of the survey.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight important areas for improving methodological transparency, which we will address through revisions to the manuscript.

  1. Referee: [Abstract / Methods] The description of the systematic search (abstract and corresponding methods section) states that 667 candidates were obtained but supplies no search strings, list of queried sources or repositories, date range, or quantitative record of how the exclusion criteria were applied. Without these details the claim that the final 13 tools support the generalization that research concepts are “hardly implemented in any tool observed” cannot be assessed for selection bias or reproducibility.

    Authors: We agree that the manuscript would be strengthened by including these methodological details. The systematic search was conducted using specific search strings across multiple sources (including general web searches, academic repositories, and software directories) within a defined time frame, followed by application of the pre-defined exclusion criteria mentioned in the paper. These elements were part of our process but were not reported in full. We will revise the Methods section to add the search strings, queried sources, date range, and a quantitative record (e.g., via a flow diagram) of how exclusions were applied from 667 candidates to the final 13 tools. This will support reproducibility and allow assessment of selection bias. revision: yes

  2. Referee: [Evaluation / Results] The evaluation of the 13 tools on the three functionality areas is presented without an explicit protocol (e.g., test data sets used, criteria for determining whether a metric is “generally applicable,” or how continuous monitoring was verified). This absence directly affects the reliability of the functional-scope comparison that underpins the headline observation.

    Authors: We acknowledge that an explicit evaluation protocol is needed to substantiate the comparisons. Our assessments relied on reviewing publicly available documentation, trial versions, and feature sets of the tools against criteria drawn from data quality literature for profiling capabilities, metric applicability, and monitoring features. However, the paper does not detail the exact verification steps or test data considerations. We will add a dedicated evaluation protocol subsection describing the criteria (including how “generally applicable” metrics were defined based on established dimensions), verification methods for each functionality area, and any use of test datasets or documentation checks. This revision will improve the reliability of the reported findings. revision: yes

Circularity Check

0 steps flagged

No circularity: descriptive survey reports external tool observations without self-referential derivations

The paper performs a systematic search and evaluates 13 third-party tools against pre-defined criteria. Its claims (e.g., that generally applicable metrics are hardly implemented) are direct reports on observed software, not predictions or results derived from the paper's own fitted parameters, equations, or self-citations. The selection process and exclusion criteria are methodological choices, not load-bearing self-definitions or renamings of known results. No equations, ansatzes, or uniqueness theorems are invoked that reduce to the authors' prior work or inputs by construction. This matches the default case of a self-contained descriptive survey against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central observations rest on the assumption that the tool identification and selection process is comprehensive and unbiased; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: A systematic search can reliably identify the population of data quality tools, and the pre-defined exclusion criteria yield an unbiased sample of 13 evaluable tools.
    Invoked to support the claim of a comprehensive overview of state-of-the-art tools.

