Evaluation of Pipelines for Data Integration into Knowledge Graphs

Erhard Rahm; Marvin Hofer

arxiv: 2605.22304 · v1 · pith:LU4NRL6Anew · submitted 2026-05-21 · 💻 cs.AI · cs.DB · cs.LG

Pith reviewed 2026-05-22 05:11 UTC · model grok-4.3

keywords knowledge graphs · data integration · benchmark · pipelines · evaluation · coverage · correctness · consistency

The pith

A new benchmark evaluates data integration pipelines for knowledge graphs using coverage, correctness, and consistency on movie-domain datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces KGI-Bench to assess different workflows that add various input data to an existing knowledge graph. It supplies a seed KG, overlapping input data in three formats, and a reference KG as ground truth, all in the movie domain. Pipelines are judged by how much new information their updates add, how accurate those additions are, and whether the resulting graph remains consistent without contradictions. Testing twelve pipelines demonstrates measurable differences tied to input formats and internal design decisions. The approach supplies a concrete way to identify stronger pipeline options for integration tasks.

Core claim

The central claim is that integration pipelines can be systematically compared by running them on shared benchmark datasets and scoring the resulting updated knowledge graph with the three complementary metrics of coverage, correctness, and consistency. The supplied movie-domain resources include a seed KG, multi-format input data that overlaps with the seed, and a reference KG serving as ground truth, enabling reproducible evaluation of twelve pipelines across formats and design choices.

What carries the argument

KGI-Bench benchmark, which supplies a seed knowledge graph, multi-format overlapping input data, a reference ground-truth KG, and evaluation through the three metrics of coverage, correctness, and consistency on the updated graph.
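
This page does not reproduce the paper's metric formulas. As a rough illustration only, the sketch below assumes a KG is a set of (subject, predicate, object) triples and follows the informal wording of the simulated rebuttal further down: coverage as the share of reference triples found in the updated KG, correctness as the share of newly added triples confirmed by the reference, and consistency as the share of functional-predicate slots holding a single value. The function names, the functional-predicate constraint, and the toy triples are assumptions, not the authors' definitions.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def coverage(updated: Set[Triple], reference: Set[Triple]) -> float:
    """Share of reference triples that also appear in the updated KG."""
    if not reference:
        return 0.0
    return len(updated & reference) / len(reference)


def correctness(updated: Set[Triple], seed: Set[Triple],
                reference: Set[Triple]) -> float:
    """Share of newly added triples (absent from the seed) found in the reference."""
    added = updated - seed
    if not added:
        return 0.0
    return len(added & reference) / len(added)


def consistency(updated: Set[Triple], functional_predicates: Set[str]) -> float:
    """Share of (subject, predicate) slots of functional predicates with one object.

    Assumption: a single-value constraint stands in for whatever constraints
    the benchmark actually checks.
    """
    slots: Dict[Tuple[str, str], Set[str]] = {}
    for s, p, o in updated:
        if p in functional_predicates:
            slots.setdefault((s, p), set()).add(o)
    if not slots:
        return 1.0
    return sum(1 for objs in slots.values() if len(objs) == 1) / len(slots)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from KGI-Bench.
    seed = {("m1", "title", "Alien")}
    reference = {("m1", "title", "Alien"), ("m1", "year", "1979"),
                 ("m1", "director", "p1"), ("p1", "name", "Ridley Scott")}
    updated = {("m1", "title", "Alien"), ("m1", "year", "1979"),
               ("m1", "year", "1980"), ("m1", "director", "p1")}
    print(coverage(updated, reference))                      # 0.75
    print(round(correctness(updated, seed, reference), 3))   # 0.667
    print(consistency(updated, {"title", "year"}))           # 0.5
```

In the toy run, the conflicting second `year` value lowers both correctness and consistency while leaving coverage untouched, which is the sense in which the three scores complement one another.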

Load-bearing premise

The movie-domain datasets and the three chosen metrics are representative enough to identify the best pipeline choices for general data integration problems across domains and data types.

What would settle it

Re-running the twelve pipelines on a non-movie dataset such as a biology or finance knowledge graph and obtaining a substantially different ranking of which pipelines perform best would indicate the movie resources do not generalize.
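
One way to make "a substantially different ranking" measurable is a rank correlation between per-pipeline scores from two domains. The sketch below computes Kendall's tau-a in plain Python. The pipeline names and scores are invented placeholders, and the choice of Kendall's tau is an assumption rather than something the paper proposes.

```python
from itertools import combinations
from typing import Dict


def kendall_tau(scores_a: Dict[str, float], scores_b: Dict[str, float]) -> float:
    """Kendall's tau-a between two score tables covering the same pipelines.

    Returns 1.0 for identical orderings and -1.0 for fully reversed ones.
    Tied pairs count as neither concordant nor discordant.
    """
    if set(scores_a) != set(scores_b):
        raise ValueError("both score tables must cover the same pipelines")
    names = sorted(scores_a)
    if len(names) < 2:
        raise ValueError("need at least two pipelines to compare rankings")
    concordant = discordant = 0
    for x, y in combinations(names, 2):
        sign = (scores_a[x] - scores_a[y]) * (scores_b[x] - scores_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(names) * (len(names) - 1) // 2
    return (concordant - discordant) / pairs


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    movie = {"P1": 0.9, "P2": 0.8, "P3": 0.6, "P4": 0.5}
    other = {"P1": 0.5, "P2": 0.7, "P3": 0.8, "P4": 0.9}
    print(kendall_tau(movie, movie))  # 1.0
    print(kendall_tau(movie, other))  # -1.0
```

A value near 1 would suggest the movie-domain ranking carries over, while a value near 0 or below would point to the lack of generalization described above.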

Figures

Figures reproduced from arXiv: 2605.22304 by Erhard Rahm, Marvin Hofer.

**Figure 2.** RDF single-source pipeline layouts used in the eval...

**Figure 3.** JSON single-source pipeline layouts used in the...

**Figure 4.** Text single-source pipeline layouts used in the eval...

**Figure 5.** Number of integrated entities by entity type and expected entities at each source increment (stage) for all pipelines.

Original abstract

Integrating new data into knowledge graphs (KG) typically involves different tasks that are executed within workflows or pipelines. There are many possible pipelines for a specific integration problem but there is not yet a general approach to evaluate the overall quality and performance of such pipelines to be able to determine the best choices. We therefore propose a new benchmark KGI-Bench to evaluate integration pipelines that ingest different kinds of input data into an existing KG. We evaluate pipelines by analyzing their output, i.e., the updated KG, with the three complementary quality metrics coverage, correctness and consistency. We also provide benchmark datasets (seed KG, overlapping input data of three formats, reference KG as a ground truth) for the movie domain. To demonstrate the applicability and usefulness of the proposed benchmark, we comparatively evaluate 12 pipelines and analyze their behavior across different input data formats and design choices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper introduces KGI-Bench with movie-domain datasets and a 12-pipeline comparison, but the single-domain setup limits how far the results can be trusted for general use.

The main thing to take away is that the authors introduce KGI-Bench as a benchmark to evaluate pipelines for integrating different kinds of data into knowledge graphs. They supply movie-domain datasets and compare 12 pipelines using coverage, correctness, and consistency as metrics.

The paper does a good job creating concrete resources for this purpose. The seed KG, the overlapping input data in three formats, and the reference KG as ground truth make it possible to measure the quality of the updated graph after integration. This approach of looking at the overall output rather than separate tasks is sensible. The demonstration with the 12 pipelines and the analysis of their behavior across input formats and design choices provides some initial insights that could guide choices in practice.

One potential issue is the domain choice. Movie entities and relations are relatively clean and well-structured, which might not capture the challenges in other areas like scientific literature or business data where entity resolution and consistency problems are tougher. If the metrics and rankings don't hold up under those conditions, the usefulness for general data integration could be narrower than claimed.

This work is aimed at people working on knowledge graph population and data quality for AI systems. A reader who needs a way to test and compare integration pipelines would get practical value from the benchmark and datasets. The paper shows clear thinking in setting up the evaluation framework. It deserves a serious referee because the benchmark idea is a useful addition even if it needs expansion to more domains. I would recommend sending it to peer review.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes KGI-Bench, a benchmark for evaluating data integration pipelines that ingest different input data formats into an existing knowledge graph. Pipelines are assessed by analyzing the updated KG against three complementary quality metrics (coverage, correctness, consistency). The authors supply movie-domain datasets (seed KG, overlapping inputs in three formats, reference KG as ground truth) and demonstrate the benchmark by comparatively evaluating 12 pipelines while analyzing behavior across input formats and design choices.

Significance. If the metrics are rigorously defined and the evaluation results reproducible, the benchmark could address the lack of standardized methods for comparing KG integration pipelines. The provision of concrete datasets and a multi-pipeline demonstration is a positive step toward reproducibility and practical utility in data integration research.

major comments (3)

[Benchmark and metrics description] Section describing the quality metrics: the three metrics (coverage, correctness, consistency) are presented as complementary for evaluating the updated KG, but no explicit definitions, formulas, or computation procedures relative to the reference KG are supplied. This is load-bearing for the central claim that the benchmark enables determination of best pipeline choices.
[Comparative evaluation] Evaluation and demonstration section: the analysis of the 12 pipelines' behavior across input formats and design choices lacks reported quantitative results, tables of metric values, or statistical comparisons, leaving the demonstration without verifiable support for the claimed usefulness.
[Datasets and generalizability] Benchmark datasets section: the movie-domain seed KG and inputs are used to support general conclusions about pipeline selection, yet no cross-domain experiments or sensitivity analysis address whether the metrics and rankings transfer to domains with greater schema heterogeneity or noise (e.g., biomedical data).

minor comments (2)

[Abstract] The abstract mentions 'three formats' for input data but does not name them; adding this detail would improve clarity without altering the contribution.
[Metrics] Notation for the updated KG versus reference KG should be introduced consistently when first describing the metrics.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address each of the major comments below, indicating the changes we will make to strengthen the paper.

Referee: [Benchmark and metrics description] Section describing the quality metrics: the three metrics (coverage, correctness, consistency) are presented as complementary for evaluating the updated KG, but no explicit definitions, formulas, or computation procedures relative to the reference KG are supplied. This is load-bearing for the central claim that the benchmark enables determination of best pipeline choices.

Authors: We appreciate this observation. The manuscript introduces the metrics in the context of the benchmark but does not provide the explicit formulas or detailed computation procedures. We will revise the section on quality metrics to include precise definitions and formulas. For instance, coverage will be defined as the fraction of reference KG elements covered by the updated KG, correctness as the accuracy of integrated facts against the reference, and consistency as the degree to which the updated KG satisfies predefined constraints, with step-by-step procedures for calculation relative to the reference KG. This will support the claim more rigorously.

revision: yes

Referee: [Comparative evaluation] Evaluation and demonstration section: the analysis of the 12 pipelines' behavior across input formats and design choices lacks reported quantitative results, tables of metric values, or statistical comparisons, leaving the demonstration without verifiable support for the claimed usefulness.

Authors: We acknowledge that while the manuscript provides an analysis of the pipelines' behaviors, it would be improved by including explicit quantitative results. In the revised version, we will add tables presenting the coverage, correctness, and consistency scores for each of the 12 pipelines under different input formats. Additionally, we will include statistical summaries or comparisons where appropriate to provide verifiable support for our observations on design choices and input formats.

revision: yes

Referee: [Datasets and generalizability] Benchmark datasets section: the movie-domain seed KG and inputs are used to support general conclusions about pipeline selection, yet no cross-domain experiments or sensitivity analysis address whether the metrics and rankings transfer to domains with greater schema heterogeneity or noise (e.g., biomedical data).

Authors: The demonstration uses the movie domain to provide a controlled and reproducible example with the supplied datasets. We recognize that this limits the ability to draw broad conclusions about generalizability. In the revision, we will add a discussion on the potential applicability to other domains, including considerations for schema heterogeneity and noise, and perform a basic sensitivity analysis on the existing movie data by simulating varying levels of input noise if feasible. Full cross-domain validation would require additional datasets and is planned for future work.

revision: partial
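
The sensitivity analysis mentioned in the third response could start from a noise-injection helper along the following lines. This is a minimal sketch under stated assumptions: input data is treated as a list of (subject, predicate, object) triples, noise means corrupting the object value of a chosen fraction of them, and the `_noise` suffix is a placeholder corruption. Neither the function name nor the corruption model comes from the paper.

```python
import random
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def inject_noise(triples: List[Triple], rate: float, seed: int = 0) -> List[Triple]:
    """Return a copy of `triples` with roughly `rate` of the object values corrupted.

    The same `seed` always corrupts the same positions, so runs stay reproducible.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)
    n_corrupt = round(rate * len(triples))
    corrupt_idx = set(rng.sample(range(len(triples)), n_corrupt))
    noisy: List[Triple] = []
    for i, (s, p, o) in enumerate(triples):
        if i in corrupt_idx:
            o = o + "_noise"  # placeholder corruption, an assumption of this sketch
        noisy.append((s, p, o))
    return noisy


if __name__ == "__main__":
    # Hypothetical toy input, not taken from KGI-Bench.
    data = [("m1", "title", "Alien"), ("m1", "year", "1979"),
            ("m2", "title", "Heat"), ("m2", "year", "1995")]
    for rate in (0.0, 0.25, 0.5):
        noisy = inject_noise(data, rate, seed=42)
        changed = sum(1 for a, b in zip(data, noisy) if a != b)
        print(rate, changed)
```

Feeding each noisy variant through a pipeline and plotting the three metric scores against `rate` would show how quickly each pipeline degrades, which is the kind of evidence the referee's third comment asks for.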

Circularity Check

0 steps flagged

No circularity: benchmark proposal and empirical evaluation stand independently

The paper proposes KGI-Bench as a new evaluation framework for KG integration pipelines, supplies movie-domain seed KG, input data in three formats, and reference KG, then runs an empirical comparison of 12 pipelines using the three metrics coverage, correctness, and consistency. No equations, derivations, fitted parameters, or predictions appear in the provided text. The central demonstration of the benchmark's usefulness is a direct empirical measurement on the supplied datasets rather than a reduction to self-citation, self-definition, or renamed known results. Concerns about domain representativeness affect external validity but do not constitute circularity under the stated criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work introduces a benchmark without new mathematical axioms, free parameters, or invented entities; it builds on standard knowledge-graph concepts and evaluation practices.

pith-pipeline@v0.9.0 · 5670 in / 990 out tokens · 48517 ms · 2026-05-22T05:11:49.657432+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We evaluate pipelines by analyzing their output, i.e., the updated KG, with the three complementary quality metrics coverage, correctness and consistency."

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "KGI-Bench-Movie datasets cover films and their related persons and companies."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Gabriel Amaral, Odinaldo Rodrigues, and Elena Simperl. 2024. ProVe: A pipeline for automated provenance verification of knowledge graphs against textual sources.Semantic Web15, 6 (2024), 2159–2192. https://doi.org/10.3233/SW- 233467

work page doi:10.3233/sw- 2024

[2] [2]

Renzo Angles, Angela Bonifati, Stefania Dumbrava, George Fletcher, Alastair Green, Jan Hidders, Bei Li, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Stefan Plantikow, Ognjen Savkovic, Michael Schmidt, Juan Sequeda, Slawek Staworko, Dominik Tomaszuk, Hannes Voigt, Domagoj Vrgoc, Mingxi Wu, and Dusan Zivkovic. 2023. PG-Schema: Schemas for Prop...

work page doi:10.1145/3589778 2023

[3] [3]

Roos M Bakker and Maaike HT de Boer. 2026. Dynamic knowledge graph evaluation: Semantic and syntactic metrics for evaluating changes.Data & Knowledge Engineering(2026), 102611

work page 2026

[4] [4]

Meghyn Bienvenu and Camille Bourgaux. 2017. Inconsistency-Tolerant Querying of Description Logic Knowledge Bases. InReasoning Web: Logical Foundation of Knowledge Graph Construction and Query Answering, Jeff Z. Pan, Diego Calvanese, Thomas Eiter, Ian Horrocks, Michael Kifer, Fangzhen Lin, and Yuting Zhao (Eds.). Vol. 9885. Springer International Publishin...

work page doi:10.1007/978-3-319-49493-7_5 2017

[5] [5]

Christian Bizer and Andy Seaborne. 2004. D2RQ-treating non-RDF databases as virtual RDF graphs. InProceedings of the 3rd international semantic web conference (ISWC2004), Vol. 2004. Springer Hiroshima

work page 2004

[6] [6]

Martin Brümmer, Milan Dojchinovski, and Sebastian Hellmann. 2016. DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus. InProceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, H...

work page 2016

[7] [7]

Ringwald Celian, Gandon, Fabien, Faron Catherine, Michel Franck, and Abi Akl Hanna. 2025. A systematic review of relation extraction task since the emergence of Transformers. https://doi.org/10.48550/arXiv.2511.03610 arXiv:2511.03610 [cs]

work page doi:10.48550/arxiv.2511.03610 2025

[8] [8]

Vassilis Christophides, Vasilis Efthymiou, Themis Palpanas, George Papadakis, and Kostas Stefanidis. 2021. An Overview of End-to-End Entity Resolution for Big Data.ACM Comput. Surv.53, 6 (2021), 127:1–127:42. https://doi.org/10.1145/ 3418896

work page 2021

[9] [9]

Carolina Cortes, Lisa Ehrlinger, Lorena Etcheverry, and Felix Naumann. 2025. Is SHACL Suitable for Data Quality Assessment?CoRRabs/2507.22305 (2025). https://doi.org/10.48550/ARXIV.2507.22305 arXiv:2507.22305

work page doi:10.48550/arxiv.2507.22305 2025

[10] [10]

Meiji Cui, Li Li, Zhihong Wang, and Mingyu You. 2017. A Survey on Relation Extraction. InKnowledge Graph and Semantic Computing. Language, Knowledge, and Intelligence - Second China Conference, CCKS 2017, Chengdu, China, August 26-29, 2017, Revised Selected Papers (Communications in Computer and Information Science), Juanzi Li, Ming Zhou, Guilin Qi, Ni La...

work page doi:10.1007/978-981-10-7359-5_6 2017

[11] [11]

Xi Deng, Volker Haarslev, and Nematollaah Shiri. 2007. Measuring Inconsistencies in Ontologies. In The Semantic Web: Research and Applications, Enrico Franconi, Michael Kifer, and Wolfgang May (Eds.). Vol. 4519. Springer Berlin Heidelberg, Berlin, Heidelberg, 326–340. https://doi.org/10.1007/978-3-540-72667-8_24 Series Title: Lecture Notes in Compute...

[12] [12]

Jérôme Euzenat, Christian Meilicke, Heiner Stuckenschmidt, Pavel Shvaiko, and Cássia Trojahn. 2011. Ontology alignment evaluation initiative: six years of experience. In Journal on data semantics XV. Springer, 158–192

[13] [13]

Juliana Freire, Grace Fan, Benjamin Feuer, Christos Koutras, Yurong Liu, Eduardo Peña, and Eden Wu. 2025. Large Language Models for Data Discovery and Integration: Challenges and Opportunities. IEEE Data Eng. Bull. 49, 1 (2025), 3–31

[14] [14]

Hamed Babaei Giglou, Jennifer D’Souza, Oliver Karras, and Sören Auer. 2025. OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment. https://doi.org/10.48550/arXiv.2503.21902 arXiv:2503.21902 [cs]

[15] [15]

Nicolas Heist, Sven Hertling, and Heiko Paulheim. 2023. KGrEaT: A Framework to Evaluate Knowledge Graphs via Downstream Tasks. https://doi.org/10.48550/arXiv.2308.10537 arXiv:2308.10537 [cs]

[16] [16]

Sven Hertling and Heiko Paulheim. 2020. The Knowledge Graph Track at OAEI. The Semantic Web 12123 (May 2020), 343–359. https://doi.org/10.1007/978-3-030-49461-2_20

[17] [17]

Marvin Hofer, Daniel Obraczka, Alieh Saeedi, Hanna Köpcke, and Erhard Rahm. 2024. Construction of Knowledge Graphs: Current State and Challenges. Inf. 15, 8 (2024), 509. https://doi.org/10.3390/INFO15080509

[19] [19]

Marvin Hofer and Erhard Rahm. 2025. KGpipe: Generation and Evaluation of Pipelines for Data Integration into Knowledge Graphs. CoRR abs/2511.18364 (2025). https://doi.org/10.48550/arXiv.2511.18364

[20] [20]

Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia d’Amato, Gerard De Melo, Claudio Gutierrez, Sabrina Kirrane, José Emilio Labra Gayo, Roberto Navigli, Sebastian Neumaier, et al. 2021. Knowledge graphs. ACM Computing Surveys (CSUR) 54, 4 (2021), 1–37

[21] [21]

Elwin Huaman. 2022. Steps to Knowledge Graphs Quality Assessment. https://doi.org/10.48550/arXiv.2208.07779 arXiv:2208.07779 [cs].

[22] [22]

Elwin Huaman, Amar Tauqeer, and Anna Fensel. 2021. Towards Knowledge Graphs Validation through Weighted Knowledge Sources. In Knowledge Graphs and Semantic Web (KGSWC 2021) (Communications in Computer and Information Science), Vol. 1459. Springer, 45–60. https://doi.org/10.1007/978-3-030-91305-2_4

[23] [23]

Pere-Lluís Huguet Cabot, Simone Tedeschi, Axel-Cyrille Ngonga Ngomo, and Roberto Navigli. 2023. RED FM: a Filtered and Multilingual Relation Extraction Dataset. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Toronto, Canada, 4326–4343. https:...

work page doi:10.18653/v1/2023.acl-long.237 2023

[24] [24]

Jan Martin Keil. [n.d.]. ABECTO: Assessing Accuracy and Completeness of RDF Knowledge Graphs. ([n. d.])

[25] [26]

Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, and Amrapali Zaveri. 2014. Test-driven evaluation of linked data quality. In 23rd International World Wide Web Conference, WWW ’14, Seoul, Republic of Korea, April 7-11, 2014, Chin-Wan Chung, Andrei Z. Broder, Kyuseok Shim, and Torsten Suel (Eds.). ACM...

[26] [27]

Christos Koutras, George Siachamis, Andra Ionescu, Kyriakos Psarakis, Jerry Brons, Marios Fragkoulis, Christoph Lofi, Angela Bonifati, and Asterios Katsifodimos. 2021. Valentine: Evaluating Matching Techniques for Dataset Discovery. In 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 468–479

[27] [28]

Domenico Lembo, Maurizio Lenzerini, Riccardo Rosati, Marco Ruzzi, and Domenico Fabio Savo. [n.d.]. Inconsistency-tolerant Query Answering in Ontology-based Data Access. ([n. d.])

[28] [29]

Thomas Lukasiewicz, Enrico Malizia, Maria Vanina Martinez, Cristian Molinaro, Andreas Pieris, and Gerardo I. Simari. 2022. Inconsistency-tolerant query answering for existential rules. Artificial Intelligence 307 (June 2022), 103685. https://doi.org/10.1016/j.artint.2022.103685

[29] [30]

Stefano Marchesin and Gianmaria Silvello. 2024. Efficient and Reliable Estimation of Knowledge Graph Accuracy. Proceedings of the VLDB Endowment 17, 9 (May 2024), 2392–2403. https://doi.org/10.14778/3665844.3665865

[30] [31]

Jose L. Martinez-Rodriguez, Aidan Hogan, and Ivan Lopez-Arevalo. 2020. Information extraction meets the Semantic Web: A survey. Semantic Web 11, 2 (Feb. 2020), 255–335. https://doi.org/10.3233/SW-180333

[31] [32]

Pablo N. Mendes, Hannes Mühleisen, and Christian Bizer. 2012. Sieve: linked data quality assessment and fusion. In Proceedings of the 2012 Joint EDBT/ICDT Workshops, Berlin, Germany, March 30, 2012, Divesh Srivastava and Ismail Ari (Eds.). ACM, 116–123. https://doi.org/10.1145/2320765.2320803

[32] [33]

Lars-Peter Meyer, Claus Stadler, Johannes Frey, Norman Radtke, Kurt Junghanns, Roy Meissner, Gordian Dziwis, Kirill Bulert, and Michael Martin. 2023. LLM-assisted Knowledge Graph Engineering: Experiments with ChatGPT. In First Working Conference on Artificial Intelligence Development for a Resilient and Sustainable Tomorrow - AI Tomorrow 2023, Leipzig, ...

work page doi:10.1007/978-3-658-43705-3_8 2023

[33] [34]

Nandana Mihindukulasooriya, Sanju Tiwari, Carlos F Enguix, and Kusum Lata. Text2kgbench: A benchmark for ontology-driven knowledge graph generation from text. In International semantic web conference. Springer, 247–265

[35] [36]

Sedir Mohammed, Lisa Ehrlinger, Hazar Harmouch, Felix Naumann, and Divesh Srivastava. 2025. The Five Facets of Data Quality Assessment. SIGMOD Rec. 54, 2 (July 2025), 18–27. https://doi.org/10.1145/3749116.3749120

[36] [37]

Prakhar Ojha and Partha P. Talukdar. 2017. KGEval: Accuracy Estimation of Automatically Constructed Knowledge Graphs. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, Martha Palmer, Rebecca Hwa, and Sebastian Riedel (Eds.). Association for Computational Linguis...

work page doi:10.18653/v1/d17-1183 2017

[37] [38]

George Papadakis, Leonidas Tsekouras, Emmanouil Thanos, Nikiforos Pittaras, Giovanni Simonini, Dimitrios Skoutas, Paul Isaris, George Giannakopoulos, Themis Palpanas, and Manolis Koubarakis. 2020. JedAI3: beyond batch, blocking-based Entity Resolution. In Proceedings of the 23rd International Conference on Extending Database Technology, EDBT 2020, Copenh...

work page doi:10.5441/002/edbt.2020.74 2020

[38] [39]

Heiko Paulheim. 2016. Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8, 3 (Dec. 2016), 489–508. https://doi.org/10.3233/SW-160218

[39] [40]

Umair Qudus, Michael Röder, Muhammad Saleem, and Axel-Cyrille Ngonga Ngomo. [n.d.]. Fact Checking over Knowledge Graphs – A Survey. ([n. d.])

[40] [41]

Kashif Rabbani, Matteo Lissandrini, and Katja Hose. 2022. SHACL and ShEx in the Wild: A Community Survey on Validating Shapes Generation and Adoption. In Companion of The Web Conference 2022, Virtual Event / Lyon, France, April 25 - 29, 2022, Frédérique Laforest, Raphaël Troncy, Elena Simperl, Deepak Agarwal, Aristides Gionis, Ivan Herman, and Lionel Médin...

work page doi:10.1145/3487553.3524253 2022

[41] [42]

André Gomes Regino and Anderson Rossanez. 2026. A Systematic Literature Review on RDF Triple Generation from Natural Language Texts. (2026)

[42] [43]

Farzad Shami, Stefano Marchesin, and Gianmaria Silvello. 2026. Benchmarking Large Language Models for Knowledge Graph Validation. In Proceedings 29th International Conference on Extending Database Technology, EDBT 2026, Tampere, Finland, March 24-27, 2026. OpenProceedings.org, 551–565. https://doi.org/10.48786/EDBT.2026.45

[43] [44]

Fabian M. Suchanek, Serge Abiteboul, and Pierre Senellart. 2011. PARIS: Probabilistic Alignment of Relations, Instances, and Schema. Proc. VLDB Endow. 5, 3 (2011), 157–168. https://doi.org/10.14778/2078331.2078332

[44] [45]

Zequn Sun, Qingheng Zhang, Wei Hu, Chengming Wang, Muhao Chen, Farahnaz Akrami, and Chengkai Li. 2020. A benchmarking study of embedding-based entity alignment for knowledge graphs. arXiv preprint arXiv:2003.07743 (2020)

[45] [46]

Gyte Tamasauskaite and Paul Groth. 2023. Defining a Knowledge Graph Development Process Through a Systematic Review. ACM Trans. Softw. Eng. Methodol. 32, 1 (2023), 27:1–27:40. https://doi.org/10.1145/3522586

[46] [47]

Dylan Van Assche, Thomas Delva, Gerald Haesendonck, Pieter Heyvaert, Ben De Meester, and Anastasia Dimou. 2023. Declarative RDF graph generation from heterogeneous (semi-)structured data: A systematic literature review. Journal of Web Semantics 75 (Jan. 2023), 100753. https://doi.org/10.1016/j.websem.2022.100753

[47] [48]

Xiangyu Wang, Lyuzhou Chen, Taiyu Ban, Muhammad Usman, Yifeng Guan, Shikang Liu, Tianhao Wu, and Huanhuan Chen. 2021. Knowledge graph quality control: A survey. Fundamental Research 1, 5 (Sept. 2021), 607–626. https://doi.org/10.1016/j.fmre.2021.09.003

[48] [49]

Gerhard Weikum, Xin Luna Dong, Simon Razniewski, Fabian Suchanek, et al. 2021. Machine knowledge: Creation and curation of comprehensive knowledge bases. Foundations and Trends® in Databases 10, 2-4 (2021), 108–490

[50] [51]

Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann, and Sören Auer. 2015. Quality assessment for Linked Data: A Survey: A systematic literature review and conceptual framework. Semantic Web 7, 1 (March 2015), 63–93. https://doi.org/10.3233/SW-150175

APPENDIX

Tables 4 and 5 report task-specific evaluat...