BLINKG: A Benchmark for LLM-Integrated Knowledge Graph Generation
Pith reviewed 2026-05-20 06:19 UTC · model grok-4.3
The pith
LLMs offer promising but limited help in constructing knowledge graphs from diverse data sources.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BLINKG is a benchmark for assessing the mapping capabilities of LLMs in KG construction from heterogeneous data sources, consisting of scenarios with increasing complexity based on real-world use cases. Experiments show that state-of-the-art LLMs already provide promising solutions, but their performance is limited in complex scenarios. The paper defines a set of requirements for achieving semi-automated LLM-driven KG construction.
What carries the argument
The BLINKG benchmark consisting of scenarios with increasing complexity based on real-world use cases, which evaluates LLMs' ability to establish correspondences between data schemas and ontology concepts.
Load-bearing premise
The benchmark's scenarios with increasing complexity adequately represent the range of real-world mapping difficulties that knowledge engineers face.
What would settle it
An experiment where a new or improved LLM achieves consistently high accuracy across the most complex scenarios in the benchmark without special prompting or training would challenge the finding of limited performance.
Original abstract
Generating Knowledge Graphs (KGs) remains one of the most time-consuming and labor-intensive tasks for knowledge engineers, as they need to identify semantic equivalences between input data sources and ontology terms. While declarative solutions (e.g., RML, SPARQL-Anything) have helped to generalize this process, aligning input schema elements with ontology terms still involves intricate transformations and requires considerable manual effort. With the advent of Large Language Models (LLMs), there is growing interest in leveraging their capabilities to assist KG engineers. Although some studies have explored using LLMs to automate KG construction, there is still no standardized framework for assessing how effectively they establish correspondences between data schemes and ontology concepts. Therefore, in this paper, we propose BLINKG, a benchmark designed to evaluate the mapping capabilities of LLMs in constructing KGs from heterogeneous data sources. The benchmark includes a set of scenarios with increasing complexity, based on real-world use cases. We conduct an extensive experimental evaluation of several stateof-the-art LLMs using BLINK and observe that they already offer promising solutions. However, their performance remains limited in complex scenarios. Thanks to this benchmark, we can already assess the current capabilities of LLMs for KG construction. Additionally, we define a set of requirements for achieving (semi)automated (LLM-driven) KG construction, opening new research lines in this area.
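To make the mapping task described in the abstract concrete, the sketch below scores a set of predicted schema-to-ontology correspondences against a hand-written reference using precision, recall, and F1. The column names, the ontology terms, and the choice of metric are all illustrative assumptions. The abstract does not state which data or metrics BLINKG uses, so this is not the benchmark's own scoring procedure.

```python
from typing import Dict, Set, Tuple

# A correspondence pairs a source schema element with an ontology term.
Correspondence = Tuple[str, str]


def score_mapping(
    predicted: Set[Correspondence], gold: Set[Correspondence]
) -> Dict[str, float]:
    """Compare predicted correspondences with a reference set (exact match)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference mapping for a small transport dataset.
gold = {
    ("stops.stop_name", "schema:name"),
    ("stops.stop_lat", "geo:lat"),
    ("stops.stop_lon", "geo:long"),
    ("routes.route_id", "dct:identifier"),
}

# Hypothetical LLM output: two correct pairs, one wrong term, one column missed.
predicted = {
    ("stops.stop_name", "schema:name"),
    ("stops.stop_lat", "geo:lat"),
    ("stops.stop_lon", "geo:lat"),
}

scores = score_mapping(predicted, gold)
for metric, value in scores.items():
    print(f"{metric}: {value:.3f}")
```

Exact-match scoring is the simplest choice. It gives no credit for near misses such as a broader ontology term, which is one reason complex scenarios may need a more forgiving measure.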
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes BLINKG, a benchmark for evaluating the mapping capabilities of LLMs when constructing knowledge graphs from heterogeneous data sources. It consists of scenarios with increasing complexity drawn from real-world use cases. The authors perform an experimental evaluation of several state-of-the-art LLMs on the benchmark, report that the models offer promising solutions but remain limited in complex scenarios, and define a set of requirements for achieving semi-automated LLM-driven KG construction.
Significance. If the benchmark scenarios are shown to be representative, the work supplies a much-needed standardized framework for assessing LLM performance on the data-to-ontology mapping step that remains a major bottleneck in KG engineering. The initial evaluation of multiple LLMs and the explicit list of requirements for future automation constitute concrete contributions that can guide subsequent research.
major comments (1)
- [Benchmark description (as referenced in the abstract and methods)] The central claim that BLINKG provides a useful evaluation of LLM mapping capabilities rests on the assertion that its scenarios adequately span real-world mapping difficulties. The manuscript states that the scenarios are 'based on real-world use cases' but supplies no account of selection criteria, practitioner review, or coverage of documented pain points such as ambiguous term alignment, multi-source heterogeneity, or complex transformations. This omission is load-bearing for the benchmark's claimed utility and must be addressed with an explicit methodology or validation step.
minor comments (1)
- [Abstract] Abstract contains the typo 'stateof-the-art' (should be 'state-of-the-art') and the apparent typo 'using BLINK and observe' (should be 'using BLINKG and observe').
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for recognizing the potential value of BLINKG as a standardized framework for evaluating LLM performance on data-to-ontology mapping. We address the major comment below and describe the revisions we will make to strengthen the manuscript.
Point-by-point responses
- Referee: The central claim that BLINKG provides a useful evaluation of LLM mapping capabilities rests on the assertion that its scenarios adequately span real-world mapping difficulties. The manuscript states that the scenarios are 'based on real-world use cases' but supplies no account of selection criteria, practitioner review, or coverage of documented pain points such as ambiguous term alignment, multi-source heterogeneity, or complex transformations. This omission is load-bearing for the benchmark's claimed utility and must be addressed with an explicit methodology or validation step.
  Authors: We agree that the current description of scenario selection is insufficient to fully substantiate the representativeness claim. In the revised manuscript we will add an explicit subsection in the Benchmark section that details the curation methodology. This will describe the selection criteria drawn from documented real-world KG construction projects, how the scenarios were chosen to cover pain points including ambiguous term alignment, multi-source heterogeneity, and complex transformations, and any practitioner review or validation steps used during design. These additions will make the benchmark's grounding in real-world difficulties transparent.
  Revision: yes
Circularity Check
No circularity: benchmark proposal and experimental evaluation are self-contained
full rationale
The paper proposes BLINKG as an external benchmark with scenarios drawn from real-world use cases and performs direct experimental evaluation of LLMs against it. No equations, fitted parameters, or derivations are present that reduce any claimed result to quantities defined inside the paper. Self-citations are not load-bearing for the central claim, and the evaluation outcomes are not forced by construction from the benchmark definition itself. The work is therefore independent of the circularity patterns listed.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Real-world use cases can be represented as scenarios of increasing complexity for benchmarking purposes.