arxiv: 2605.19518 · v1 · pith:PKNXYTGGnew · submitted 2026-05-19 · 💻 cs.AI

BLINKG: A Benchmark for LLM-Integrated Knowledge Graph Generation

Pith reviewed 2026-05-20 06:19 UTC · model grok-4.3

classification 💻 cs.AI
keywords knowledge graph construction · large language models · benchmark · data mapping · ontology alignment · heterogeneous data sources

The pith

LLMs offer promising but limited help in constructing knowledge graphs from diverse data sources.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents BLINKG, a benchmark to measure how effectively large language models can identify matches between data source elements and terms in an ontology when building knowledge graphs. The benchmark features multiple scenarios of growing difficulty drawn from actual applications. Tests on leading models indicate they perform adequately on basic mappings yet encounter difficulties with intricate transformations. The work also specifies conditions required to reach partly automated knowledge graph building with these models. Readers might value this because manual alignment work is a major bottleneck in creating usable knowledge graphs for data analysis and integration.

Core claim

BLINKG is a benchmark for assessing the mapping capabilities of LLMs in KG construction from heterogeneous data sources, consisting of scenarios with increasing complexity based on real-world use cases. Experiments show that state-of-the-art LLMs already provide promising solutions, but their performance is limited in complex scenarios. The paper defines a set of requirements for achieving semi-automated LLM-driven KG construction.

What carries the argument

The BLINKG benchmark consisting of scenarios with increasing complexity based on real-world use cases, which evaluates LLMs' ability to establish correspondences between data schemas and ontology concepts.
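
To make the idea of evaluating correspondences concrete, here is a minimal sketch of how a set of predicted field-to-term mappings could be scored against a reference set. It is not taken from the paper: the summary above does not state which metrics BLINKG uses, so exact-match precision, recall, and F1 are assumptions, and the `Mapping` type, the `score_mappings` helper, the column names, and the `ex:` terms are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One correspondence between a source field and an ontology term."""
    source_field: str   # hypothetical column name from an input source
    ontology_term: str  # hypothetical ontology class or property


def score_mappings(predicted: set[Mapping], gold: set[Mapping]) -> dict[str, float]:
    """Exact-match precision, recall, and F1 (an assumed metric, not the paper's)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference mappings for a small tabular source.
gold = {
    Mapping("stop_name", "ex:name"),
    Mapping("stop_lat", "ex:latitude"),
    Mapping("stop_lon", "ex:longitude"),
}

# Hypothetical model output with one incorrect correspondence.
predicted = {
    Mapping("stop_name", "ex:name"),
    Mapping("stop_lat", "ex:latitude"),
    Mapping("stop_lon", "ex:label"),
}

scores = score_mappings(predicted, gold)
print(scores)
```

Exact matching is the strictest possible choice; a benchmark covering intricate transformations would likely also need to compare the transformation logic itself, which this sketch does not attempt.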

Load-bearing premise

The benchmark's scenarios with increasing complexity adequately represent the range of real-world mapping difficulties that knowledge engineers face.

What would settle it

An experiment where a new or improved LLM achieves consistently high accuracy across the most complex scenarios in the benchmark without special prompting or training would challenge the finding of limited performance.

read the original abstract

Generating Knowledge Graphs (KGs) remains one of the most time-consuming and labor-intensive tasks for knowledge engineers, as they need to identify semantic equivalences between input data sources and ontology terms. While declarative solutions (e.g., RML, SPARQL-Anything) have helped to generalize this process, aligning input schema elements with ontology terms still involves intricate transformations and requires considerable manual effort. With the advent of Large Language Models (LLMs), there is growing interest in leveraging their capabilities to assist KG engineers. Although some studies have explored using LLMs to automate KG construction, there is still no standardized framework for assessing how effectively they establish correspondences between data schemes and ontology concepts. Therefore, in this paper, we propose BLINKG, a benchmark designed to evaluate the mapping capabilities of LLMs in constructing KGs from heterogeneous data sources. The benchmark includes a set of scenarios with increasing complexity, based on real-world use cases. We conduct an extensive experimental evaluation of several stateof-the-art LLMs using BLINK and observe that they already offer promising solutions. However, their performance remains limited in complex scenarios. Thanks to this benchmark, we can already assess the current capabilities of LLMs for KG construction. Additionally, we define a set of requirements for achieving (semi)automated (LLM-driven) KG construction, opening new research lines in this area.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes BLINKG, a benchmark for evaluating the mapping capabilities of LLMs when constructing knowledge graphs from heterogeneous data sources. It consists of scenarios with increasing complexity drawn from real-world use cases. The authors perform an experimental evaluation of several state-of-the-art LLMs on the benchmark, report that the models offer promising solutions but remain limited in complex scenarios, and define a set of requirements for achieving semi-automated LLM-driven KG construction.

Significance. If the benchmark scenarios are shown to be representative, the work supplies a much-needed standardized framework for assessing LLM performance on the data-to-ontology mapping step that remains a major bottleneck in KG engineering. The initial evaluation of multiple LLMs and the explicit list of requirements for future automation constitute concrete contributions that can guide subsequent research.

major comments (1)
  1. [Benchmark description (as referenced in the abstract and methods)] The central claim that BLINKG provides a useful evaluation of LLM mapping capabilities rests on the assertion that its scenarios adequately span real-world mapping difficulties. The manuscript states that the scenarios are 'based on real-world use cases' but supplies no account of selection criteria, practitioner review, or coverage of documented pain points such as ambiguous term alignment, multi-source heterogeneity, or complex transformations. This omission is load-bearing for the benchmark's claimed utility and must be addressed with an explicit methodology or validation step.
minor comments (1)
  1. [Abstract] Abstract contains the typo 'stateof-the-art' (should be 'state-of-the-art') and the apparent typo 'using BLINK and observe' (should be 'using BLINKG and observe').

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and for recognizing the potential value of BLINKG as a standardized framework for evaluating LLM performance on data-to-ontology mapping. We address the major comment below and describe the revisions we will make to strengthen the manuscript.

read point-by-point responses
  1. Referee: The central claim that BLINKG provides a useful evaluation of LLM mapping capabilities rests on the assertion that its scenarios adequately span real-world mapping difficulties. The manuscript states that the scenarios are 'based on real-world use cases' but supplies no account of selection criteria, practitioner review, or coverage of documented pain points such as ambiguous term alignment, multi-source heterogeneity, or complex transformations. This omission is load-bearing for the benchmark's claimed utility and must be addressed with an explicit methodology or validation step.

    Authors: We agree that the current description of scenario selection is insufficient to fully substantiate the representativeness claim. In the revised manuscript we will add an explicit subsection in the Benchmark section that details the curation methodology. This will describe the selection criteria drawn from documented real-world KG construction projects, how the scenarios were chosen to cover pain points including ambiguous term alignment, multi-source heterogeneity, and complex transformations, and any practitioner review or validation steps used during design. These additions will make the benchmark's grounding in real-world difficulties transparent.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: benchmark proposal and experimental evaluation are self-contained

full rationale

The paper proposes BLINKG as an external benchmark with scenarios drawn from real-world use cases and performs direct experimental evaluation of LLMs against it. No equations, fitted parameters, or derivations are present that reduce any claimed result to quantities defined inside the paper. Self-citations are not load-bearing for the central claim, and the evaluation outcomes are not forced by construction from the benchmark definition itself. The work is therefore independent of the circularity patterns listed.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the domain assumption that real-world KG construction tasks can be decomposed into schema-to-ontology mapping problems that LLMs can be tested on via discrete scenarios. No free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Real-world use cases can be represented as scenarios of increasing complexity for benchmarking purposes.
    Invoked when the authors state that scenarios are based on real-world use cases.

pith-pipeline@v0.9.0 · 5791 in / 1244 out tokens · 35777 ms · 2026-05-20T06:19:23.826880+00:00 · methodology

