MUSEKG: A Knowledge Graph Over Museum Collections

Eun-Jung Holden; Jianzhong Qi; Jinhao Li; Soyeon Caren Han

arxiv: 2511.16014 · v2 · pith:OKWAW7GRnew · submitted 2025-11-20 · 💻 cs.AI

Jinhao Li, Jianzhong Qi, Soyeon Caren Han, Eun-Jung Holden

Pith reviewed 2026-05-25 07:22 UTC · model grok-4.3

classification 💻 cs.AI

keywords knowledge graph · museum collections · cultural heritage · natural language queries · data integration · graph retrieval · interactive system

The pith

MuseKG integrates fragmented museum data into one typed graph that answers natural language questions with inspectable evidence neighborhoods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs MuseKG as a knowledge graph that merges catalogue records, images, and descriptions from museum collections into a single coherent structure. Objects connect to people, organisations, images, derived labels, and semantic entities through a typed schema. Natural language queries are mapped to graph entities, after which a compact neighbourhood of connected evidence is retrieved to produce the answer. This design keeps the reasoning path visible as explicit graph links rather than opaque output. The authors report that an interactive demonstration on real museum collections covers attribute lookup, relation exploration, and relation-aware retrieval.

Core claim

MuseKG organises heterogeneous museum data into a typed graph that links objects, people, organisations, images, image-derived labels, and extracted semantic entities within a coherent schema. It supports natural-language queries by grounding user questions to graph entities and retrieving a compact neighbourhood of evidence for answer generation, enabling attribute lookup, relation exploration, and relation-aware retrieval with answers that remain inspectable via explicit graph structures.
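To make the idea of a typed graph with a coherent schema concrete, here is a minimal sketch in plain Python. The node types loosely follow the entity kinds named above, but the relation names (`created_by`, `held_by`, and so on), the `TypedGraph` class, and the sample records are all hypothetical; the paper's actual schema is not reproduced on this page.

```python
from dataclasses import dataclass, field

# Hypothetical node types, loosely following the entity kinds in the core claim.
NODE_TYPES = {"object", "person", "organisation", "image", "image_label", "entity"}

# Assumed schema: the allowed (source type, relation, target type) triples.
SCHEMA = {
    ("object", "created_by", "person"),
    ("object", "held_by", "organisation"),
    ("object", "depicted_in", "image"),
    ("image", "has_label", "image_label"),
    ("object", "mentions", "entity"),
}


@dataclass
class TypedGraph:
    node_type: dict = field(default_factory=dict)  # node id -> type
    attrs: dict = field(default_factory=dict)      # node id -> attribute dict
    edges: set = field(default_factory=set)        # (source, relation, target)

    def add_node(self, node_id, ntype, **attrs):
        if ntype not in NODE_TYPES:
            raise ValueError(f"unknown node type: {ntype}")
        self.node_type[node_id] = ntype
        self.attrs[node_id] = dict(attrs)

    def add_edge(self, src, relation, dst):
        # Reject any edge whose typed signature is not in the schema.
        triple = (self.node_type[src], relation, self.node_type[dst])
        if triple not in SCHEMA:
            raise ValueError(f"edge violates schema: {triple}")
        self.edges.add((src, relation, dst))


g = TypedGraph()
g.add_node("obj:1", "object", title="Gold pan")
g.add_node("per:1", "person", name="A. Maker")
g.add_edge("obj:1", "created_by", "per:1")
```

Checking the schema at insertion time is one possible design choice: it keeps ill-typed edges out of the graph rather than relying on a later validation pass.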

What carries the argument

The typed graph schema that unifies objects, people, organisations, images, labels and semantic entities, together with the grounding step that maps queries to entities and pulls evidence neighbourhoods.
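The grounding step and the evidence neighbourhood can be sketched as follows. This is an illustration only: grounding is reduced to naive substring matching, the neighbourhood is a plain breadth-first expansion over a toy edge list, and every identifier and record (`ground`, `neighbourhood`, `EDGES`, `NAMES`) is an assumption rather than the system's actual method.

```python
from collections import deque

# Toy graph: (source, relation, target) triples with hypothetical ids.
EDGES = [
    ("obj:1", "created_by", "per:1"),
    ("obj:1", "depicted_in", "img:1"),
    ("img:1", "has_label", "lab:1"),
    ("obj:2", "created_by", "per:1"),
]
NAMES = {
    "obj:1": "gold pan",
    "obj:2": "miner's lamp",
    "per:1": "a. maker",
    "img:1": "photo 17",
    "lab:1": "metal",
}


def ground(question):
    """Return ids of nodes whose name occurs in the question (naive match)."""
    q = question.lower()
    return sorted(n for n, name in NAMES.items() if name in q)


def neighbourhood(seeds, hops=1):
    """Collect the edges within `hops` steps of the seeds, ignoring direction."""
    adj = {}
    for s, r, t in EDGES:
        adj.setdefault(s, []).append((s, r, t))
        adj.setdefault(t, []).append((s, r, t))
    seen, evidence = set(seeds), []
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for edge in adj.get(node, []):
            if edge not in evidence:
                evidence.append(edge)
            other = edge[2] if edge[0] == node else edge[0]
            if other not in seen:
                seen.add(other)
                frontier.append((other, depth + 1))
    return evidence


seeds = ground("Who made the gold pan?")
evidence = neighbourhood(seeds, hops=1)
```

The returned `evidence` list is the part a user could inspect: each triple is an explicit link that an answer generator would be expected to rely on.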

If this is right

Attribute lookup works across catalogues and images in one system.
Relation exploration surfaces connections between objects, people and images.
Relation-aware retrieval uses the graph structure to produce answers with visible supporting paths.
Answers stay traceable because every result is tied to an explicit subgraph.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same integration pattern could extend to libraries or archives that hold mixed structured and image data.
Visible evidence neighbourhoods may raise user trust compared with systems that return only text answers.
If the schema generalises, museums could keep source data unchanged while offering unified query access.

Load-bearing premise

Heterogeneous museum data sources can be merged into one coherent typed graph schema that keeps all original relations intact without inconsistencies or heavy manual curation.

What would settle it

Merge two real museum datasets into MuseKG, then issue a query whose returned neighbourhood omits a documented relation that exists between entities in one of the source collections.
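One way to run such a check is to compare the documented source relations against the triples of the merged graph (or of a returned neighbourhood). The sketch below assumes relations are available as (source, relation, target) triples and that deduplication yields an id mapping; the function name `missing_relations` and all sample data are hypothetical.

```python
def missing_relations(source_triples, merged_triples, id_map):
    """Return source triples whose mapped form is absent from the merged set.

    id_map translates source-local ids to merged ids (e.g. after deduplication);
    ids without an entry are assumed to be unchanged.
    """
    merged = set(merged_triples)
    missing = []
    for s, r, t in source_triples:
        mapped = (id_map.get(s, s), r, id_map.get(t, t))
        if mapped not in merged:
            missing.append((s, r, t))
    return missing


# Hypothetical data: two source collections that mention the same maker.
source_a = [("a:obj1", "created_by", "a:per7")]
source_b = [
    ("b:obj9", "created_by", "b:per2"),
    ("b:obj9", "held_by", "b:org1"),
]
id_map = {"a:per7": "per:maker", "b:per2": "per:maker"}
merged = [
    ("a:obj1", "created_by", "per:maker"),
    ("b:obj9", "created_by", "per:maker"),
    # the held_by edge from source_b is deliberately left out of this example
]

lost = missing_relations(source_a + source_b, merged, id_map)
```

A non-empty `lost` list would be exactly the kind of counterexample described above: a relation documented in a source collection that does not survive integration.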

Figures

Figures reproduced from arXiv: 2511.16014 by Eun-Jung Holden, Jianzhong Qi, Jinhao Li, Soyeon Caren Han.

**Figure 1.** System overview of MuseKG. Module 1 constructs the museum collections KG (MuseKG) from records. Module 2 takes a user query, retrieves KG context, and uses an LLM to generate a natural-language answer grounded in MuseKG.

**Figure 2.** Visualisation of an example KG subgraph con…

Original abstract

Digitisation in the cultural heritage sector has produced large but fragmented repositories of museum collection data, spanning structured catalogue records, images, and unstructured descriptions. Existing museum information systems often make it difficult to integrate these sources into a unified, queryable representation that supports relation-aware exploration. We present MuseKG, an interactive knowledge graph system that organises heterogeneous museum data into a typed graph that links objects, people, organisations, images, image-derived labels, and extracted semantic entities within a coherent schema. MuseKG supports natural-language queries by grounding user questions to graph entities and retrieving a compact neighbourhood of evidence for answer generation. Through an interactive demonstration on real museum collections, we show that MuseKG supports common exploration tasks such as attribute lookup, relation exploration, and relation-aware retrieval, with answers that remain inspectable via explicit graph structures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MuseKG is a standard KG application to museum data with no evidence shown for schema coherence or task performance.

The main takeaway is that this paper describes MuseKG as a typed knowledge graph linking museum objects, people, organizations, images, and extracted entities, but supplies zero details on construction, schema, or results. The abstract claims it organizes heterogeneous sources into a coherent graph that supports natural-language queries and inspectable answers for tasks like attribute lookup and relation exploration. That framing is clear enough on paper, yet nothing backs it up beyond the existence of an interactive demo on real collections. Knowledge graphs have been used in cultural heritage before, so the novelty here is limited to this particular domain application and entity mix. The system idea itself is reasonable for making fragmented museum records more queryable without black-box answers. The soft spot is exactly where the stress-test note flags it: the central assumption that sources can be unified without inconsistencies or major curation losses is stated but never tested or shown. No schema is defined, no integration method or conflict rules are described, and no quantitative checks on relation preservation appear. This leaves the claims about supporting exploration tasks unverified. The paper targets practitioners building digital heritage tools who might borrow the entity types or query approach. A reader already working on museum data systems could pick up practical pointers from the listed relations, but anyone needing methods or evidence will get little. It does not rise to the level that warrants sending out for serious peer review in its current state, as the technical core remains undemonstrated.

Referee Report

2 major / 0 minor

Summary. The paper presents MuseKG, an interactive knowledge graph system that integrates heterogeneous museum data (structured catalogue records, images, and unstructured descriptions) into a typed graph linking objects, people, organisations, images, labels, and semantic entities within a coherent schema. It supports natural-language queries by grounding questions to graph entities and retrieving compact neighbourhoods of evidence, with an interactive demonstration on real collections claimed to enable attribute lookup, relation exploration, and relation-aware retrieval while keeping answers inspectable via explicit graph structures.

Significance. If the integration and query mechanisms function as described, MuseKG would address a practical need in cultural heritage informatics by unifying fragmented museum repositories into a relation-preserving, queryable graph that supports transparent exploration. The focus on inspectable graph evidence for answers is a constructive design choice for applied systems in this domain.

major comments (2)

[Abstract] The central claim that heterogeneous sources are organised into a 'coherent schema' that 'preserves all relevant relations' without inconsistencies is unsupported, as the manuscript supplies no schema definition, construction procedure, conflict-resolution rules, or validation that relations survive integration.
[Abstract] The assertion that MuseKG 'supports common exploration tasks' with 'answers that remain inspectable' rests on an unshown demonstration; the manuscript contains no implementation details, quantitative metrics, error analysis, or results that would allow assessment of completeness or consistency on the claimed tasks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's report. We appreciate the detailed feedback on the abstract claims and agree that additional details are necessary to substantiate them. We will revise the manuscript to include the requested information on the schema and demonstration.

Referee: [Abstract] The central claim that heterogeneous sources are organised into a 'coherent schema' that 'preserves all relevant relations' without inconsistencies is unsupported, as the manuscript supplies no schema definition, construction procedure, conflict-resolution rules, or validation that relations survive integration.

Authors: We agree that the current manuscript does not provide the schema definition, construction procedure, conflict-resolution rules, or validation results. The paper is intended as a system description, but to fully support the claims, we will add a dedicated section describing the schema, the integration process including how relations are preserved, any conflict handling, and validation steps in the revised version.
Revision: yes

Referee: [Abstract] The assertion that MuseKG 'supports common exploration tasks' with 'answers that remain inspectable' rests on an unshown demonstration; the manuscript contains no implementation details, quantitative metrics, error analysis, or results that would allow assessment of completeness or consistency on the claimed tasks.

Authors: We concur that the manuscript lacks implementation details, quantitative metrics, error analysis, or results for the demonstration. We will expand the revised manuscript to include these elements, such as a description of the interactive system, any available metrics on query performance or task support, and an analysis of the demonstration's outcomes.
Revision: yes

Circularity Check

0 steps flagged

No circularity: system description with no derivations or predictions

The paper is a system description of MuseKG, a knowledge graph for museum collections. It presents no mathematical derivations, equations, fitted parameters, predictions, or uniqueness theorems. Claims about schema coherence and task support rest on the described construction process and interactive demo, with no internal steps that reduce to inputs by construction or via self-citation chains. This is a normal non-finding for descriptive systems papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The contribution rests on the domain assumption that museum data sources can be unified under a single typed graph schema. No free parameters or invented entities with independent evidence are introduced.

axioms (1)

Domain assumption: Heterogeneous museum data (catalogue records, images, unstructured descriptions) can be organised into a coherent typed graph linking objects, people, organisations, images, and semantic entities.
This premise is required for the system to function as described but is not demonstrated or justified in the abstract.

invented entities (1)

MuseKG (no independent evidence)
purpose: Interactive knowledge graph system for museum data integration and querying
The system itself is the main contribution; no independent evidence outside the paper is provided.

pith-pipeline@v0.9.0 · 5670 in / 1248 out tokens · 31843 ms · 2026-05-25T07:22:21.105122+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Paper passage: We define KG as a typed property graph G=(V,E,τ,ρ,A) ... node types: {object,person,organisation,image_label,...} ... 7 relations ... normalisation, entity identification, deduplication, schema checks

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: MuseKG supports ... attribute lookup, relation exploration, and relation-aware retrieval ... LLM-based entity extraction, KG context retrieval

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 3 internal anchors

[1] Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K Arora, Yu Bai, Bowen Baker, Haiming Bao, et al. 2025. gpt-oss-120b & gpt-oss-20b model card. arXiv preprint arXiv:2508.10925 (2025)

[2] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. In NeurIPS. 1877–1901

[3] Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. 2025. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv:2507.06261 (2025)

[4] Vincent Emonet, Jerven Bolleman, Severine Duvaud, Tarcisio Mendes de Farias, and Ana Claudia Sima. 2024. LLM-based SPARQL query generation from natural language over federated knowledge graphs. arXiv preprint arXiv:2410.06062 (2024)

[5] Aric Hagberg, Pieter J Swart, and Daniel A. Schult. 2008. Exploring network structure, dynamics, and function using NetworkX. Technical Report. Los Alamos National Laboratory, Los Alamos, USA

[6] Yuexin Huang, Suihuai Yu, Jianjie Chu, Hao Fan, and Bin Du. 2023. Using knowledge graphs and deep learning algorithms to enhance digital cultural heritage management. Heritage Science 11, 1 (2023), 204

[7] Jan Ignatowicz, Krzysztof Kutt, and Grzegorz J. Nalepa. 2025. Position Paper: Metadata enrichment model: Integrating neural networks and semantic knowledge graphs for cultural heritage applications. arXiv:2505.23543 (2025)

[8] Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2022. Large language models are zero-shot reasoners. In NeurIPS. 22199–22213

[9, 10] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. In NeurIPS. 9459–9474

[11] Ciyuan Peng, Feng Xia, Mehdi Naseriparsa, and Francesco Osborne. 2023. Knowledge graphs: Opportunities and challenges. Artificial Intelligence Review 56, 11 (2023), 13071–13102

[12] Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, et al. 2025. Gemma 3 technical report. arXiv:2503.19786 (2025)
