arxiv: 2605.08222 · v1 · submitted 2026-05-06 · 💻 cs.CV · cs.AI · cs.IR

From Historical Tabular Image to Knowledge Graphs: A Provenance-Aware Modular Pipeline

Sarah Binta Alam Shoilee, Victor de Boer, Jacco van Ossenbruggen, Susan Legêne

Pith reviewed 2026-05-12 00:45 UTC · model grok-4.3

keywords: handwritten archival tables · knowledge graphs · data provenance · modular pipeline · historical data · table reconstruction · image to KG

The pith

A modular pipeline with systematic origin tracking converts handwritten archival table images into traceable knowledge graphs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out a workflow that splits the conversion of handwritten historical tables into knowledge graphs into three separate stages so each can be checked and fixed independently. It adds data provenance at every step so that every entity and value in the final graph can be traced back to specific locations in the original image. This structure is meant to replace opaque end-to-end AI systems with one that supports ongoing human inspection and correction. The method is shown on real military career records, where different table reconstruction approaches can be swapped in and evaluated without rebuilding the whole system.

Core claim

The central claim is that a three-stage modular pipeline consisting of table reconstruction, information extraction, and KG construction, when combined with systematic data provenance that keeps every extracted entity and literal linked to its visual and textual source, produces transparent and collaboratively controllable image-to-KG conversions for complex handwritten archival material.

What carries the argument

The three-stage modular pipeline with integrated data provenance, which decomposes the workflow into inspectable steps and maintains traceability from original image pixels to final graph elements.

If this is right

Different table reconstruction algorithms can be inserted and compared while keeping the rest of the extraction and graph-building stages unchanged.
Every node and literal in the final knowledge graph carries explicit links to its source region in the original handwritten image.
The separation of stages reduces the risk that an error in one part of the process remains hidden inside a single black-box model.
Experiments on military career tables show that modularity allows targeted evaluation of each reconstruction variant on real archival images.
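The swap-in property described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the names `Cell`, `Fact`, `variant_a`, `variant_b`, and `run_pipeline`, as well as the sample cell values and coordinates, are all assumptions made up for the example. It only shows that stage 1 can vary while stages 2 and 3 stay unchanged, and that every output keeps the bounding box it came from.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in image pixels


@dataclass(frozen=True)
class Cell:
    row: int
    col: int
    text: str
    bbox: BBox  # provenance: region of the source image this cell came from


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    value: str
    source: Cell  # provenance carried forward from stage 1


# Stage 1 is any callable from an image identifier to a list of cells.
Reconstructor = Callable[[str], List[Cell]]


def variant_a(image_id: str) -> List[Cell]:
    """Hypothetical variant A: hard-coded cells standing in for a real model."""
    return [
        Cell(0, 0, "Name", (0, 0, 100, 20)),
        Cell(0, 1, "Rank", (100, 0, 200, 20)),
        Cell(1, 0, "J. Jansen", (0, 20, 100, 40)),
        Cell(1, 1, "Sergeant", (100, 20, 200, 40)),
    ]


def variant_b(image_id: str) -> List[Cell]:
    """Hypothetical variant B: same layout, one misread cell."""
    return [
        Cell(c.row, c.col, "Sergeaut", c.bbox) if c.text == "Sergeant" else c
        for c in variant_a(image_id)
    ]


def extract_facts(cells: List[Cell]) -> List[Fact]:
    """Stage 2: pair every data cell with the header of its column (row 0)."""
    headers: Dict[int, str] = {c.col: c.text for c in cells if c.row == 0}
    return [
        Fact(f"person_row_{c.row}", headers[c.col], c.text, c)
        for c in cells
        if c.row > 0
    ]


def build_graph(facts: List[Fact]) -> List[Tuple[Tuple[str, str, str], BBox]]:
    """Stage 3: emit (triple, source bounding box) pairs."""
    return [((f.subject, f.predicate, f.value), f.source.bbox) for f in facts]


def run_pipeline(image_id: str, reconstruct: Reconstructor):
    """Run all three stages with whichever reconstruction variant is passed in."""
    return build_graph(extract_facts(reconstruct(image_id)))


if __name__ == "__main__":
    for name, variant in [("A", variant_a), ("B", variant_b)]:
        for triple, bbox in run_pipeline("example.jpg", variant):
            print(name, triple, "<-", bbox)
```

Because both variants return the same `Cell` type, the two runs can be compared value by value, and a differing value can be traced to the same image region in each.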

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same staged approach with provenance could be tested on other historical document formats such as handwritten ledgers or correspondence.
Provenance records could be used to generate automatic confidence scores or to route uncertain items to human reviewers.
The pipeline design suggests a template for making other image-to-structured-data tasks in cultural heritage more auditable.
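As a rough illustration of the second extension, the sketch below routes extracted values to a human review queue when they lack cell-level provenance or fall below a confidence threshold. Everything here is an assumption for the sake of the example: the paper does not describe such a router, and the `ExtractedValue` record, the `htr_confidence` field, the 0.8 threshold, and the `route` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ExtractedValue:
    entity: str
    prop: str
    value: str
    cell_id: Optional[str]  # None when the value could not be linked to a cell
    htr_confidence: float   # hypothetical recognition confidence in [0, 1]


def route(
    items: List[ExtractedValue], threshold: float = 0.8
) -> Tuple[List[ExtractedValue], List[ExtractedValue]]:
    """Split items into (accepted, needs_review).

    An item goes to review if its provenance link is missing or if its
    confidence is below the (assumed) threshold.
    """
    accepted: List[ExtractedValue] = []
    needs_review: List[ExtractedValue] = []
    for item in items:
        if item.cell_id is None or item.htr_confidence < threshold:
            needs_review.append(item)
        else:
            accepted.append(item)
    return accepted, needs_review


if __name__ == "__main__":
    sample = [
        ExtractedValue("p1", "rank", "Sergeant", "cell_12", 0.95),
        ExtractedValue("p1", "unit", "3rd Inf.", None, 0.99),
        ExtractedValue("p2", "rank", "Kapt?", "cell_20", 0.41),
    ]
    ok, review = route(sample)
    print("accepted:", [(i.entity, i.prop) for i in ok])
    print("review:  ", [(i.entity, i.prop) for i in review])
```

Treating a missing provenance link as a review trigger is a design choice: a value that cannot be traced to a cell is exactly the kind of item a reviewer cannot verify by looking at the image alone.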

Load-bearing premise

Making intermediate representations visible and attaching origin information to every data item will be enough for humans to perform meaningful oversight, evaluation, and correction in practice.

What would settle it

A controlled user study in which participants supplied with the intermediate outputs and provenance links still fail to detect or correct a measurable fraction of extraction errors in the resulting knowledge graphs would show the approach does not deliver the claimed collaborative control.

Figures

Figures reproduced from arXiv: 2605.08222 by Jacco van Ossenbruggen, Sarah Binta Alam Shoilee, Susan Legêne, Victor de Boer.

**Figure 1.** Illustrative architecture of our provenance-aware pipeline. Blue elements correspond to core components, yellow elements to operations within each component, purple shapes to input resources, and green shapes to generated outputs. Solid arrows indicate the direction of data flow throughout the pipeline, and dotted lines illustrate some of the intermediate outputs.

**Figure 2.** Automatically detected cell bounding box visualisation of image NL-HaNA 2.10.50 45 0143. Adjacent paper text: "More details on how these metrics can be implemented in practice are provided in the Appendix A with the illustrative running example. 3.1.2. Implementation Specification. To demonstrate the possibility of implementing different approaches, we implemented three table reconstruction variants drawn…"

**Figure 3.** On the right, the image displays a snippet of the structured information extracted from the HTML table (on the left), together with its corresponding cell- and text-level provenance, where available. Note that some cell provenance is lost when the extracted information cannot be directly (string) matched to the original cell text, as explained in Section 3.2.

**Figure 4.** Snippet of assertion graph for image NL-HaNA 2.10.50 45 0143. Adjacent paper text: "… URIs, assigning entity types, and mapping property assertions according to the schema. To maintain traceability, every triple is also added to three provenance-specific named graphs: row-level (e.g., :prov row 1), cell-level (e.g., :prov cell 12), and text-span (e.g., :prov span 241 250). Each links the assertion to its corresponding row, cell, a…"
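The named-graph scheme quoted with Figure 4 can be sketched with plain tuples. This is only an illustration under stated assumptions: the underscore-joined graph names (`:prov_row_1` and so on) are a guess at how the examples in the extracted text were originally written, and the `:assertions` graph name, the `Quad` type, the sample triple, and both helper functions are hypothetical. The cell and span arguments are optional because, per the Figure 3 caption, some cell provenance can be lost.

```python
from typing import List, NamedTuple, Optional, Tuple

Triple = Tuple[str, str, str]


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # name of the named graph this statement lives in


def with_provenance(
    triple: Triple,
    row: int,
    cell: Optional[int] = None,
    span: Optional[Tuple[int, int]] = None,
) -> List[Quad]:
    """Place a triple in the assertion graph and in each provenance graph
    for which a source location is known."""
    s, p, o = triple
    graphs = [":assertions", f":prov_row_{row}"]  # names are assumptions
    if cell is not None:
        graphs.append(f":prov_cell_{cell}")
    if span is not None:
        graphs.append(f":prov_span_{span[0]}_{span[1]}")
    return [Quad(s, p, o, g) for g in graphs]


def trace(quads: List[Quad], triple: Triple) -> List[str]:
    """Return the provenance graphs that contain the given triple."""
    return [
        q.graph
        for q in quads
        if (q.subject, q.predicate, q.obj) == triple
        and q.graph.startswith(":prov_")
    ]


if __name__ == "__main__":
    t = (":person_1", ":hasRank", "Sergeant")  # hypothetical triple
    quads = with_provenance(t, row=1, cell=12, span=(241, 250))
    for q in quads:
        print(q)
    print("traceable via:", trace(quads, t))
```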

**Figure 5.** Snippet of provenance graph for image NL-HaNA 2.10.50 45 0143. Adjacent paper text: "3.3.1. Evaluation Metrics. As the assertion graph is constructed via a direct mapping from the structured information produced by the previous component, we assume that precision, recall, and F1-score directly reflect the semantic correctness of the resulting assertion graph. The consistency of the provenance graph is evaluated using SHACL shape…"

**Figure 6.** Event extraction from a degraded handwritten military career register using proposed pipeline and human correction. (a) Predicted cell bounding boxes (maP=0.1), (b) Reconstructed HTML table (with TED=0.549, TED-struct=0.622), (c) Human-corrected HTML, (d) Pipeline-extracted events, (e) Human-corrected events. Adjacent paper text: "Military biography reconstruction offers an important use-case: tracing personnel movements across…"

**Figure 7.** Predicted Cell Bounding Box

**Figure 8.** Ground truth Cell Bounding Box

**Figure 9.** Predicted HTML table

**Figure 10.** Ground truth HTML table. Adjacent paper text: "B. Information Extraction Evaluation Metrics. For the evaluation of the Information Extraction component, we compute Precision, Recall, and F1-score for 1 by comparing it against 2. We first apply the Hungarian algorithm [35] to obtain an optimal one-to-one alignment between the predicted and gold-standard person entities. For each aligned entity pair, corresponding property value…"
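The align-then-score idea in the text attached to Figure 10 can be illustrated as follows. Two caveats apply. First, this sketch finds the optimal one-to-one alignment by exhaustive search over permutations rather than by the Hungarian algorithm; the result is the same optimum, but it is only practical for a handful of entities (for larger inputs, `scipy.optimize.linear_sum_assignment` is a common choice). Second, the source text is truncated before it says how property values are compared, so the exact-string-match scoring, the `overlap` similarity, and all names and sample entities below are assumptions.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Entity = Dict[str, str]  # property name -> value


def overlap(pred: Entity, gold: Entity) -> int:
    """Number of predicted property values that exactly match the gold entity."""
    return sum(1 for key, value in pred.items() if gold.get(key) == value)


def best_alignment(preds: List[Entity], golds: List[Entity]) -> List[Tuple[int, int]]:
    """Optimal one-to-one (pred_index, gold_index) pairs by exhaustive search."""
    k = min(len(preds), len(golds))
    best_pairs: List[Tuple[int, int]] = []
    best_score = -1
    if len(preds) <= len(golds):
        candidates = (
            list(zip(range(len(preds)), perm))
            for perm in permutations(range(len(golds)), k)
        )
    else:
        candidates = (
            list(zip(perm, range(len(golds))))
            for perm in permutations(range(len(preds)), k)
        )
    for pairs in candidates:
        score = sum(overlap(preds[i], golds[j]) for i, j in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score
    return best_pairs


def precision_recall_f1(
    preds: List[Entity], golds: List[Entity]
) -> Tuple[float, float, float]:
    """Score property values over the aligned entity pairs."""
    pairs = best_alignment(preds, golds)
    true_pos = sum(overlap(preds[i], golds[j]) for i, j in pairs)
    n_pred = sum(len(p) for p in preds)
    n_gold = sum(len(g) for g in golds)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    predicted = [
        {"name": "A", "rank": "Sergeant"},
        {"name": "B", "rank": "Captain"},
    ]
    gold = [
        {"name": "B", "rank": "Major"},
        {"name": "A", "rank": "Sergeant"},
    ]
    print(best_alignment(predicted, gold))
    print(precision_recall_f1(predicted, gold))
```

Aligning entities first matters because the predicted and gold lists need not be in the same order; scoring them position by position would penalise a correct extraction that merely appears in a different row.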

Original abstract

Handwritten archival tables contain rich historical information, yet transforming them into structured representations, such as Knowledge Graphs, requires integrating table structure recognition, handwriting recognition, and semantic interpretation - a complex multimodal process. End-to-end AI implementations can obscure these steps, resulting in opaque algorithmic operations that hinder human oversight, critical assessment, and trust. To address this, we present a modular, provenance-aware pipeline to convert handwritten tabular images into KGs supporting human-AI collaboration. The pipeline decomposes the workflow into three stages - table reconstruction, information extraction, and KG construction - while exposing intermediate representations for inspection, evaluation, and correction. A key contribution of our approach is the systematic integration of data provenance at every stage, ensuring that all extracted entities and literals remain traceable to their visual and textual origins. The proposed pipeline is demonstrated through a number of experiments on real-world archival material concerning military careers. The results across three different table reconstruction variants highlight the importance of modularisation. By coupling modularity with data provenance, our work advances transparent and collaboratively controllable image-to-KG pipelines for complex historical data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper describes a modular three-stage pipeline that adds provenance tracking to the conversion of handwritten archival tables into KGs, but the experiments compare reconstruction variants without measuring whether the provenance actually improves human oversight or correction.

The main takeaway is a concrete pipeline that splits handwritten table processing into table reconstruction, information extraction, and KG construction while logging provenance at each step so every entity stays traceable to its source image or text. The authors test this on real military career archives and run three different table reconstruction methods to show that modularity changes the results in practice.

Referee Report

2 major / 2 minor

Summary. The paper presents a modular, provenance-aware pipeline that decomposes handwritten tabular image processing into table reconstruction, information extraction, and knowledge graph construction stages. It integrates systematic data provenance to expose intermediate representations for human inspection, evaluation, and correction, and demonstrates the approach on real-world military-career archival tables using three table-reconstruction variants to illustrate the value of modularity.

Significance. A well-executed demonstration of this pipeline could advance transparent, human-AI collaborative systems for historical document digitization by making complex multimodal workflows auditable and correctable. The conceptual integration of provenance with modularity addresses a genuine need in archival AI, but the manuscript's contribution remains primarily architectural rather than empirically validated on the oversight benefits.

major comments (2)

[experimental demonstration / results on archival material] The experimental demonstration on military-career archives compares three table reconstruction variants and concludes that modularity matters, yet reports no quantitative metrics, user studies, correction-success rates, time-to-fix measurements, or ablation comparing provenance-enabled versus provenance-free versions. This leaves the central claim that 'coupling modularity with data provenance' produces 'collaboratively controllable' pipelines untested and unsupported by evidence.
[pipeline description and abstract] The abstract and pipeline description assert that exposing intermediates and provenance traces enables 'meaningful human oversight, evaluation, and correction,' but no protocols, interfaces, or evaluation criteria for human correction are specified or tested. This assumption is load-bearing for the paper's contribution yet remains a design hypothesis rather than a demonstrated outcome.

minor comments (2)

[abstract] The abstract would be strengthened by including at least high-level quantitative indicators (e.g., reconstruction accuracy or entity extraction F1) from the three variants if they exist in the full manuscript.
[pipeline overview] Figure or diagram clarity: a visual overview of the three stages with provenance arrows would help readers trace how entities remain linked to visual origins.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for stronger empirical grounding of the pipeline's oversight benefits. We address each major comment below, acknowledging the primarily architectural nature of the contribution while outlining targeted revisions to improve clarity and framing.

Referee: The experimental demonstration on military-career archives compares three table reconstruction variants and concludes that modularity matters, yet reports no quantitative metrics, user studies, correction-success rates, time-to-fix measurements, or ablation comparing provenance-enabled versus provenance-free versions. This leaves the central claim that 'coupling modularity with data provenance' produces 'collaboratively controllable' pipelines untested and unsupported by evidence.

Authors: We agree that the experiments provide a qualitative demonstration on real archival material rather than quantitative metrics, user studies, or ablations of provenance-enabled versus provenance-free variants. The manuscript uses the three variants to illustrate how modularity supports different reconstruction strategies with full traceability, but does not empirically measure oversight improvements such as correction success or time savings. In revision we will add a limitations subsection explicitly stating the current evaluation's scope, include concrete examples of provenance traces supporting inspection and correction, and note that controlled user studies remain future work. This will better contextualize the architectural contribution without overstating empirical validation.
revision: partial
Referee: The abstract and pipeline description assert that exposing intermediates and provenance traces enables 'meaningful human oversight, evaluation, and correction,' but no protocols, interfaces, or evaluation criteria for human correction are specified or tested. This assumption is load-bearing for the paper's contribution yet remains a design hypothesis rather than a demonstrated outcome.

Authors: The current manuscript describes the exposure of intermediates and provenance for inspection but does not detail specific correction protocols, interfaces, or evaluation criteria. We accept that this leaves the oversight claim as a design hypothesis. We will revise the abstract and pipeline section to use more precise language stating that the pipeline provides traceable intermediates that support human oversight, rather than asserting it directly enables 'meaningful' correction. We will also add a short subsection outlining example correction workflows that leverage the provenance links, thereby making the hypothesis more concrete as a design proposal.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; architectural description with empirical demonstration on real data

The paper presents a modular pipeline decomposing image-to-KG conversion into table reconstruction, information extraction, and KG construction stages, with systematic provenance tracking. It demonstrates the approach via experiments comparing three table-reconstruction variants on military-career archives, concluding that modularity matters. No equations, first-principles derivations, fitted parameters, or predictions appear in the text. The central claim (coupling modularity with provenance advances transparent, controllable pipelines) is a design assertion supported by the described architecture and reconstruction results, without reducing to self-definition, self-citation chains, or renaming of known results. The untested assumption regarding human oversight is a limitation of evidence strength, not a circular reduction of any derivation to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on standard assumptions from computer vision (table structure recognition is feasible) and knowledge representation (entities and relations can be extracted and linked). No free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

pith-pipeline@v0.9.0 · 5507 in / 1125 out tokens · 37599 ms · 2026-05-12T00:45:34.579098+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Paper passage: "The pipeline decomposes the workflow into three stages - table reconstruction, information extraction, and KG construction - while exposing intermediate representations... systematic integration of data provenance at every stage"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "We report precision, recall, and F1-score... Tree Edit Distance (TED) and its text content ignorant variant TED-Struct"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages

[1]

Knowledge Graphs

Hogan A, Blomqvist E, Cochez M, d’Amato C, Melo Gd, Gutierrez C, et al. Knowledge Graphs. ACM Computing Surveys. 2021;54(4):1-37

work page 2021
[2]

Linked Open Data for Cultural Heritage

de Boer V, Shoilee SBA. Linked Open Data for Cultural Heritage. In: The Palgrave Encyclopedia of Cultural Heritage and Conflict. Cham: Springer Nature Switzerland; 2025. p. 1-7. Available from: https://doi.org/10.1007/978-3-030-61493-5_274-1

work page doi:10.1007/978-3-030-61493-5_274-1 2025
[3]

Analyzing Knowledge Graph Innovations and Emerging AI technologies for Cultural Heritage Data Management - Bulgarian Digital Mathematics Library (BulDML)

Kraev M, Luchev D. Analyzing Knowledge Graph Innovations and Emerging AI technologies for Cultural Heritage Data Management - Bulgarian Digital Mathematics Library (BulDML). Digital Presentation and Preservation of Cultural and Scientific Heritage. 2025;15:247-58

work page 2025
[4]

Towards Explainable Automatic Knowledge Graph Construction with Human-in-the-Loop

Zhang B, Meroño-Peñuela A, Simperl E. Towards Explainable Automatic Knowledge Graph Construction with Human-in-the-Loop. In: HHAI 2023: Augmenting Human Intellect. IOS Press; 2023. p. 274-89. Available from: https://ebooks.iospress.nl/doi/10.3233/FAIA230091

work page doi:10.3233/faia230091 2023
[6]

Ontologies in CLARIAH: Towards Interoperability in History, Language and Media

Meroño-Peñuela A, de Boer V, van Erp M, Zijdeman R, Mourits R, Melder W, et al. Ontologies in CLARIAH: Towards Interoperability in History, Language and Media. arXiv:2004.02845 [cs]. 2020 Jul. Available from: http://arxiv.org/abs/2004.02845

work page arXiv 2020
[7]

Bio CRM: A Data Model for Representing Biographical Data for Prosopographical Research

Tuominen J, Hyvonen E, Leskinen P. Bio CRM: A Data Model for Representing Biographical Data for Prosopographical Research. Biographical Data in a Digital World. 2017:8

work page 2017
[8]

A Novel Connectionist System for Unconstrained Handwriting Recognition

Graves A, Liwicki M, Fernández S, Bertolami R, Bunke H, Schmidhuber J. A Novel Connectionist System for Unconstrained Handwriting Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2009;31(5):855-68

work page 2009
[9]

A Comprehensive Review on Document Image Binarization

Bataineh B, Tounsi M, Zamzami N, Janbi J, Abu-Ain W AK, AbuAin T, et al. A Comprehensive Review on Document Image Binarization. Journal of Imaging. 2025 Apr;11(5):133

work page 2025
[10]

Enhancing table recognition with vision LLMs: a benchmark and neighbor-guided toolchain reasoner

Zhou Y, Cheng M, Mao Q, Wang J, Xu F, Li X. Enhancing table recognition with vision LLMs: a benchmark and neighbor-guided toolchain reasoner. In: Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence. IJCAI ’25; 2025. Available from: https://doi.org/10.24963/ijcai.2025/279

work page 2025
[11]

DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral

Sun Q, Li S, Bi T, Huynh DQ, Reynolds M, Luo Y, et al. DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral. In: Mishra P, Muresan S, Yu T, editors. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations). Vienna, Austria: Association for Comput...

work page 2025
[12]

Image-Based Table Recognition: Data, Model, and Evaluation

Zhong X, ShafieiBavani E, Jimeno Yepes A. Image-Based Table Recognition: Data, Model, and Evaluation. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXI. Berlin, Heidelberg: Springer-Verlag; 2020. p. 564–580. Available from: https://doi.org/10.1007/978-3-030-58589-1_34

work page doi:10.1007/978-3-030-58589-1_34 2020
[13]

Laypa: A Novel Framework for Applying Segmentation Networks to Historical Documents

Klut S, Van Koert R, Sluijter R. Laypa: A Novel Framework for Applying Segmentation Networks to Historical Documents. In: Proceedings of the 7th International Workshop on Historical Document Imaging and Processing. San Jose CA USA: ACM; 2023. p. 67-72. Available from: https://dl.acm.org/doi/10.1145/3604951.3605520

work page doi:10.1145/3604951.3605520 2023
[14]

Docs2KG: A Human-LLM Collaborative Approach to Unified Knowledge Graph Construction from Heterogeneous Documents

Sun Q, Luo Y, Zhang W, Li S, Li J, Niu K, et al. Docs2KG: A Human-LLM Collaborative Approach to Unified Knowledge Graph Construction from Heterogeneous Documents. In: Companion Proceedings of the ACM on Web Conference 2025. Sydney NSW Australia: ACM; 2025. p. 801-4. Available from: https://dl.acm.org/doi/10.1145/3701716.3715309

work page doi:10.1145/3701716.3715309 2025
[15]

Uncertainty Management in the Construction of Knowledge Graphs: A Survey

Jarnac L, Chabot Y, Couceiro M. Uncertainty Management in the Construction of Knowledge Graphs: A Survey. Transactions on Graph Data and Knowledge. 2025;3(1):3:1-3:48. Available from: https://drops.dagstuhl.de/entities/document/10.4230/TGDK.3.1.3

work page doi:10.4230/tgdk.3.1.3 2025
[16]

LLMs for knowledge graph construction and reasoning: recent capabilities and future opportunities

Zhu Y, Wang X, Chen J, Qiao S, Ou Y, Yao Y, et al. LLMs for knowledge graph construction and reasoning: recent capabilities and future opportunities. World Wide Web. 2024 Aug;27(5). Available from: https://doi.org/10.1007/s11280-024-01297-w

work page doi:10.1007/s11280-024-01297-w 2024
[17]

Trustworthy Knowledge Graphs: Practices and Approaches

Zhang B, Koutsiana E, Zhao Y, Meroño-Peñuela A. Trustworthy Knowledge Graphs: Practices and Approaches. In: Handbook on Neurosymbolic AI and Knowledge Graphs. IOS Press. p. 363-84. Available from: https://ebooks.iospress.nl/doi/10.3233/FAIA250215

work page doi:10.3233/faia250215
[19]

Hybrid Intelligence for Digital Humanities

de Boer V, Stork L. Hybrid Intelligence for Digital Humanities. In: HHAI 2024: Hybrid Human AI Systems for the Social Good. IOS Press; 2024. p. 94-104. Available from: https://ebooks.iospress.nl/doi/10.3233/FAIA240186

work page doi:10.3233/faia240186 2024
[20]

FIDES: An ontology-based approach for making machine learning systems accountable

Fernandez I, Aceta C, Gilabert E, Esnaola-Gonzalez I. FIDES: An ontology-based approach for making machine learning systems accountable. Journal of Web Semantics. 2023 Dec;79:100808. Available from:https://www.sciencedirect.com/science/article/pii/S1570826823000379

work page 2023
[21]

Semantic technologies for historical research: A survey

Meroño-Peñuela A, Ashkpour A, Van Erp M, Mandemakers K, Breure L, Scharnhorst A, et al. Semantic technologies for historical research: A survey. Semantic Web. 2014 Oct;6(6):539-64. Available from: https://journals.sagepub.com/doi/full/10.3233/SW-140158

work page doi:10.3233/sw-140158 2014
[22]

BiographyNet: managing provenance at multiple levels and from different perspectives

Ockeloen N, Fokkens A, Ter Braake S, Vossen P, De Boer V, Schreiber G, et al. BiographyNet: managing provenance at multiple levels and from different perspectives. In: Proceedings of the 3rd International Conference on Linked Science - Volume 1116. LISC’13. Aachen, DEU: CEUR-WS.org; 2013. p. 59-71

work page 2013
[23]

LORE: logical location regression network for table structure recognition

Xing H, Gao F, Long R, Bu J, Zheng Q, Li L, et al. LORE: logical location regression network for table structure recognition. In: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Int... Available from: https://doi.org/10.1609/aaai.v37i3.25402

work page doi:10.1609/aaai.v37i3.25402
[25]

Segment anything

Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, et al. Segment anything. In: Proceedings of the IEEE/CVF international conference on computer vision; 2023. p. 4015-26

work page 2023
[26]

Advancements and challenges in handwritten Text Recognition: A comprehensive survey

AlKendi W, Gechter F, Heyberger L, Guyeux C. Advancements and challenges in handwritten Text Recognition: A comprehensive survey. J Imaging. 2024 jan;10(1):18

work page 2024
[27]

Transkribus - A Service Platform for Transcription, Recognition and Retrieval of Historical Documents

Kahle P, Colutto S, Hackl G, Mühlberger G. Transkribus - A Service Platform for Transcription, Recognition and Retrieval of Historical Documents. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). vol. 04; 2017. p. 19-24

work page 2017
[28]

Loghi: An End-to-End Framework for Making Historical Documents Machine-Readable

van Koert R, Klut S, Koornstra T, Maas M, Peters L. Loghi: An End-to-End Framework for Making Historical Documents Machine-Readable. In: Mouchère H, Zhu A, editors. Document Analysis and Recognition – ICDAR 2024 Workshops. Cham: Springer Nature Switzerland; 2024. p. 73-88

work page 2024
[29]

Towards a digital infrastructure for illustrated handwritten archives

Weber A, Ameryan M, Wolstencroft K, Stork L, Heerlien M, Schomaker L. Towards a digital infrastructure for illustrated handwritten archives. Lecture Notes in Computer Science. 2018 jan;2018:155-66

work page 2018
[30]

Named Entity Recognition and Classification in Historical Documents: A Survey

Ehrmann M, Hamdi A, Pontes EL, Romanello M, Doucet A. Named Entity Recognition and Classification in Historical Documents: A Survey. ACM Comput Surv. 2023 Sep;56(2). Available from: https://doi.org/10.1145/3604931

work page doi:10.1145/3604931 2023
[31]

Managing Provenance Data in Knowledge Graph Management Platforms

Kleinsteuber E, Al Mustafa T, Zander F, König-Ries B, Babalou S. Managing Provenance Data in Knowledge Graph Management Platforms. Datenbank-Spektrum. 2024 Mar;24(1):43-52. Available from: https://doi.org/10.1007/s13222-023-00463-0

work page doi:10.1007/s13222-023-00463-0 2024
[32]

Enhancing Data Integrity through Provenance Tracking in Semantic Web Frameworks; 2025

Jain N. Enhancing Data Integrity through Provenance Tracking in Semantic Web Frameworks; 2025. Available from:https://arxiv.org/abs/2501.09029

work page arXiv 2025
[33]

A Research Agenda for Hybrid Intelligence: Augmenting Human Intellect With Collaborative, Adaptive, Responsible, and Explainable Artificial Intelligence

Akata Z, Balliet D, de Rijke M, Dignum F, Dignum V, Eiben G, et al. A Research Agenda for Hybrid Intelligence: Augmenting Human Intellect With Collaborative, Adaptive, Responsible, and Explainable Artificial Intelligence. Computer. 2020;53(8):18-28

work page 2020
[34]

The pascal visual object classes (voc) challenge

Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A. The pascal visual object classes (voc) challenge. International journal of computer vision. 2010;88(2):303-38

work page 2010
[35]

Tree edit distance: Robust and memory-efficient

Pawlik M, Augsten N. Tree edit distance: Robust and memory-efficient. Information Systems. 2016;56:157-73. Available from: https://www.sciencedirect.com/science/article/pii/S0306437915001611

work page 2016
[36]

Complicated Table Structure Recognition

Chi Z, Huang H, Xu HD, Yu H, Yin W, Mao XL. Complicated Table Structure Recognition. arXiv; 2019. Available from:https://arxiv.org/abs/1908.04729

work page arXiv 2019
[37]

Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning

Caufield JH, Hegde H, Emonet V, Harris NL, Joachimiak MP, Matentzoglu N, et al. Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning. Bioinformatics. 2024;40(3):btae104. Available from: https://doi.org/10.1093/bioinformatics/btae104

work page doi:10.1093/bioinformatics/btae104 2024
[38]

The Hungarian method for the assignment problem

Kuhn HW. The Hungarian method for the assignment problem. Naval Research Logistics (NRL). 2005;52(1):7-21. Available from: https://onlinelibrary.wiley.com/doi/abs/10.1002/nav.20053

work page doi:10.1002/nav.20053 2005
[39]

Novel Event Detection and Classification for Historical Texts

Sprugnoli R, Tonelli S. Novel Event Detection and Classification for Historical Texts. Computational Linguistics. 2019 jun;45(2):229-65. Available from:https://aclanthology.org/J19-2002/

work page 2019
[40]

Documenting Events in Metadata

Doerr M, Kritsotaki A. Documenting Events in Metadata. In: Ioannides M, Arnold D, Niccolucci F, Mania K, editors. The 7th International Symposium on Virtual Reality, Archaeology and Cultural Heritage VAST; 2006. Available from: https://cidoc-crm.org/sites/default/files/Documenting%20Events%20in%20Metadata.pdf

work page 2006
[41]

WarSampo knowledge graph: Finland in the Second World War as Linked Open Data

Koho M, Ikkala E, Leskinen P, Tamper M, Tuominen J, Hyvönen E. WarSampo knowledge graph: Finland in the Second World War as Linked Open Data. Semantic Web. 2021 Jan;12(2):265-78. Available from: https://journals.sagepub.com/action/showAbstract

work page 2021