pith. machine review for the scientific record.

arxiv: 2605.08222 · v1 · submitted 2026-05-06 · 💻 cs.CV · cs.AI · cs.IR

From Historical Tabular Image to Knowledge Graphs: A Provenance-Aware Modular Pipeline

Pith reviewed 2026-05-12 00:45 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.IR
keywords handwritten archival tables · knowledge graphs · data provenance · modular pipeline · historical data · table reconstruction · image to KG

The pith

A modular pipeline with systematic origin tracking converts handwritten archival table images into traceable knowledge graphs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out a workflow that splits the conversion of handwritten historical tables into knowledge graphs into three separate stages so each can be checked and fixed independently. It adds data provenance at every step so that every entity and value in the final graph can be traced back to specific locations in the original image. This structure is meant to replace opaque end-to-end AI systems with one that supports ongoing human inspection and correction. The method is shown on real military career records, where different table reconstruction approaches can be swapped in and evaluated without rebuilding the whole system.

Core claim

The central claim is that a three-stage modular pipeline consisting of table reconstruction, information extraction, and KG construction, when combined with systematic data provenance that keeps every extracted entity and literal linked to its visual and textual source, produces transparent and collaboratively controllable image-to-KG conversions for complex handwritten archival material.

What carries the argument

The three-stage modular pipeline with integrated data provenance, which decomposes the workflow into inspectable steps and maintains traceability from original image pixels to final graph elements.
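
As a rough illustration of what carrying provenance through the three stages could look like, the following Python sketch links a graph assertion back to a pixel region. The class names, fields, identifiers, and sample values are assumptions made for this example and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Optional

BBox = tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass(frozen=True)
class Cell:
    """Stage 1 output: one reconstructed table cell and its image region."""
    cell_id: int
    row: int
    bbox: BBox
    text: str


@dataclass(frozen=True)
class Fact:
    """Stage 2 output: an extracted value plus a pointer to its source cell."""
    subject: str
    prop: str
    value: str
    cell_id: Optional[int]  # None when provenance could not be recovered


@dataclass(frozen=True)
class Triple:
    """Stage 3 output: a graph assertion that still carries its origin."""
    s: str
    p: str
    o: str
    source_bbox: Optional[BBox]


def build_graph(facts: list[Fact], cells: list[Cell]) -> list[Triple]:
    """Turn facts into triples, resolving each cell pointer to a pixel region."""
    by_id = {c.cell_id: c for c in cells}
    triples = []
    for f in facts:
        cell = by_id.get(f.cell_id) if f.cell_id is not None else None
        triples.append(Triple(f.subject, f.prop, f.value, cell.bbox if cell else None))
    return triples


# Made-up sample data, for illustration only.
cells = [Cell(cell_id=12, row=1, bbox=(40, 100, 220, 130), text="Sergeant 1891")]
facts = [Fact(":person_1", ":rank", "Sergeant", cell_id=12)]
graph = build_graph(facts, cells)
print(graph[0].source_bbox)
```

Keeping the bounding box on the final triple is one design option; a real system might instead store only the cell identifier and resolve the region on demand.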

If this is right

  • Different table reconstruction algorithms can be inserted and compared while keeping the rest of the extraction and graph-building stages unchanged.
  • Every node and literal in the final knowledge graph carries explicit links to its source region in the original handwritten image.
  • The separation of stages reduces the risk that an error in one part of the process remains hidden inside a single black-box model.
  • Experiments on military career tables show that modularity allows targeted evaluation of each reconstruction variant on real archival images.
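
The swap-in property described in these points can be sketched as a pipeline that accepts any reconstruction function with the same signature. This is a minimal Python sketch under stated assumptions: the two variants are hard-coded placeholders with invented sample rows, and none of the function names are taken from the paper.

```python
from typing import Callable

Table = list[list[str]]                 # rows of cell strings
Reconstructor = Callable[[str], Table]  # image path -> reconstructed table


def variant_a(image_path: str) -> Table:
    # Placeholder: a real variant would run layout analysis and handwriting recognition.
    return [["Jansen", "Sergeant"], ["de Vries", "Captain"]]


def variant_b(image_path: str) -> Table:
    # Placeholder that misses the second row, so the two variants differ.
    return [["Jansen", "Sergeant"]]


def extract_ranks(table: Table) -> list[str]:
    """Shared downstream stage: read the second column of every row."""
    return [row[1] for row in table if len(row) > 1]


def run_pipeline(image_path: str, reconstruct: Reconstructor) -> list[str]:
    """Run the fixed downstream stage on top of a pluggable reconstruction stage."""
    return extract_ranks(reconstruct(image_path))


variants = {"a": variant_a, "b": variant_b}
results = {name: run_pipeline("page.jpg", fn) for name, fn in variants.items()}
print(results)
```

Because only the `reconstruct` argument changes, differences in `results` can be attributed to the reconstruction stage alone.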

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same staged approach with provenance could be tested on other historical document formats such as handwritten ledgers or correspondence.
  • Provenance records could be used to generate automatic confidence scores or to route uncertain items to human reviewers.
  • The pipeline design suggests a template for making other image-to-structured-data tasks in cultural heritage more auditable.
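
To make the reviewer-routing idea above concrete, here is a small Python sketch that sends items with incomplete provenance to a review queue. The dictionary keys (`cell_id`, `span`) and the rule itself are hypothetical; the paper does not describe such a mechanism.

```python
def route_for_review(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split items into (auto_accepted, needs_review) by provenance completeness."""
    accepted, review = [], []
    for item in items:
        complete = item.get("cell_id") is not None and item.get("span") is not None
        (accepted if complete else review).append(item)
    return accepted, review


# Made-up items: the second one has lost its link to a source cell.
items = [
    {"value": "Sergeant", "cell_id": 12, "span": (0, 8)},
    {"value": "1891-05-03", "cell_id": None, "span": None},
]
accepted, review = route_for_review(items)
print(len(accepted), len(review))
```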

Load-bearing premise

Making intermediate representations visible and attaching origin information to every data item will be enough for humans to perform meaningful oversight, evaluation, and correction in practice.

What would settle it

A controlled user study in which participants supplied with the intermediate outputs and provenance links still fail to detect or correct a measurable fraction of extraction errors in the resulting knowledge graphs would show the approach does not deliver the claimed collaborative control.

Figures

Figures reproduced from arXiv: 2605.08222 by Jacco van Ossenbruggen, Sarah Binta Alam Shoilee, Susan Legêne, Victor de Boer.

Figure 1: Illustrative architecture of our provenance-aware pipeline. Blue elements correspond to core components, yellow elements to operations within each component, purple shapes to input resources, and green shapes to generated outputs. Solid arrows indicate the direction of data flow throughout the pipeline, and dotted lines illustrate some of the intermediate outputs.
Figure 2: Automatically detected cell bounding box visualisation of image NL-HaNA 2.10.50 45 0143.
…ing the textual content. More details on how these metrics can be implemented in practice are provided in Appendix A with the illustrative running example. 3.1.2. Implementation Specification. To demonstrate the possibility of implementing different approaches, we implemented three table reconstruction variants drawn…
Figure 3: On the right, the image displays a snippet of the structured information extracted from the HTML table (on the left), together with its corresponding cell- and text-level provenance, where available. Note that some cell provenance is lost when the extracted information cannot be directly (string) matched to the original cell text, as explained in Section 3.2.
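
The string-matching step mentioned in this caption can be illustrated with a short Python sketch. It assumes exact substring matching against cell text; the function name, the cell dictionary, and the sample strings are invented for the example, and the paper's actual matching procedure may differ.

```python
from typing import Optional


def locate(value: str, cells: dict[int, str]) -> Optional[tuple[int, int, int]]:
    """Return (cell_id, start, end) of the first exact match, or None if not found."""
    for cell_id, text in cells.items():
        start = text.find(value)
        if start != -1:
            return (cell_id, start, start + len(value))
    return None


# Made-up cell text, for illustration only.
cells = {12: "Sergeant, 3 May 1891"}

print(locate("Sergeant", cells))    # verbatim value: provenance recovered
print(locate("1891-05-03", cells))  # normalised value: no direct match, provenance lost
```

The second call shows how a value that was normalised during extraction no longer matches the cell text, which is the situation in which cell provenance is lost.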
Figure 4: Snippet of assertion graph for image NL-HaNA 2.10.50 45 0143.
…URIs, assigning entity types, and mapping property assertions according to the schema. To maintain traceability, every triple is also added to three provenance-specific named graphs: row-level (e.g., :prov row 1), cell-level (e.g., :prov cell 12), and text-span (e.g., :prov span 241 250). Each links the assertion to its corresponding row, cell, a…
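
The three provenance-specific named graphs described here can be sketched as quads, one per provenance level. This Python example uses plain tuples rather than an RDF library. The underscore-joined graph names (such as `:prov_row_1`), the subject and predicate identifiers, and the sample value are assumptions for illustration; the exact naming scheme is not recoverable from the excerpt.

```python
Triple = tuple[str, str, str]
Quad = tuple[str, str, str, str]  # (subject, predicate, object, named graph)


def with_provenance(triple: Triple, row: int, cell: int, span: tuple[int, int]) -> list[Quad]:
    """Place one assertion into row-, cell-, and span-level provenance graphs."""
    s, p, o = triple
    graphs = [
        f":prov_row_{row}",
        f":prov_cell_{cell}",
        f":prov_span_{span[0]}_{span[1]}",
    ]
    return [(s, p, o, g) for g in graphs]


# Hypothetical assertion, using the row, cell, and span numbers quoted in the text.
quads = with_provenance((":person_1", ":hasRank", "Sergeant"), row=1, cell=12, span=(241, 250))
for q in quads:
    print(q)
```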
Figure 5: Snippet of provenance graph for image NL-HaNA 2.10.50 45 0143.
3.3.1. Evaluation Metrics. As the assertion graph is constructed via a direct mapping from the structured information produced by the previous component, we assume that precision, recall, and F1-score directly reflect the semantic correctness of the resulting assertion graph. The consistency of the provenance graph is evaluated using SHACL shape…
Figure 6: Event extraction from a degraded handwritten military career register using the proposed pipeline and human correction. (a) Predicted cell bounding boxes (maP=0.1), (b) Reconstructed HTML table (with TED=0.549, TED-struct=0.622), (c) Human-corrected HTML, (d) Pipeline-extracted events, (e) Human-corrected events.
Military biography reconstruction offers an important use-case: tracing personnel movements across…
Figure 7: Predicted Cell Bounding Box
Figure 8: Ground truth Cell Bounding Box
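
Comparing predicted and ground-truth cell boxes, as in Figures 7 and 8, commonly starts from intersection over union (IoU), the overlap measure that underlies average-precision scores for detection. The Python sketch below shows IoU for two axis-aligned boxes. The box coordinates are invented, and how the paper thresholds or aggregates such scores is not stated in this excerpt.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


predicted = (0.0, 0.0, 10.0, 10.0)
ground_truth = (5.0, 0.0, 15.0, 10.0)
print(iou(predicted, ground_truth))  # overlap 50, union 150
```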
Figure 9: Predicted HTML table
Figure 10: Ground truth HTML table
B. Information Extraction Evaluation Metrics. For the evaluation of the Information Extraction component, we compute Precision, Recall, and F1-score for 1 by comparing it against 2. We first apply the Hungarian algorithm [35] to obtain an optimal one-to-one alignment between the predicted and gold-standard person entities. For each aligned entity pair, corresponding property value…
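
The align-then-score procedure described here can be sketched in Python as follows. Note the substitution: an exhaustive search over permutations stands in for the Hungarian algorithm, which finds the same optimal one-to-one assignment far more efficiently on larger inputs. The entity dictionaries, the exact-match overlap score, and the micro-averaged counts are assumptions for illustration, not the paper's definitions.

```python
from itertools import permutations


def overlap(pred: dict, gold: dict) -> int:
    """Number of properties whose values match exactly."""
    return sum(1 for k, v in pred.items() if gold.get(k) == v)


def align(preds: list[dict], golds: list[dict]) -> list[tuple[int, int]]:
    """Best one-to-one alignment by brute force (assumes len(preds) <= len(golds))."""
    best, best_score = (), -1
    for perm in permutations(range(len(golds)), len(preds)):
        score = sum(overlap(preds[i], golds[j]) for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return list(enumerate(best))


def prf(preds: list[dict], golds: list[dict]) -> tuple[float, float, float]:
    """Property-level precision, recall, and F1 over the aligned entity pairs."""
    tp = sum(overlap(preds[i], golds[j]) for i, j in align(preds, golds))
    n_pred = sum(len(p) for p in preds)
    n_gold = sum(len(g) for g in golds)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up person entities, for illustration only.
preds = [{"name": "Jansen", "rank": "Sergeant"}, {"name": "de Vries", "rank": "Captain"}]
golds = [{"name": "de Vries", "rank": "Major"}, {"name": "Jansen", "rank": "Sergeant"}]
print(align(preds, golds))
print(prf(preds, golds))
```

In this toy case three of the four predicted property values match their aligned gold values, so precision, recall, and F1 all come out at 0.75.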
Original abstract

Handwritten archival tables contain rich historical information, yet transforming them into structured representations, such as Knowledge Graphs, requires integrating table structure recognition, handwriting recognition, and semantic interpretation - a complex multimodal process. End-to-end AI implementations can obscure these steps, resulting in opaque algorithmic operations that hinder human oversight, critical assessment, and trust. To address this, we present a modular, provenance-aware pipeline to convert handwritten tabular images into KGs supporting human-AI collaboration. The pipeline decomposes the workflow into three stages - table reconstruction, information extraction, and KG construction - while exposing intermediate representations for inspection, evaluation, and correction. A key contribution of our approach is the systematic integration of data provenance at every stage, ensuring that all extracted entities and literals remain traceable to their visual and textual origins. The proposed pipeline is demonstrated through a number of experiments on real-world archival material concerning military careers. The results across three different table reconstruction variants highlight the importance of modularisation. By coupling modularity with data provenance, our work advances transparent and collaboratively controllable image-to-KG pipelines for complex historical data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a modular, provenance-aware pipeline that decomposes handwritten tabular image processing into table reconstruction, information extraction, and knowledge graph construction stages. It integrates systematic data provenance to expose intermediate representations for human inspection, evaluation, and correction, and demonstrates the approach on real-world military-career archival tables using three table-reconstruction variants to illustrate the value of modularity.

Significance. A well-executed demonstration of this pipeline could advance transparent, human-AI collaborative systems for historical document digitization by making complex multimodal workflows auditable and correctable. The conceptual integration of provenance with modularity addresses a genuine need in archival AI, but the manuscript's contribution remains primarily architectural rather than empirically validated on the oversight benefits.

major comments (2)
  1. [experimental demonstration / results on archival material] The experimental demonstration on military-career archives compares three table reconstruction variants and concludes that modularity matters, yet reports no quantitative metrics, user studies, correction-success rates, time-to-fix measurements, or ablation comparing provenance-enabled versus provenance-free versions. This leaves the central claim that 'coupling modularity with data provenance' produces 'collaboratively controllable' pipelines untested and unsupported by evidence.
  2. [pipeline description and abstract] The abstract and pipeline description assert that exposing intermediates and provenance traces enables 'meaningful human oversight, evaluation, and correction,' but no protocols, interfaces, or evaluation criteria for human correction are specified or tested. This assumption is load-bearing for the paper's contribution yet remains a design hypothesis rather than a demonstrated outcome.
minor comments (2)
  1. [abstract] The abstract would be strengthened by including at least high-level quantitative indicators (e.g., reconstruction accuracy or entity extraction F1) from the three variants if they exist in the full manuscript.
  2. [pipeline overview] Figure or diagram clarity: a visual overview of the three stages with provenance arrows would help readers trace how entities remain linked to visual origins.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for stronger empirical grounding of the pipeline's oversight benefits. We address each major comment below, acknowledging the primarily architectural nature of the contribution while outlining targeted revisions to improve clarity and framing.

Point-by-point responses
  1. Referee: The experimental demonstration on military-career archives compares three table reconstruction variants and concludes that modularity matters, yet reports no quantitative metrics, user studies, correction-success rates, time-to-fix measurements, or ablation comparing provenance-enabled versus provenance-free versions. This leaves the central claim that 'coupling modularity with data provenance' produces 'collaboratively controllable' pipelines untested and unsupported by evidence.

    Authors: We agree that the experiments provide a qualitative demonstration on real archival material rather than quantitative metrics, user studies, or ablations of provenance-enabled versus provenance-free variants. The manuscript uses the three variants to illustrate how modularity supports different reconstruction strategies with full traceability, but does not empirically measure oversight improvements such as correction success or time savings. In revision we will add a limitations subsection explicitly stating the current evaluation's scope, include concrete examples of provenance traces supporting inspection and correction, and note that controlled user studies remain future work. This will better contextualize the architectural contribution without overstating empirical validation. revision: partial

  2. Referee: The abstract and pipeline description assert that exposing intermediates and provenance traces enables 'meaningful human oversight, evaluation, and correction,' but no protocols, interfaces, or evaluation criteria for human correction are specified or tested. This assumption is load-bearing for the paper's contribution yet remains a design hypothesis rather than a demonstrated outcome.

    Authors: The current manuscript describes the exposure of intermediates and provenance for inspection but does not detail specific correction protocols, interfaces, or evaluation criteria. We accept that this leaves the oversight claim as a design hypothesis. We will revise the abstract and pipeline section to use more precise language stating that the pipeline provides traceable intermediates that support human oversight, rather than asserting it directly enables 'meaningful' correction. We will also add a short subsection outlining example correction workflows that leverage the provenance links, thereby making the hypothesis more concrete as a design proposal. revision: yes

Circularity Check

0 steps flagged

No significant circularity; architectural description with empirical demonstration on real data

Full rationale

The paper presents a modular pipeline decomposing image-to-KG conversion into table reconstruction, information extraction, and KG construction stages, with systematic provenance tracking. It demonstrates the approach via experiments comparing three table-reconstruction variants on military-career archives, concluding that modularity matters. No equations, first-principles derivations, fitted parameters, or predictions appear in the text. The central claim (coupling modularity with provenance advances transparent, controllable pipelines) is a design assertion supported by the described architecture and reconstruction results, without reducing to self-definition, self-citation chains, or renaming of known results. The untested assumption regarding human oversight is a limitation of evidence strength, not a circular reduction of any derivation to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on standard assumptions from computer vision (table structure recognition is feasible) and knowledge representation (entities and relations can be extracted and linked). No free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

pith-pipeline@v0.9.0 · 5507 in / 1125 out tokens · 37599 ms · 2026-05-12T00:45:34.579098+00:00 · methodology

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages

  1. [1]

    Knowledge Graphs

    Hogan A, Blomqvist E, Cochez M, d’Amato C, Melo Gd, Gutierrez C, et al. Knowledge Graphs. ACM Computing Surveys. 2021;54(4):1-37.

  2. [2]

    Linked Open Data for Cultural Heritage

    de Boer V, Shoilee SBA. Linked Open Data for Cultural Heritage. In: The Palgrave Encyclopedia of Cultural Heritage and Conflict. Cham: Springer Nature Switzerland; 2025. p. 1-7. Available from: https://doi.org/10.1007/978-3-030-61493-5_274-1

  3. [3]

    Analyzing Knowledge Graph Innovations and Emerging AI technologies for Cultural Heritage Data Management - Bulgarian Digital Mathematics Library (BulDML)

    Kraev M, Luchev D. Analyzing Knowledge Graph Innovations and Emerging AI technologies for Cultural Heritage Data Management - Bulgarian Digital Mathematics Library (BulDML). Digital Presentation and Preservation of Cultural and Scientific Heritage. 2025;15:247-58

  4. [4]

    Towards Explainable Automatic Knowledge Graph Construction with Human-in-the-Loop

    Zhang B, Meroño-Peñuela A, Simperl E. Towards Explainable Automatic Knowledge Graph Construction with Human-in-the-Loop. In: HHAI 2023: Augmenting Human Intellect. IOS Press. p. 274-89. Available from: https://ebooks.iospress.nl/doi/10.3233/FAIA230091

  6. [6]

    Ontologies in CLARIAH: Towards Interoperability in History, Language and Media

    Meroño-Peñuela A, de Boer V, van Erp M, Zijdeman R, Mourits R, Melder W, et al. Ontologies in CLARIAH: Towards Interoperability in History, Language and Media. arXiv:2004.02845 [cs]. 2020 Jul. Available from: http://arxiv.org/abs/2004.02845

  7. [7]

    Bio CRM: A Data Model for Representing Biographical Data for Prosopographical Research

    Tuominen J, Hyvonen E, Leskinen P. Bio CRM: A Data Model for Representing Biographical Data for Prosopographical Research. Biographical Data in a Digital World. 2017:8

  8. [8]

    A Novel Connectionist System for Unconstrained Handwriting Recognition

    Graves A, Liwicki M, Fernández S, Bertolami R, Bunke H, Schmidhuber J. A Novel Connectionist System for Unconstrained Handwriting Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2009;31(5):855-68

  9. [9]

    A Comprehensive Review on Document Image Binarization

    Bataineh B, Tounsi M, Zamzami N, Janbi J, Abu-Ain W AK, AbuAin T, et al. A Comprehensive Review on Document Image Binarization. Journal of Imaging. 2025 Apr;11(5):133

  10. [10]

    Enhancing table recognition with vision LLMs: a benchmark and neighbor-guided toolchain reasoner

    Zhou Y, Cheng M, Mao Q, Wang J, Xu F, Li X. Enhancing table recognition with vision LLMs: a benchmark and neighbor-guided toolchain reasoner. In: Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence. IJCAI ’25; 2025. Available from: https://doi.org/10.24963/ijcai.2025/279

  11. [11]

    DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral

    Sun Q, Li S, Bi T, Huynh DQ, Reynolds M, Luo Y, et al. DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral. In: Mishra P, Muresan S, Yu T, editors. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations). Vienna, Austria: Association for Comput...

  12. [12]

    Image-Based Table Recognition: Data, Model, and Evaluation

    Zhong X, ShafieiBavani E, Jimeno Yepes A. Image-Based Table Recognition: Data, Model, and Evaluation. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXI. Berlin, Heidelberg: Springer-Verlag; 2020. p. 564–580. Available from: https://doi.org/10.1007/978-3-030-58589-1_34

  13. [13]

    Laypa: A Novel Framework for Applying Segmentation Networks to Historical Documents

    Klut S, Van Koert R, Sluijter R. Laypa: A Novel Framework for Applying Segmentation Networks to Historical Documents. In: Proceedings of the 7th International Workshop on Historical Document Imaging and Processing. San Jose CA USA: ACM; 2023. p. 67-72. Available from: https://dl.acm.org/doi/10.1145/3604951.3605520

  14. [14]

    Docs2KG: A Human-LLM Collaborative Approach to Unified Knowledge Graph Construction from Heterogeneous Documents

    Sun Q, Luo Y, Zhang W, Li S, Li J, Niu K, et al. Docs2KG: A Human-LLM Collaborative Approach to Unified Knowledge Graph Construction from Heterogeneous Documents. In: Companion Proceedings of the ACM on Web Conference 2025. Sydney NSW Australia: ACM; 2025. p. 801-4. Available from: https://dl.acm.org/doi/10.1145/3701716.3715309

  15. [15]

    Uncertainty Management in the Construction of Knowledge Graphs: A Survey

    Jarnac L, Chabot Y, Couceiro M. Uncertainty Management in the Construction of Knowledge Graphs: A Survey. Transactions on Graph Data and Knowledge. 2025;3(1):3:1-3:48. Available from: https://drops.dagstuhl.de/entities/document/10.4230/TGDK.3.1.3

  16. [16]

    LLMs for knowledge graph construction and reasoning: recent capabilities and future opportunities

    Zhu Y, Wang X, Chen J, Qiao S, Ou Y, Yao Y, et al. LLMs for knowledge graph construction and reasoning: recent capabilities and future opportunities. World Wide Web. 2024 Aug;27(5). Available from: https://doi.org/10.1007/s11280-024-01297-w

  17. [17]

    Trustworthy Knowledge Graphs: Practices and Approaches

    Zhang B, Koutsiana E, Zhao Y, Meroño-Peñuela A. Trustworthy Knowledge Graphs: Practices and Approaches. In: Handbook on Neurosymbolic AI and Knowledge Graphs. IOS Press. p. 363-84. Available from: https://ebooks.iospress.nl/doi/10.3233/FAIA250215

  19. [19]

    Hybrid Intelligence for Digital Humanities

    de Boer V, Stork L. Hybrid Intelligence for Digital Humanities. In: HHAI 2024: Hybrid Human AI Systems for the Social Good. IOS Press; 2024. p. 94-104. Available from: https://ebooks.iospress.nl/doi/10.3233/FAIA240186

  20. [20]

    FIDES: An ontology-based approach for making machine learning systems accountable

    Fernandez I, Aceta C, Gilabert E, Esnaola-Gonzalez I. FIDES: An ontology-based approach for making machine learning systems accountable. Journal of Web Semantics. 2023 Dec;79:100808. Available from: https://www.sciencedirect.com/science/article/pii/S1570826823000379

  21. [21]

    Semantic technologies for historical research: A survey

    Meroño-Peñuela A, Ashkpour A, Van Erp M, Mandemakers K, Breure L, Scharnhorst A, et al. Semantic technologies for historical research: A survey. Semantic Web. 2014 Oct;6(6):539-64. Available from: https://journals.sagepub.com/doi/full/10.3233/SW-140158

  22. [22]

    BiographyNet: managing provenance at multiple levels and from different perspectives

    Ockeloen N, Fokkens A, Ter Braake S, Vossen P, De Boer V, Schreiber G, et al. BiographyNet: managing provenance at multiple levels and from different perspectives. In: Proceedings of the 3rd International Conference on Linked Science - Volume 1116. LISC’13. Aachen, DEU: CEUR-WS.org; 2013. p. 59-71

  23. [23]

    LORE: logical location regression network for table structure recognition

    Xing H, Gao F, Long R, Bu J, Zheng Q, Li L, et al. LORE: logical location regression network for table structure recognition. In: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Int... Available from: https://doi.org/10.1609/aaai.v37i3.25402

  25. [25]

    Segment anything

    Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, et al. Segment anything. In: Proceedings of the IEEE/CVF international conference on computer vision; 2023. p. 4015-26

  26. [26]

    Advancements and challenges in handwritten Text Recognition: A comprehensive survey

    AlKendi W, Gechter F, Heyberger L, Guyeux C. Advancements and challenges in handwritten Text Recognition: A comprehensive survey. J Imaging. 2024 jan;10(1):18

  27. [27]

    Transkribus - A Service Platform for Transcription, Recognition and Retrieval of Historical Documents

    Kahle P, Colutto S, Hackl G, Mühlberger G. Transkribus - A Service Platform for Transcription, Recognition and Retrieval of Historical Documents. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). vol. 04; 2017. p. 19-24

  28. [28]

    Loghi: An End-to-End Framework for Making Historical Documents Machine-Readable

    van Koert R, Klut S, Koornstra T, Maas M, Peters L. Loghi: An End-to-End Framework for Making Historical Documents Machine-Readable. In: Mouchère H, Zhu A, editors. Document Analysis and Recognition – ICDAR 2024 Workshops. Cham: Springer Nature Switzerland; 2024. p. 73-88

  29. [29]

    Towards a digital infrastructure for illustrated handwritten archives

    Weber A, Ameryan M, Wolstencroft K, Stork L, Heerlien M, Schomaker L. Towards a digital infrastructure for illustrated handwritten archives. Lecture Notes in Computer Science. 2018 jan;2018:155-66

  30. [30]

    Named Entity Recognition and Classification in Historical Documents: A Survey

    Ehrmann M, Hamdi A, Pontes EL, Romanello M, Doucet A. Named Entity Recognition and Classification in Historical Documents: A Survey. ACM Comput Surv. 2023 Sep;56(2). Available from: https://doi.org/10.1145/3604931

  31. [31]

    Managing Provenance Data in Knowledge Graph Management Platforms

    Kleinsteuber E, Al Mustafa T, Zander F, König-Ries B, Babalou S. Managing Provenance Data in Knowledge Graph Management Platforms. Datenbank-Spektrum. 2024 Mar;24(1):43-52. Available from: https://doi.org/10.1007/s13222-023-00463-0

  32. [32]

    Enhancing Data Integrity through Provenance Tracking in Semantic Web Frameworks

    Jain N. Enhancing Data Integrity through Provenance Tracking in Semantic Web Frameworks; 2025. Available from: https://arxiv.org/abs/2501.09029

  33. [33]

    A Research Agenda for Hybrid Intelligence: Augmenting Human Intellect With Collaborative, Adaptive, Responsible, and Explainable Artificial Intelligence

    Akata Z, Balliet D, de Rijke M, Dignum F, Dignum V, Eiben G, et al. A Research Agenda for Hybrid Intelligence: Augmenting Human Intellect With Collaborative, Adaptive, Responsible, and Explainable Artificial Intelligence. Computer. 2020;53(8):18-28

  34. [34]

    The pascal visual object classes (voc) challenge

    Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A. The pascal visual object classes (voc) challenge. International journal of computer vision. 2010;88(2):303-38

  35. [35]

    Tree edit distance: Robust and memory-efficient

    Pawlik M, Augsten N. Tree edit distance: Robust and memory-efficient. Information Systems. 2016;56:157-73. Available from: https://www.sciencedirect.com/science/article/pii/S0306437915001611

  36. [36]

    Complicated Table Structure Recognition

    Chi Z, Huang H, Xu HD, Yu H, Yin W, Mao XL. Complicated Table Structure Recognition. arXiv; 2019. Available from: https://arxiv.org/abs/1908.04729

  37. [37]

    Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning

    Caufield JH, Hegde H, Emonet V, Harris NL, Joachimiak MP, Matentzoglu N, et al. Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning. Bioinformatics. 2024;40(3):btae104. Available from: https://doi.org/10.1093/bioinformatics/btae104

  38. [38]

    The Hungarian method for the assignment problem

    Kuhn HW. The Hungarian method for the assignment problem. Naval Research Logistics (NRL). 2005;52(1):7-21. Available from: https://onlinelibrary.wiley.com/doi/abs/10.1002/nav.20053

  39. [39]

    Novel Event Detection and Classification for Historical Texts

    Sprugnoli R, Tonelli S. Novel Event Detection and Classification for Historical Texts. Computational Linguistics. 2019 Jun;45(2):229-65. Available from: https://aclanthology.org/J19-2002/

  40. [40]

    Documenting Events in Metadata

    Doerr M, Kritsotaki A. Documenting Events in Metadata. In: Ioannides M, Arnold D, Niccolucci F, Mania K, editors. The 7th International Symposium on Virtual Reality, Archaeology and Cultural Heritage VAST; 2006. Available from: https://cidoc-crm.org/sites/default/files/Documenting%20Events%20in%20Metadata.pdf

  41. [41]

    WarSampo knowledge graph: Finland in the Second World War as Linked Open Data

    Koho M, Ikkala E, Leskinen P, Tamper M, Tuominen J, Hyvönen E. WarSampo knowledge graph: Finland in the Second World War as Linked Open Data. Semantic Web. 2021 Jan;12(2):265-78. Available from: https://journals.sagepub.com/action/showAbstract