DELICATE: Diachronic Entity LInking using Classes And Temporal Evidence
Pith reviewed 2026-05-25 08:03 UTC · model grok-4.3
The pith
DELICATE uses temporal and type information from Wikidata to link entities in historical Italian more accurately than larger models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that filtering candidate entities by temporal evidence and class consistency from Wikidata, combined with contextual embeddings, yields more accurate and more interpretable entity linking in diachronic Italian texts than purely neural alternatives.
What carries the argument
A neuro-symbolic selection mechanism applies temporal plausibility and entity-type consistency constraints from Wikidata to candidates generated by a BERT-based encoder.
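The selection step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a hard filter, whereas the paper may instead feed these signals to a scorer as features. The `Candidate` fields, the `select` function, the placeholder identifiers, and the choice to let a missing date pass the filter are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Candidate:
    qid: str                      # placeholder identifier (not a real Wikidata QID)
    score: float                  # encoder similarity; higher is better (assumed)
    types: FrozenSet[str]         # coarse entity types, e.g. {"PER"} (assumed schema)
    earliest_year: Optional[int]  # e.g. birth or inception year; None if unknown


def temporally_plausible(cand: Candidate, doc_year: int) -> bool:
    # An entity that came into existence after the document was written
    # cannot be mentioned in it. Unknown dates are allowed through (assumption).
    return cand.earliest_year is None or cand.earliest_year <= doc_year


def type_consistent(cand: Candidate, mention_type: Optional[str]) -> bool:
    # With no predicted mention type, the type constraint is skipped.
    return mention_type is None or mention_type in cand.types


def select(cands: List[Candidate], doc_year: int,
           mention_type: Optional[str]) -> Optional[Candidate]:
    # Keep candidates passing both symbolic checks, then take the best score.
    kept = [c for c in cands
            if temporally_plausible(c, doc_year) and type_consistent(c, mention_type)]
    return max(kept, key=lambda c: c.score) if kept else None


candidates = [
    Candidate("Q_A", 0.80, frozenset({"PER"}), 1916),
    Candidate("Q_B", 0.85, frozenset({"PER"}), 1970),  # postdates the document
    Candidate("Q_C", 0.90, frozenset({"LOC"}), None),  # wrong type
]
best = select(candidates, doc_year=1950, mention_type="PER")
print(best.qid if best else None)
```

In this toy case the two highest-scoring candidates are rejected by the symbolic checks, so the lower-scoring but plausible one is returned. Returning `None` when nothing survives is one possible way to signal an unlinkable mention.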
If this is right
- DELICATE achieves higher performance than competing EL models on historical Italian data.
- The system yields more explainable results via analysis of confidence scores and feature sensitivity.
- It handles long-tail entities effectively in texts from the 19th to 20th centuries.
- The ENEIDE corpus provides a new resource for training and evaluating EL models in this domain.
Where Pith is reading between the lines
- This filtering strategy might generalize to entity linking tasks in other languages with structured knowledge bases containing temporal data.
- Combining symbolic constraints with neural encoders could address similar challenges in other specialized domains like legal or scientific historical documents.
- Further work could test whether updating Wikidata with more historical details would enhance the method's coverage for rare entities.
Load-bearing premise
Wikidata supplies sufficiently accurate and complete temporal and type information to correctly filter candidate entities for long-tail historical mentions in Italian texts.
What would settle it
Running DELICATE on a collection of historical Italian texts for which independent verification shows Wikidata's temporal or type data to be inaccurate or missing for many entities, and observing that performance drops below that of baseline models.
Original abstract
In spite of the remarkable advancements in the field of Natural Language Processing, the task of Entity Linking (EL) remains challenging in the field of humanities due to complex document typologies, lack of domain-specific datasets and models, and long-tail entities, i.e., entities under-represented in Knowledge Bases (KBs). The goal of this paper is to address these issues with two main contributions. The first contribution is DELICATE, a novel neuro-symbolic method for EL on historical Italian which combines a BERT-based encoder with contextual information from Wikidata to select appropriate KB entities using temporal plausibility and entity type consistency. The second contribution is ENEIDE, a multi-domain EL corpus in historical Italian semi-automatically extracted from two annotated editions spanning from the 19th to the 20th century and including literary and political texts. Results show how DELICATE outperforms other EL models in historical Italian even if compared with larger architectures with billions of parameters. Moreover, further analyses reveal how DELICATE confidence scores and features sensitivity provide results which are more explainable and interpretable than purely neural methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces DELICATE, a neuro-symbolic entity linking approach for historical Italian that augments a BERT encoder with Wikidata-derived temporal plausibility and entity-type consistency filters, and presents the ENEIDE corpus extracted from 19th–20th century literary and political texts. It claims that DELICATE outperforms existing EL models—including architectures with billions of parameters—and yields more interpretable results via confidence scores and feature sensitivity.
Significance. If the performance and interpretability claims are substantiated with rigorous evaluation, the work would advance diachronic EL for humanities texts by addressing long-tail entities through lightweight symbolic constraints rather than scale alone; the ENEIDE resource could also support further research in low-resource historical domains.
major comments (2)
- [Abstract, §4] Abstract and §4 (results): the central claim that DELICATE outperforms larger models rests on quantitative evidence that is not supplied in the abstract and whose details (metrics, baselines, dataset statistics, error analysis) must be verified in the full results section; without these, the headline result cannot be evaluated.
- [§3] §3 (method): the neuro-symbolic advantage is predicated on Wikidata supplying accurate and high-coverage temporal dates and type information for long-tail historical Italian entities in ENEIDE; no coverage statistics, ablation on filter accuracy, or failure-case analysis for sparse Wikidata records is provided, which directly bears on whether the filtering step improves or degrades the underlying BERT encoder.
minor comments (1)
- [Abstract] The abstract states that further analyses reveal explainability advantages but does not specify the exact features or sensitivity metrics used.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify how to strengthen the presentation of our results and the justification for the neuro-symbolic components. We address each major comment below.
Point-by-point responses
- Referee: [Abstract, §4] Abstract and §4 (results): the central claim that DELICATE outperforms larger models rests on quantitative evidence that is not supplied in the abstract and whose details (metrics, baselines, dataset statistics, error analysis) must be verified in the full results section; without these, the headline result cannot be evaluated.
  Authors: The full manuscript already supplies the requested details in Section 4, including precision/recall/F1 scores, comparisons against baselines (including models with billions of parameters), ENEIDE dataset statistics, and error analysis. To make the central claim immediately verifiable from the abstract, we will revise the abstract to include a concise summary of the key quantitative results.
  Revision: yes
- Referee: [§3] §3 (method): the neuro-symbolic advantage is predicated on Wikidata supplying accurate and high-coverage temporal dates and type information for long-tail historical Italian entities in ENEIDE; no coverage statistics, ablation on filter accuracy, or failure-case analysis for sparse Wikidata records is provided, which directly bears on whether the filtering step improves or degrades the underlying BERT encoder.
  Authors: We agree that explicit coverage statistics and targeted ablations would strengthen the justification for the symbolic filters. In the revised version we will add (i) Wikidata coverage statistics for the entities appearing in ENEIDE, (ii) an ablation that isolates the contribution of the temporal and type-consistency filters, and (iii) a short failure-case analysis highlighting instances where sparse Wikidata records limit the filters. These additions will clarify when the neuro-symbolic step improves versus degrades the BERT encoder.
  Revision: yes
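The coverage statistics requested by the referee could be computed along the following lines. This is a sketch under stated assumptions, not the authors' planned analysis: the record layout (`earliest_year`, `types`) and the `coverage` function are hypothetical, and the sample records are toy data, not figures from ENEIDE.

```python
from typing import Dict, List


def coverage(records: List[dict]) -> Dict[str, float]:
    """Fraction of gold entities with a usable date, a usable type, or both.

    Each record is a dict with hypothetical keys:
      "earliest_year": int or None
      "types": a (possibly empty) collection of type labels
    """
    total = len(records)
    if total == 0:
        return {"date": 0.0, "type": 0.0, "both": 0.0}
    has_date = [r.get("earliest_year") is not None for r in records]
    has_type = [bool(r.get("types")) for r in records]
    return {
        "date": sum(has_date) / total,
        "type": sum(has_type) / total,
        "both": sum(d and t for d, t in zip(has_date, has_type)) / total,
    }


# Toy records for illustration only.
toy = [
    {"earliest_year": 1798, "types": {"PER"}},
    {"earliest_year": 1861, "types": {"ORG"}},
    {"earliest_year": None, "types": {"LOC"}},
    {"earliest_year": None, "types": set()},
]
stats = coverage(toy)
print(stats)
```

Reporting the "both" rate separately matters because a candidate with only one of the two signals can still be partially filtered, while one with neither falls back entirely on the encoder score.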
Circularity Check
No significant circularity
Full rationale
The paper presents DELICATE as a neuro-symbolic EL method that augments a BERT encoder with external Wikidata temporal plausibility and entity-type consistency checks. No equations, parameter-fitting steps, or predictions are described. No self-citations appear in the provided text, and the performance claims rest on empirical comparison against other models on the independently constructed ENEIDE corpus rather than any reduction of outputs to inputs by construction. The approach is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Wikidata contains accurate temporal information and entity types usable for filtering candidate links in historical texts