Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms
Pith reviewed 2026-05-17 00:17 UTC · model grok-4.3
The pith
Integrating entity linking into RAG improves accuracy for educational QA in specialized domains.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ELERAG integrates a Wikidata entity linking module to supply factual signals and applies a hybrid re-ranking strategy based on Reciprocal Rank Fusion. This architecture significantly outperforms baseline RAG and cross-encoder configurations on domain-specific academic datasets for Italian educational question answering. On general-domain datasets, however, cross-encoder approaches yield superior results, illustrating the domain mismatch effect and the advantage of tailored hybrid methods that avoid computationally expensive models trained on mismatched distributions.
What carries the argument
The entity linking module that extracts factual signals from Wikidata to complement semantic similarity in a hybrid re-ranking process using Reciprocal Rank Fusion.
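The fusion step can be sketched as follows. This is a minimal illustration of standard Reciprocal Rank Fusion, not the authors' implementation: the two input rankings, the document ids, and the function name `reciprocal_rank_fusion` are hypothetical, and k = 60 is the constant from the original RRF paper (Cormack et al., 2009) rather than a value reported for ELERAG. How ELERAG builds its entity-based ranking is not specified here, so that ranking is taken as a given list.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first rankings of document ids into one ranking.

    rankings: list of lists of document ids, each ordered best-first.
    k: smoothing constant (assumed; 60 follows the original RRF paper).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical inputs: one ranking from semantic similarity,
# one from overlap between linked entities in the query and in each passage.
semantic_ranking = ["d3", "d1", "d2", "d4"]
entity_ranking = ["d1", "d4", "d3"]

fused = reciprocal_rank_fusion([semantic_ranking, entity_ranking])
print(fused)  # → ['d1', 'd3', 'd4', 'd2']
```

Because RRF uses only rank positions, the two signals need no score normalization, which is one reason it is a cheap alternative to a learned re-ranker.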
If this is right
- In domain-specific contexts, factual signals from entity linking enhance retrieval relevance beyond pure semantic similarity.
- Hybrid re-ranking strategies like RRF allow effective combination of signals without relying on large cross-encoder models.
- Domain-adapted RAG systems are essential for maintaining factual accuracy in educational applications.
- Entity-aware approaches foster the creation of reliable AI-based tutoring tools in specialized domains.
Where Pith is reading between the lines
- Entity linking enhancements may generalize to other knowledge-intensive domains with specialized terminology, such as scientific or technical fields.
- Reducing errors in the entity linking step for educational terms could yield even stronger performance gains.
- Testing the approach in additional languages beyond Italian would clarify its broader applicability.
Load-bearing premise
Wikidata-based entity linking supplies accurate and unambiguous factual signals that improve retrieval for educational terminology without introducing noise from linking errors or coverage gaps.
What would settle it
Running the system on the custom academic dataset with the entity linking module disabled or with erroneous links, and finding that performance does not decrease or even improves, would contradict the central claim.
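A test of this kind could be scored along the following lines. The sketch assumes Mean Reciprocal Rank as the metric, which the source does not confirm, and every query id, document id, and ranking below is toy data invented for illustration, not a result from the paper.

```python
def mean_reciprocal_rank(rankings, relevant):
    """Average of 1 / rank of the first relevant document per query.

    rankings: dict mapping query id to a best-first list of document ids.
    relevant: dict mapping query id to the set of relevant document ids.
    Queries with no relevant document in their ranking contribute 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for query_id, ranking in rankings.items():
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in relevant.get(query_id, set()):
                total += 1.0 / rank
                break
    return total / len(rankings)


# Toy data only (hypothetical): the same two queries ranked by the full
# pipeline and by a variant with the entity linking signal disabled.
relevant = {"q1": {"d1"}, "q2": {"d4"}}
with_el = {"q1": ["d1", "d2", "d3"], "q2": ["d5", "d4", "d6"]}
without_el = {"q1": ["d2", "d3", "d1"], "q2": ["d4", "d5", "d6"]}

mrr_with = mean_reciprocal_rank(with_el, relevant)
mrr_without = mean_reciprocal_rank(without_el, relevant)
print(f"MRR with EL: {mrr_with:.3f}, without EL: {mrr_without:.3f}")
```

A difference near zero, or in favor of the variant without entity linking, on the real academic dataset would be the outcome that contradicts the central claim.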
Original abstract
In the era of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) architectures are gaining significant attention for their ability to ground language generation in reliable knowledge sources. Despite their effectiveness, RAG systems based solely on semantic similarity often fail to ensure factual accuracy in specialized domains, where terminological ambiguity can affect retrieval relevance. This study proposes ELERAG, an enhanced RAG architecture that integrates a factual signal derived from Entity Linking to improve the accuracy of educational question-answering systems in Italian. The system includes a Wikidata-based Entity Linking module and implements a hybrid re-ranking strategy based on Reciprocal Rank Fusion (RRF). To validate our approach, we compared it against standard baselines and state-of-the-art methods, including a Weighted-Score Re-ranking, a standalone Cross-Encoder and a combined RRF+Cross-Encoder pipeline. Experiments were conducted on two benchmarks: a custom academic dataset and the standard SQuAD-it dataset. Results show that, in domain-specific contexts, ELERAG significantly outperforms both the baseline and the Cross-Encoder configurations. Conversely, the Cross-Encoder approaches achieve the best results on the general-domain dataset. These findings provide strong experimental evidence of the domain mismatch effect, highlighting the importance of domain-adapted hybrid strategies to enhance factual precision in educational RAG systems without relying on computationally expensive models trained on disparate data distributions. They also demonstrate the potential of entity-aware RAG systems in educational environments, fostering adaptive and reliable AI-based tutoring tools.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ELERAG, an enhanced RAG architecture for Italian educational question-answering that integrates a Wikidata-based Entity Linking module with a hybrid re-ranking strategy based on Reciprocal Rank Fusion (RRF). It compares ELERAG against standard baselines, a Weighted-Score Re-ranking method, a standalone Cross-Encoder, and an RRF+Cross-Encoder pipeline on a custom academic dataset and the SQuAD-it benchmark, claiming significant outperformance by ELERAG in domain-specific contexts and superior Cross-Encoder results on general-domain data. The work highlights a domain mismatch effect and the value of entity-aware approaches for factual precision in educational RAG without expensive domain-specific training.
Significance. If the empirical claims hold after additional validation, the results would be moderately significant for cs.IR and educational AI, as they provide evidence that domain-adapted hybrid retrieval strategies can outperform both pure semantic and cross-encoder baselines in specialized terminology settings. The emphasis on avoiding computationally heavy models trained on mismatched distributions is a practical strength, though the absence of quantitative metrics, ablations, and error analysis currently limits the strength of this contribution.
major comments (3)
- [Abstract] Abstract: the central claim that 'ELERAG significantly outperforms both the baseline and the Cross-Encoder configurations' on the custom academic dataset is unsupported by any reported metrics, error bars, statistical tests, or implementation details, rendering the headline empirical result unverifiable from the given evidence.
- [Experiments] Experiments section: no precision/recall or error analysis is supplied for the Wikidata Entity Linking module on Italian educational terminology, and no ablation isolates the EL signal from the RRF re-ranker; without these, it is impossible to attribute reported gains to the entity-aware component rather than other pipeline choices.
- [Results] Results and discussion: the custom academic dataset is described only at a high level; details on its construction, size, annotation process, and potential biases that might favor entity-linking methods are missing, weakening the domain-specific outperformance claim.
minor comments (2)
- [Abstract] Abstract: the phrase 'domain mismatch effect' is used without a definition or citation to prior domain-adaptation literature in retrieval or RAG.
- [Method] Notation: the description of the hybrid re-ranking strategy would benefit from an explicit equation or pseudocode for the RRF fusion step and any learned or fixed weights.
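For reference, the standard unweighted RRF score, as defined in the original RRF paper (Cormack et al., 2009), is given below. Whether ELERAG uses this form unchanged or adds per-signal weights is not stated in the material reviewed here, so the symbols are assumptions: $R$ is the set of rankings being fused, $r(d)$ is the 1-based rank of document $d$ in ranking $r$, and $k$ is a smoothing constant (60 in the original paper).

```latex
\mathrm{RRF}(d) = \sum_{r \in R} \frac{1}{k + r(d)}
```

A weighted variant would multiply each term by a per-ranking weight $w_r$, which is the kind of fixed or learned parameter the comment asks the authors to make explicit.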
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We address each of the major comments point by point below, indicating the changes we will make to the revised version.
- Referee: [Abstract] the central claim that 'ELERAG significantly outperforms both the baseline and the Cross-Encoder configurations' on the custom academic dataset is unsupported by any reported metrics, error bars, statistical tests, or implementation details, rendering the headline empirical result unverifiable from the given evidence.
  Authors: We agree that the abstract would be strengthened by including supporting quantitative evidence. The manuscript reports comparative results in the Experiments and Results sections, but to improve verifiability, we will revise the abstract to reference specific performance metrics from our evaluations on the custom academic dataset. We will also report error analysis and statistical tests in the results discussion and provide more implementation details in the methods section.
  Revision: yes
- Referee: [Experiments] no precision/recall or error analysis is supplied for the Wikidata Entity Linking module on Italian educational terminology, and no ablation isolates the EL signal from the RRF re-ranker; without these, it is impossible to attribute reported gains to the entity-aware component rather than other pipeline choices.
  Authors: This is a valid point. The current manuscript emphasizes the overall system performance. In the revision, we will add an evaluation of the Entity Linking module, including precision and recall metrics on a sample of Italian educational terminology. We will also conduct and report an ablation study that removes the entity linking component to isolate its impact on the final results, allowing better attribution of the observed improvements.
  Revision: yes
- Referee: [Results] the custom academic dataset is described only at a high level; details on its construction, size, annotation process, and potential biases that might favor entity-linking methods are missing, weakening the domain-specific outperformance claim.
  Authors: We acknowledge that additional details are required for full reproducibility and to substantiate the domain-specific claims. We will expand the dataset description to include its size, the process of construction from Italian academic sources, the annotation procedures used for creating question-answer pairs, and a discussion of possible biases, including how the prevalence of specific entities in educational content may interact with our entity linking approach. This expanded description will be added to the Experiments section.
  Revision: yes
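The Entity Linking evaluation promised in the second response could be scored roughly as follows. This is a sketch under stated assumptions: links are compared as exact (mention, entity id) pairs with micro-averaging, the function name `entity_linking_prf` is hypothetical, and the entity ids such as `Q_A` are placeholders, not real Wikidata identifiers.

```python
def entity_linking_prf(gold, predicted):
    """Micro-averaged precision, recall, and F1 over (mention, entity id) pairs.

    gold, predicted: dicts mapping a text id to a set of (mention, entity_id) pairs.
    """
    tp = fp = fn = 0
    for text_id in gold.keys() | predicted.keys():
        gold_links = gold.get(text_id, set())
        pred_links = predicted.get(text_id, set())
        tp += len(gold_links & pred_links)  # correct links
        fp += len(pred_links - gold_links)  # spurious or wrong links
        fn += len(gold_links - pred_links)  # missed links
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy annotations (hypothetical); entity ids are placeholders.
gold = {
    "q1": {("Dante", "Q_A"), ("Firenze", "Q_B")},
    "q2": {("derivata", "Q_C")},
}
predicted = {
    "q1": {("Dante", "Q_A"), ("Firenze", "Q_X")},  # second link is wrong
    "q2": set(),                                   # mention missed entirely
}

precision, recall, f1 = entity_linking_prf(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Reporting wrong links and missed mentions separately, as the counts above allow, would also show whether errors come from disambiguation or from coverage gaps in Wikidata.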
Circularity Check
No circularity: empirical pipeline comparison without derivations or self-referential predictions
This is a systems paper that implements and experimentally compares an ELERAG pipeline (Wikidata entity linking + RRF re-ranking) against explicit baselines on a custom academic dataset and SQuAD-it. No equations, first-principles derivations, or predictions are presented that could reduce by construction to fitted parameters or prior self-citations. Performance claims rest on direct experimental measurements rather than any load-bearing self-definition or ansatz smuggling. The absence of precision/recall figures for the EL module is a methodological gap but does not constitute circularity in the derivation chain.
Axiom & Free-Parameter Ledger
free parameters (1)
- RRF fusion parameters or score weights
axioms (1)
- Domain assumption: Entity linking to Wikidata supplies reliable factual signals that increase retrieval precision for domain-specific educational terminology