Generative AI-Based Virtual Assistant using Retrieval-Augmented Generation: An evaluation study for bachelor projects
Pith reviewed 2026-05-13 22:43 UTC · model grok-4.3
The pith
Retrieval-augmented generation lets a virtual assistant give reliable answers to bachelor students' questions on project regulations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a virtual assistant based on a Retrieval-Augmented Generation system that enhances the accuracy and reliability of responses by integrating up-to-date, domain-specific knowledge. Through a robust evaluation framework and real-life testing, we demonstrate that our virtual assistant can effectively meet the needs of students while addressing the inherent challenges of applying Large Language Models to a specialized educational context.
What carries the argument
A Retrieval-Augmented Generation system pulls relevant project-regulation documents into each prompt so that the language model produces context-specific answers.
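The retrieve-then-prompt step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the regulation snippets, the stopword list, the prompt wording, and the function names are all hypothetical, and a simple bag-of-words cosine similarity stands in for whichever embedding model the real system uses.

```python
import math
import re
from collections import Counter

# Hypothetical regulation snippets; the real document store is not shown in the source.
DOCUMENTS = [
    "The project report must be submitted before the deadline set by the coordinator.",
    "Each project group is assigned one supervisor and one examiner.",
    "A resit is offered when the final project grade is insufficient.",
]

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "is", "of", "and", "by", "be", "who", "when", "one"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(documents, key=lambda d: cosine(query, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Place the retrieved passages ahead of the question in the prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below. "
        "If the context does not cover the question, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


prompt = build_prompt("Who is the supervisor of a project group?", DOCUMENTS, k=1)
print(prompt)
```

The resulting prompt would then be sent to the language model; that call is omitted here because the source does not say which model or API is used.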
If this is right
- The assistant supplies timely and accurate information on project rules without requiring staff to answer every query.
- RAG integration reduces the rate of incorrect or incomplete responses compared with a plain language model.
- A repeatable evaluation framework can measure success for similar assistants in other specialized university contexts.
- Real-life student testing provides evidence that the system handles the practical demands of an educational setting.
Where Pith is reading between the lines
- Periodic refresh of the retrieved document store would be required to keep answers current when rules change.
- The same retrieval approach could support queries in other narrow administrative domains such as course registration or exam policies.
- Combining the assistant with a feedback loop that logs and corrects errors could further improve reliability over time.
- Deployment at additional universities would test whether the same RAG setup transfers without major redesign.
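The periodic refresh mentioned in the first point above could be driven by content fingerprints. The sketch below is one possible approach under stated assumptions: the document IDs, the texts, and the `plan_refresh` helper are hypothetical, and the source does not describe how the real document store is maintained.

```python
import hashlib


def fingerprint(text):
    """Stable content hash used to detect changed regulation text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_refresh(indexed, current):
    """Compare stored fingerprints with the current documents.

    indexed: {doc_id: fingerprint} recorded at the last indexing run
    current: {doc_id: text} as published now
    """
    to_add = sorted(d for d in current if d not in indexed)
    to_remove = sorted(d for d in indexed if d not in current)
    to_update = sorted(
        d for d in current if d in indexed and indexed[d] != fingerprint(current[d])
    )
    return {"add": to_add, "update": to_update, "remove": to_remove}


# Hypothetical example: one rule changed and one document was added.
indexed = {
    "deadlines": fingerprint("Reports are due in week 7."),
    "grading": fingerprint("Grades use a 10-point scale."),
}
current = {
    "deadlines": "Reports are due in week 8.",
    "grading": "Grades use a 10-point scale.",
    "resits": "One resit is allowed.",
}
plan = plan_refresh(indexed, current)
print(plan)
```

Only the documents listed under `add` and `update` would need to be re-embedded, which keeps a scheduled refresh cheap when most regulations are unchanged.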
Load-bearing premise
Adding current domain documents through retrieval is enough to cut hallucinations and missing facts so the assistant gives correct answers on project regulations.
What would settle it
Run the same student queries on the live system and check whether any answers still contain wrong or missing regulation details that the source documents actually cover.
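A check of this kind might be scripted roughly as below. Everything here is an assumption for illustration: the test cases, the `required` and `contradicted` fields, and the stub assistant are hypothetical, and plain substring matching is a crude stand-in for a human or model-based judgment of correctness.

```python
def audit_answers(cases, answer_fn):
    """Flag answers that omit a required fact or state a contradicted one."""
    failures = []
    for case in cases:
        answer = answer_fn(case["question"]).lower()
        missing = [fact for fact in case["required"] if fact.lower() not in answer]
        wrong = [claim for claim in case["contradicted"] if claim.lower() in answer]
        if missing or wrong:
            failures.append(
                {"question": case["question"], "missing": missing, "wrong": wrong}
            )
    return failures


# Hypothetical test cases; "required" facts would come from the source documents.
CASES = [
    {"question": "When is the report due?",
     "required": ["week 7"], "contradicted": ["week 8"]},
    {"question": "How many resits are allowed?",
     "required": ["one resit"], "contradicted": ["two resits"]},
]

# Stub standing in for the live assistant (an assumption for this sketch).
CANNED = {
    "When is the report due?": "The report is due in week 7.",
    "How many resits are allowed?": "Students may take two resits.",
}


def stub_assistant(question):
    return CANNED[question]


failures = audit_answers(CASES, stub_assistant)
print(failures)
```

An empty `failures` list on queries that the source documents cover would support the load-bearing premise; any entry is a concrete counterexample to inspect.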
Original abstract
Large Language Models have been increasingly employed in the creation of Virtual Assistants due to their ability to generate human-like text and handle complex inquiries. While these models hold great promise, challenges such as hallucinations, missing information, and the difficulty of providing accurate and context-specific responses persist, particularly when applied to highly specialized content domains. In this paper, we focus on addressing these challenges by developing a virtual assistant designed to support students at Maastricht University in navigating project-specific regulations. We propose a virtual assistant based on a Retrieval-Augmented Generation system that enhances the accuracy and reliability of responses by integrating up-to-date, domain-specific knowledge. Through a robust evaluation framework and real-life testing, we demonstrate that our virtual assistant can effectively meet the needs of students while addressing the inherent challenges of applying Large Language Models to a specialized educational context. This work contributes to the ongoing discourse on improving LLM-based systems for specific applications and highlights areas for further research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes the development of a generative AI-based virtual assistant using Retrieval-Augmented Generation (RAG) to assist bachelor students at Maastricht University with project-specific regulations. The authors claim that integrating domain-specific knowledge via RAG addresses challenges such as hallucinations and missing information in LLMs, and through a robust evaluation framework and real-life testing, demonstrate that the system effectively meets student needs.
Significance. If substantiated with quantitative evidence, the work could provide a practical case study on applying RAG to specialized educational domains. However, the lack of reported metrics, baselines, or detailed evaluation protocols significantly diminishes its potential impact and contribution to the field.
major comments (1)
- Abstract: The central claim that the virtual assistant 'can effectively meet the needs of students while addressing the inherent challenges' via 'a robust evaluation framework and real-life testing' is unsupported by any quantitative data. No accuracy, F1, hallucination rates, test-set size, baseline comparisons to non-RAG LLMs, or statistical tests are reported, leaving the assertion that RAG 'sufficiently reduces' hallucinations unverifiable.
minor comments (1)
- Abstract: The phrase 'robust evaluation framework' is used without even a high-level indication of the metrics or protocol, which weakens the summary of the contribution.
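The kind of reporting the referee asks for can be illustrated with a short sketch that computes accuracy, hallucination rate, and a paired significance test for a RAG system against a non-RAG baseline. The per-query labels below are made-up placeholders, not results from the paper, and the exact McNemar test is one reasonable choice of paired test rather than anything the manuscript specifies.

```python
from math import comb


def rate(flags):
    """Fraction of True values in a list of per-query labels."""
    return sum(flags) / len(flags)


def mcnemar_exact(base_correct, rag_correct):
    """Two-sided exact McNemar test on paired correctness labels."""
    # b: baseline right, RAG wrong; c: baseline wrong, RAG right.
    b = sum(1 for x, y in zip(base_correct, rag_correct) if x and not y)
    c = sum(1 for x, y in zip(base_correct, rag_correct) if not x and y)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up labels for 8 hypothetical queries (NOT data from the paper).
BASE_CORRECT = [True, False, False, True, False, False, False, True]
RAG_CORRECT = [True, True, True, True, True, True, False, True]
BASE_HALLUCINATED = [False, True, True, False, False, True, True, False]
RAG_HALLUCINATED = [False, False, False, False, False, False, True, False]

report = {
    "n_queries": len(RAG_CORRECT),
    "accuracy_baseline": rate(BASE_CORRECT),
    "accuracy_rag": rate(RAG_CORRECT),
    "hallucination_rate_baseline": rate(BASE_HALLUCINATED),
    "hallucination_rate_rag": rate(RAG_HALLUCINATED),
    "mcnemar_p": mcnemar_exact(BASE_CORRECT, RAG_CORRECT),
}
print(report)
```

A paired test is used because both systems answer the same queries; with a test set this small, even a large accuracy gap may not reach conventional significance, which is why the test-set size belongs in the report.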
Simulated Author's Rebuttal
Thank you for the opportunity to respond to the referee's report. We appreciate the constructive feedback on the evaluation aspects of our RAG-based virtual assistant for Maastricht University bachelor project regulations. We address the major comment point by point below and commit to revisions that strengthen the quantitative support for our claims.
Point-by-point responses
- Referee: Abstract: The central claim that the virtual assistant 'can effectively meet the needs of students while addressing the inherent challenges' via 'a robust evaluation framework and real-life testing' is unsupported by any quantitative data. No accuracy, F1, hallucination rates, test-set size, baseline comparisons to non-RAG LLMs, or statistical tests are reported, leaving the assertion that RAG 'sufficiently reduces' hallucinations unverifiable.
  Authors: We agree that the abstract's claims would be more compelling with explicit quantitative metrics. The full manuscript details a real-life testing process involving student queries on project regulations, with qualitative observations on response relevance and reduced hallucinations due to the RAG retrieval step. However, to address this concern directly, we will revise the abstract to report key figures from our evaluation (e.g., number of test queries, observed accuracy on factual correctness, and notes on hallucination instances before/after RAG). We will also expand the evaluation section with a table summarizing the test-set size, protocol, and any internal baseline comparisons to a non-RAG LLM setup.
  Revision: yes
Circularity Check
No circularity in empirical evaluation study
The paper is an empirical evaluation of a RAG-based virtual assistant with no equations, derivations, parameter fittings, or self-referential definitions. Claims rest on a described evaluation framework and real-life testing rather than any mathematical chain that reduces to inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear. This is a standard applied AI evaluation paper with no detectable circularity.