RealMedQA: A pilot biomedical question answering dataset containing realistic clinical questions

Angus Roberts; Chloe Simela; Gregory Kell; Iain J. Marshall; Jack Coumbe; Julian Rozario; Najma Ahmed; Nikhil Patel; Ryan-Rhys Griffiths; Serge Umansky

arxiv: 2408.08624 · v1 · pith:2EFITWGInew · submitted 2024-08-16 · 💻 cs.CL · cs.AI


Gregory Kell, Angus Roberts, Serge Umansky, Yuti Khare, Najma Ahmed, Nikhil Patel, Chloe Simela, Jack Coumbe, Julian Rozario, Ryan-Rhys Griffiths, Iain J. Marshall



keywords: questions, clinical, answers, dataset, realmedqa, answering, assess, been



Clinical question answering systems have the potential to provide clinicians with relevant and timely answers to their questions. Nonetheless, despite the advances that have been made, adoption of these systems in clinical settings has been slow. One issue is a lack of question-answering datasets that reflect the real-world needs of health professionals. In this work, we present RealMedQA, a dataset of realistic clinical questions generated by humans and an LLM. We describe the process for generating and verifying the QA pairs, and we evaluate several QA models on BioASQ and RealMedQA to assess the relative difficulty of matching answers to questions. We show that the LLM is more cost-efficient for generating "ideal" QA pairs. Additionally, RealMedQA has a lower lexical similarity between questions and answers than BioASQ, which, according to our results, poses an additional challenge for the top two QA models. We release our code and our dataset publicly to encourage further research.
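
To make "lexical similarity between questions and answers" concrete, the sketch below computes a simple token-level Jaccard overlap for a question-answer pair. This is only an illustration: the abstract does not say which similarity measure the authors used, so Jaccard overlap is an assumption here, and the helper names (`tokens`, `jaccard`) and both example pairs are hypothetical and not drawn from BioASQ or RealMedQA.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(question: str, answer: str) -> float:
    """Jaccard overlap between the token sets of a question and an answer.

    Assumed metric for illustration; the paper may use a different one.
    """
    q, a = tokens(question), tokens(answer)
    if not q and not a:
        return 0.0  # define the overlap of two empty texts as zero
    return len(q & a) / len(q | a)


# Hypothetical pairs, invented for illustration (not from either dataset).
high_overlap = jaccard(
    "What gene is mutated in cystic fibrosis?",
    "The CFTR gene is mutated in cystic fibrosis.",
)
low_overlap = jaccard(
    "Should I start anticoagulation in this patient?",
    "Offer a direct oral anticoagulant when stroke risk is raised.",
)

print(f"high-overlap pair: {high_overlap:.2f}")
print(f"low-overlap pair:  {low_overlap:.2f}")
```

When the answer shares few surface tokens with the question, as in the second pair, a model cannot match them by word overlap alone, which is one plausible reading of why lower lexical similarity would make a dataset harder.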



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

EuropeMedQA Study Protocol: A Multilingual, Multimodal Medical Examination Dataset for Language Model Evaluation
cs.CL 2026-04 unverdicted novelty 7.0

EuropeMedQA is presented as the first comprehensive multilingual and multimodal medical examination dataset drawn from official regulatory exams in four European countries.