MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.
arXiv preprint arXiv:2501.09213
3 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Across 504 configurations on five-year ADRD prediction, rationale-based supervised fine-tuning consistently degrades performance relative to label-only fine-tuning, despite high-quality rationales validated by experts.
MixRea benchmark reveals LLMs achieve at most 42.8% consistency on explicit-implicit reasoning tasks, with PRCP prompting proposed to recover overlooked relations.
citing papers explorer
- MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation