MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: eess.IV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge
MRI-Eval benchmark shows frontier LLMs scoring 93-97% on MRI MCQs but falling to 37-61% on stem-only questions, with GE scanner operations as the weakest category for all models.