Medriskeval: Medical risk evaluation benchmark of language models, on the importance of user perspectives in healthcare settings

Jean-Philippe Corbeil, Minseon Kim, Maxime Griot, Sheela Agarwal, Alessandro Sordoni, François Beaulieu, Paul Vozila · 2026 · DOI 10.18653/v1/2026.eacl-industry.39

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Measuring Epistemic Resilience of LLMs Under Misleading Medical Context

cs.CL · 2026-06-10 · unverdicted · novelty 6.0 · cites this work as ref 8

LLMs drop from 71.1% to 38.0% accuracy on medical questions when misleading context is injected, as measured on the new MedMisBench benchmark with 10,932 items.