Trustworthy medical question answering: An evaluation-centric survey
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
Quantifying Hallucinations in Language Models on Medical Textbooks
LLMs hallucinate in 19.7% of textbook-grounded medical QA answers despite high plausibility scores, indicating they remain unfit for unsupervised clinical use.