Multi-turn evidence seeking reduces LLM diagnostic accuracy by 12.75% and supporting-evidence quality by 24.36% versus full-context evaluation in a new OSCE-inspired benchmark across 468 cases and 15 models.
None”) - (2) Laboratory tests: [Key abnormal results] (or “None
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Active Evidence-Seeking and Diagnostic Reasoning in Large Language Models for Clinical Decision Support
Multi-turn evidence seeking reduces LLM diagnostic accuracy by 12.75% and supporting-evidence quality by 24.36% versus full-context evaluation in a new OSCE-inspired benchmark across 468 cases and 15 models.