LLMs rush to diagnose in multi-turn medical scenarios, with over 55% committing in the first two turns. They show strong self-correction potential and are vulnerable to salient lures, but deferring questions can raise accuracy by up to 62.6%.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Benchmarking Multi-turn Medical Diagnosis: Hold, Lure, and Self-Correction