Nature, pages 1–7
2 Pith papers cite this work. Citation polarity classification is still being indexed.
Fields: cs.CL (2)
Verdicts: UNVERDICTED (2)

Citing papers
- MedDialBench: Benchmarking LLM Diagnostic Robustness under Parametric Adversarial Patient Behaviors
  MedDialBench shows that LLMs suffer 1.7-3.4x larger drops in diagnostic accuracy when patients fabricate symptoms than when they withhold them, and that fabrication drives super-additive interaction effects across models.
- Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning
  APP is a multi-turn LLM framework for medical dialogue that combines empathetic questioning, Bayesian active learning, and guideline-based reasoning, outperforming baselines on a new simulated-patient benchmark in accuracy, uncertainty reduction, and user experience.