Beyond idealized patients: Evaluating LLMs under challenging patient behaviors in medical consultations. arXiv preprint arXiv:2603.29373
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
MedBench v5 is a dynamic benchmark with process auditing, information stressors, and hallucination monitoring for clinical multimodal models across 63 tasks.
citing papers explorer
- MedDialBench: Benchmarking LLM Diagnostic Robustness under Parametric Adversarial Patient Behaviors
  MedDialBench shows LLMs suffer 1.7-3.4x larger diagnostic accuracy drops from patients fabricating symptoms than withholding them, with fabrication driving super-additive interaction effects across models.
- MedBench v5: A Dynamic, Process-Oriented, and Hallucination-Aware Benchmark for Clinical Multimodal Models
  MedBench v5 is a dynamic benchmark with process auditing, information stressors, and hallucination monitoring for clinical multimodal models across 63 tasks.