Beyond idealized patients: Evaluating LLMs under challenging patient behaviors in medical consultations. arXiv preprint arXiv:2603.29373

Yahan Li, Xinyi Jie, Wanjia Ruan, Xubei Zhang, Huaijie Zhu, Yicheng Gao, Chaohao Du, Ruishan Liu · arXiv 2603.29373

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

MedDialBench: Benchmarking LLM Diagnostic Robustness under Parametric Adversarial Patient Behaviors

cs.CL · 2026-04-08 · unverdicted · novelty 6.0

MedDialBench shows LLMs suffer 1.7-3.4x larger diagnostic accuracy drops from patients fabricating symptoms than withholding them, with fabrication driving super-additive interaction effects across models.

MedBench v5: A Dynamic, Process-Oriented, and Hallucination-Aware Benchmark for Clinical Multimodal Models

cs.CL · 2026-06-23 · unverdicted · novelty 5.0 · 2 refs

MedBench v5 is a dynamic benchmark with process auditing, information stressors, and hallucination monitoring for clinical multimodal models across 63 tasks.

citing papers explorer


MedDialBench: Benchmarking LLM Diagnostic Robustness under Parametric Adversarial Patient Behaviors · cs.CL · 2026-04-08 · unverdicted · none · ref 7
MedBench v5: A Dynamic, Process-Oriented, and Hallucination-Aware Benchmark for Clinical Multimodal Models · cs.CL · 2026-06-23 · unverdicted · none · ref 11 · 2 links
