MedDialogRubrics: A comprehensive benchmark and evaluation framework for multi-turn medical consultations in large language models. arXiv preprint arXiv:2601.03023.
Three Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.CL (3)
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Citing papers explorer
- IndicMedDialog: A Parallel Multi-Turn Medical Dialogue Dataset for Accessible Healthcare in Indic Languages
  A parallel multi-turn medical dialogue dataset spanning English and nine Indic languages is built from synthetic consultations to enable personalized AI healthcare interactions.
- MedConceal: A Benchmark for Clinical Hidden-Concern Reasoning Under Partial Observability
  MedConceal provides 300 cases and a simulator that withholds hidden concerns to evaluate confirmation and intervention in medical dialogue. It finds that frontier models vary in how well they surface concerns, while humans outperform them at guiding patients toward care plans.
- MedDialBench: Benchmarking LLM Diagnostic Robustness under Parametric Adversarial Patient Behaviors
  MedDialBench shows that LLMs suffer 1.7 to 3.4 times larger drops in diagnostic accuracy when patients fabricate symptoms than when they withhold them, with fabrication driving super-additive interaction effects across models.