The MEDSYN benchmark shows that MLLMs match experts at producing differential diagnosis lists but show a much larger gap than humans between differential diagnosis and final diagnosis selection, owing to overreliance on text and gaps in cross-modal evidence use.
1 Pith paper cites this work. Polarity classification is still being indexed.
fields: cs.CL (1)
years: 2026 (1)
verdicts: CONDITIONAL (1)
representative citing papers
MEDSYN: Benchmarking Multi-EviDence SYNthesis in Complex Clinical Cases for Multimodal Large Language Models