How Far Have Medical Vision-Language Models Come? A Comprehensive Benchmarking Study
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Evaluation of Medical Vision Language Models HuluMed and MedGemma, and general purpose chatbots Gemma 3, ChatGPT Plus, and Claude Pro on real previously unseen wound images
On 240 clinician-graded decisions from 20 wound cases, ChatGPT scored 72.5% while the best medical VLM (HuluMed) scored 40%.