How Far Have Medical Vision-Language Models Come? A Comprehensive Benchmarking Study

2025 · arXiv 2507.11200

1 Pith paper cites this work.


Representative citing papers

Evaluation of Medical Vision Language Models HuluMed and MedGemma, and general purpose chatbots Gemma 3, ChatGPT Plus, and Claude Pro on real previously unseen wound images
cs.CV · 2026-06-16 · unverdicted · novelty 3.0 · ref 10

On 240 clinician-graded decisions from 20 wound cases, ChatGPT scored 72.5% while the best medical VLM (HuluMed) scored 40%.