Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation
Auditing five frontier VLMs reveals severe grounding failures (max 0.23 IoU, 19.1% Acc@0.5) and format collapse (up to 99% parse failure) in medical VQA; fine-tuning yields 85.5% SLAKE recall but perception remains the primary trustworthiness issue.
- LLMs-Healthcare: Current Applications and Challenges of Large Language Models in various Medical Specialties
A review summarizing LLM applications for diagnostics and treatment in oncology, dermatology, dentistry, neurodegenerative disorders, and mental health, plus integration challenges.