Auditing five frontier VLMs reveals severe grounding failures (max 0.23 IoU, 19.1% Acc@0.5) and format collapse (up to 99% parse failure) in medical VQA; fine-tuning yields 85.5% SLAKE recall but perception remains the primary trustworthiness issue.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2representative citing papers
REVEAL uses vision-language alignment of retinal morphometry and clinical risk narratives plus group contrastive learning to predict AD and dementia about 8 years early.
citing papers explorer
-
Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation
Auditing five frontier VLMs reveals severe grounding failures (max 0.23 IoU, 19.1% Acc@0.5) and format collapse (up to 99% parse failure) in medical VQA; fine-tuning yields 85.5% SLAKE recall but perception remains the primary trustworthiness issue.
-
REVEAL: Multimodal Vision-Language Alignment of Retinal Morphometry and Clinical Risks for Incident AD and Dementia Prediction
REVEAL uses vision-language alignment of retinal morphometry and clinical risk narratives plus group contrastive learning to predict AD and dementia about 8 years early.