BiomedCLIP: A multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs. NEJM AI
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 3
verdicts
UNVERDICTED 3
representative citing papers
Mean pooling and multi-window RGB encoding optimize vision-language performance on CT enterography, with retrieval-augmented generation substantially improving automated report severity accuracy over fine-tuning alone.
MediSyn is a generalist latent diffusion model that synthesizes text-guided medical images across multiple specialties and modalities from public data and improves downstream classifiers in low-data settings.
citing papers explorer
- UCSF-PDGM-VQA: Visual Question Answering dataset for brain tumor MRI interpretation
  Introduces the UCSF-PDGM-VQA dataset of 2387 QA pairs from 473 glioma MRI studies and demonstrates that state-of-the-art VLMs exhibit modality collapse on multi-sequence 3D medical images.
- Representation geometry shapes task performance in vision-language modeling for CT enterography
  Mean pooling and multi-window RGB encoding optimize vision-language performance on CT enterography, with retrieval-augmented generation substantially improving automated report severity accuracy over fine-tuning alone.
- A Generalist Model for Diverse Text-Guided Medical Image Synthesis
  MediSyn is a generalist latent diffusion model that synthesizes text-guided medical images across multiple specialties and modalities from public data and improves downstream classifiers in low-data settings.