Med-Flamingo: a multimodal medical few-shot learner (2023). URL: https://arxiv
5 Pith papers cite this work.
Citation roles: background (3)
Citation polarities: background (3)
Citing papers
- CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs
  Medical VLMs frequently select negated options that contradict visible chest X-ray findings, achieving only ~30% accuracy on direct presence probes, but a post-hoc consistency verifier raises accuracy above 95%.
- Wasserstein Equilibrium Decoding for Reliable Medical Visual Question Answering
  Introduces Wasserstein equilibrium decoding that improves accuracy and convergence speed for small VLMs on medical VQA benchmarks by using semantic consensus instead of lexical order.
- BiomedAP: A Vision-Informed Dual-Anchor Framework with Gated Cross-Modal Fusion for Robust Medical Vision-Language Adaptation
  BiomedAP improves robustness of biomedical VLMs to prompt variations using gated cross-modal fusion and dual-anchor constraints, outperforming baselines on 11 benchmarks.
- FLAME: Adaptive Mixture-of-Experts for Continual Multimodal Multi-Task Learning
  FLAME is an MoE architecture using modality-specific routers and low-rank compression of expert knowledge to support efficient continual multimodal multi-task learning while reducing catastrophic forgetting.
- Data-Centric Foundation Models in Computational Healthcare: A Survey
  The paper surveys data-centric strategies for foundation models in computational healthcare and supplies a curated list of related models and datasets.