arxiv: 2307.15189 · v1 · pith:ZBX3NHJ3new · submitted 2023-07-27 · 💻 cs.CV · cs.AI

Med-Flamingo: a Multimodal Medical Few-shot Learner

classification 💻 cs.CV cs.AI
keywords medical · med-flamingo · few-shot · generative · models · multimodal · applications · data

Medicine, by its nature, is a multifaceted domain that requires the synthesis of information across various modalities. Medical generative vision-language models (VLMs) make a first step in this direction and promise many exciting clinical applications. However, existing models typically have to be fine-tuned on sizeable downstream datasets, which poses a significant limitation because data is scarce in many medical applications, necessitating models that are capable of learning from few examples in real time. Here we propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain. Based on OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks. Med-Flamingo unlocks few-shot generative medical visual question answering (VQA) abilities, which we evaluate on several datasets including a novel challenging open-ended VQA dataset of visual USMLE-style problems. Furthermore, we conduct the first human evaluation for generative medical VQA, in which physicians review the problems and blinded generations in an interactive app. Med-Flamingo improves performance in generative medical VQA by up to 20% in clinicians' ratings and, for the first time, enables multimodal medical few-shot adaptations such as rationale generation. We release our model, code, and evaluation app at https://github.com/snap-stanford/med-flamingo.

This paper has not been read by Pith yet.

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs

    cs.CV 2026-05 conditional novelty 7.0

    Medical VLMs frequently select negated options that contradict visible chest X-ray findings, achieving only ~30% accuracy on direct presence probes, but a post-hoc consistency verifier raises accuracy above 95%.

  2. Not All Tokens Matter Equally: Dynamic In-context Vector Distillation with Decisive-Token Supervision for Long-form Medical Report Generation

    cs.CL 2026-05 unverdicted novelty 6.0

    DIVE improves in-context vector distillation for medical report generation via decisive-token supervision on pathology terms and EOS plus state-conditioned dynamic steering, achieving top BLEU-4, ROUGE-L and RadGraph ...

  3. Wasserstein Equilibrium Decoding for Reliable Medical Visual Question Answering

    cs.CV 2026-05 unverdicted novelty 6.0

    Introduces Wasserstein equilibrium decoding that improves accuracy and convergence speed for small VLMs on medical VQA benchmarks by using semantic consensus instead of lexical order.

  4. Ask4VG: Risk-Aware Question Selection for Reducing Prior-Driven Answers in Medical VQA

    cs.CV 2026-05 unverdicted novelty 5.0

    Ask4VG learns a risk estimator from counterfactual visual probes to rerank question rewrites, reducing held-out hallucination risk from 0.658 to 0.623 and raising accuracy from 0.337 to 0.356 on VQA-RAD.

  5. BiomedAP: A Vision-Informed Dual-Anchor Framework with Gated Cross-Modal Fusion for Robust Medical Vision-Language Adaptation

    cs.CV 2026-05 unverdicted novelty 5.0

    BiomedAP improves robustness of biomedical VLMs to prompt variations using gated cross-modal fusion and dual-anchor constraints, outperforming baselines on 11 benchmarks.

  6. FLAME: Adaptive Mixture-of-Experts for Continual Multimodal Multi-Task Learning

    cs.LG 2026-05 unverdicted novelty 5.0

    FLAME is an MoE architecture using modality-specific routers and low-rank compression of expert knowledge to support efficient continual multimodal multi-task learning while reducing catastrophic forgetting.

  7. Data-Centric Foundation Models in Computational Healthcare: A Survey

    cs.LG 2024-01 unverdicted novelty 3.0

    The paper surveys data-centric strategies for foundation models in computational healthcare and supplies a curated list of related models and datasets.