arxiv: 2308.12453 · v1 · pith:STJ6MXHInew · submitted 2023-08-23 · 💻 cs.CV · cs.AI · cs.LG

Augmenting medical image classifiers with synthetic data from latent diffusion models

classification: cs.CV, cs.AI, cs.LG
keywords: data, synthetic, disease, image, images, latent, model, skin

While hundreds of artificial intelligence (AI) algorithms are now approved or cleared by the US Food and Drug Administration (FDA), many studies have shown inconsistent generalization or latent bias, particularly for underrepresented populations. Some have proposed that generative AI could reduce the need for real data, but its utility in model development remains unclear. Skin disease serves as a useful case study in synthetic image generation due to the diversity of disease appearance, particularly across the protected attribute of skin tone. Here we show that latent diffusion models can scalably generate images of skin disease and that augmenting model training with these data improves performance in data-limited settings. These performance gains saturate at synthetic-to-real image ratios above 10:1 and are substantially smaller than the gains obtained from adding real images. As part of our analysis, we generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies. Our results suggest that synthetic data could serve as a force-multiplier for model development, but the collection of diverse real-world data remains the most important step to improve medical AI algorithms.
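To make the synthetic-to-real ratio concrete, here is a minimal sketch of how a training set might be assembled at a chosen ratio. It is an illustration only, not the authors' pipeline: the function name `build_training_set`, its parameters, and the use of plain Python lists as stand-ins for image records are all assumptions.

```python
import random


def build_training_set(real, synthetic, ratio, seed=0):
    """Mix real items with up to ratio * len(real) synthetic items.

    Hypothetical helper: `real` and `synthetic` are lists of arbitrary
    records (for example file paths); `ratio` is the synthetic-to-real ratio.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    rng = random.Random(seed)
    # Cap the request at the number of synthetic items actually available.
    n_synth = min(len(synthetic), int(ratio * len(real)))
    chosen = rng.sample(synthetic, n_synth)
    # Tag each record with its origin so later analysis can separate them.
    mixed = [(x, "real") for x in real] + [(x, "synthetic") for x in chosen]
    rng.shuffle(mixed)
    return mixed


real_images = [f"real_{i}.png" for i in range(10)]
synthetic_images = [f"synth_{i}.png" for i in range(500)]
train = build_training_set(real_images, synthetic_images, ratio=10)
```

With 10 real records and a ratio of 10, the sketch draws 100 synthetic records. Since the abstract reports that gains saturate above 10:1, sweeping `ratio` over a few values is one way such a saturation point could be probed.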

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Generalist Model for Diverse Text-Guided Medical Image Synthesis

    cs.CV 2024-05 unverdicted novelty 6.0

    MediSyn is a generalist latent diffusion model that synthesizes text-guided medical images across multiple specialties and modalities from public data and improves downstream classifiers in low-data settings.

  2. Structural MRI Synthesis for Alzheimer's Disease via Conditional Diffusion on Anatomical Masks

    eess.IV 2026-06 unverdicted novelty 4.0

    Extending Med-DDPM to AD, synthetic MRIs conditioned on anatomical masks produce segmentation models with Dice 0.6532 (synthetic-only) and 0.7244 (hybrid real+synthetic), outperforming real-only training at 0.6513.
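The Dice values quoted in the second entry measure overlap between a predicted and a reference segmentation mask, using the standard definition Dice = 2|A ∩ B| / (|A| + |B|). The sketch below computes it for flat binary masks; the function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions made here, not details taken from the cited paper.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count positions where both masks are foreground.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Sum of foreground counts in each mask.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


score = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
```

In this toy case the masks share one foreground position out of four foreground entries in total, so the score is 0.5.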