arxiv: 2302.14679 · v2 · pith:OQ7B4E7Inew · submitted 2023-02-28 · 💻 cs.LG · cs.CL

Synthesizing Mixed-type Electronic Health Records using Diffusion Models

classification 💻 cs.LG cs.CL
keywords: data, models, privacy, diffusion, EHRs, electronic, GANs, generating
Electronic Health Records (EHRs) contain sensitive patient information, which raises privacy concerns when such data are shared. Synthetic data generation is a promising way to mitigate these risks, and it often relies on deep generative models such as Generative Adversarial Networks (GANs). However, recent studies have shown that diffusion models offer several advantages over GANs, such as more realistic synthetic data and more stable training, across data modalities including image, text, and sound. In this work, we investigate the potential of diffusion models for generating realistic mixed-type tabular EHRs, comparing the TabDDPM model with existing methods on four datasets in terms of data quality, utility, privacy, and augmentation. Our experiments demonstrate that TabDDPM outperforms the state-of-the-art models across all evaluation metrics except privacy, which confirms the trade-off between privacy and utility.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Diffusion and Flow Matching Models for Tabular Data: A Survey

    cs.LG 2025-02 unverdicted novelty 7.0

    First dedicated survey organizing diffusion and flow matching models for tabular data synthesis, imputation, anomaly detection, and related tasks, covering literature from 2015 to 2026 and highlighting open problems.

  2. OncoSynth: Synthetic data generation for treatment effect estimation in oncology

    cs.LG 2026-06 unverdicted novelty 6.0

    OncoSynth uses a diffusion-based sequential generative model to create synthetic oncology cohorts that preserve causal structures and reduce treatment effect estimation errors by up to 66% at the population level comp...