Privacy Distillation: Reducing Re-identification Risk of Multimodal Diffusion Models

Grzegorz Jacenków; Jorge Cardoso; Pedro Sanchez; Sotirios A. Tsaftaris; Virginia Fernandez; Walter Hugo Lopez Pinaya

arXiv: 2306.01322 · v1 · submitted 2023-06-02 · cs.LG · cs.CR · cs.CV

Abstract

Knowledge distillation in neural networks refers to compressing a large model or dataset into a smaller version of itself. We introduce Privacy Distillation, a framework that allows a text-to-image generative model to teach another model without exposing it to identifiable data. We focus on the privacy issue faced by a data provider who wishes to share their data via a multimodal generative model. A question that immediately arises is: "How can a data provider ensure that the generative model is not leaking identifiable information about a patient?" Our solution consists of (1) training a first diffusion model on real data; (2) generating a synthetic dataset with this model and filtering it to exclude images that carry a re-identification risk; and (3) training a second diffusion model on the filtered synthetic data only. We show that datasets sampled from models trained with Privacy Distillation can effectively reduce re-identification risk whilst maintaining downstream performance.
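The three-step pipeline in the abstract can be sketched end to end as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `ToyConditionalModel` is a hypothetical per-prompt Gaussian standing in for a text-to-image diffusion model, `is_reidentifiable` uses cosine similarity between raw pixel vectors as a hypothetical stand-in for a learned re-identification model, and the `threshold` values are arbitrary assumptions.

```python
import math
import random
from dataclasses import dataclass, field

Vector = list[float]
Sample = tuple[str, Vector]  # (text prompt, flattened toy "image")


@dataclass
class ToyConditionalModel:
    """Hypothetical stand-in for a text-to-image diffusion model.

    Fits an independent Gaussian per pixel for each prompt. A real pipeline
    would train a diffusion model at this point instead.
    """
    stats: dict[str, tuple[Vector, Vector]] = field(default_factory=dict)

    @classmethod
    def fit(cls, data: list[Sample]) -> "ToyConditionalModel":
        grouped: dict[str, list[Vector]] = {}
        for prompt, image in data:
            grouped.setdefault(prompt, []).append(image)
        stats = {}
        for prompt, images in grouped.items():
            columns = list(zip(*images))  # one tuple of values per pixel
            means = [sum(c) / len(c) for c in columns]
            stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
                    for c, m in zip(columns, means)]
            stats[prompt] = (means, stds)
        return cls(stats)

    def sample(self, prompt: str, rng: random.Random) -> Vector:
        """Generate one synthetic image conditioned on `prompt`."""
        means, stds = self.stats[prompt]
        return [rng.gauss(m, s) for m, s in zip(means, stds)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_reidentifiable(image: Vector, real_images: list[Vector],
                      threshold: float) -> bool:
    """Hypothetical filter: flag an image whose similarity to any real
    training image reaches `threshold` (cosine similarity on raw pixels is
    an assumed stand-in for a learned re-identification model)."""
    return any(cosine(image, real) >= threshold for real in real_images)


def privacy_distillation(real_data: list[Sample], n_synthetic: int,
                         threshold: float, seed: int = 0
                         ) -> tuple[ToyConditionalModel, list[Sample]]:
    rng = random.Random(seed)
    # (1) Train the first ("teacher") model on real data.
    teacher = ToyConditionalModel.fit(real_data)
    # (2) Sample a synthetic dataset, then drop re-identifiable images.
    prompts = sorted(teacher.stats)
    synthetic: list[Sample] = []
    for _ in range(n_synthetic):
        prompt = rng.choice(prompts)
        synthetic.append((prompt, teacher.sample(prompt, rng)))
    real_images = [image for _, image in real_data]
    filtered = [(p, x) for p, x in synthetic
                if not is_reidentifiable(x, real_images, threshold)]
    # (3) Train the second ("student") model on filtered synthetic data only.
    student = ToyConditionalModel.fit(filtered)
    return student, filtered


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Hypothetical paired data: two prompts, 20 noisy 16-pixel "images" each.
    real = [(f"finding {k}", [data_rng.gauss(k, 1.0) for _ in range(16)])
            for k in (1, 2) for _ in range(20)]
    student, kept = privacy_distillation(real, n_synthetic=200, threshold=0.95)
    print(f"kept {len(kept)} of 200; student prompts: {sorted(student.stats)}")
```

In this toy, a prompt backed by a single training image has zero variance, so the teacher reproduces that image exactly; that memorized copy is precisely what step (2) is meant to catch. Raising `threshold` keeps more synthetic data at the cost of weaker filtering.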
