Statistical Inference for Privatized Data with Unknown Sample Size
read the original abstract
We develop both theory and algorithms to analyze privatized data in unbounded differential privacy (DP), where even the sample size is considered a sensitive quantity that requires privacy protection. We show that the distance between the sampling distributions under unbounded DP and bounded DP goes to zero as the sample size $n$ goes to infinity, provided that the noise used to privatize $n$ is at an appropriate rate; we also establish that Approximate Bayesian Computation (ABC)-type posterior distributions converge under similar assumptions. We further give asymptotic results in the regime where the privacy budget for $n$ goes to infinity, establishing similarity of sampling distributions as well as showing that the MLE in the unbounded setting converges to the bounded-DP MLE. To facilitate valid, finite-sample Bayesian inference on privatized data under unbounded DP, we propose a reversible jump MCMC algorithm which extends the data augmentation MCMC of Ju et al, (2022). We also propose a Monte Carlo EM algorithm to compute the MLE from privatized data in both bounded and unbounded DP. We apply our methodology to analyze a linear regression model as well as a 2019 American Time Use Survey Microdata File which we model using a Dirichlet distribution.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
Large-Sample Bayesian Approximations for Privatized Data
A two-step approximate Bayesian sampler for privatized data is shown to be asymptotically valid under mild assumptions, with conservative frequentist properties in simulations and an application to 2022 American Commu...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.