
arXiv: 2406.06231 · v3 · submitted 2024-06-10 · math.ST · cs.CR · stat.CO · stat.TH

Statistical Inference for Privatized Data with Unknown Sample Size

keywords: data, unbounded, privatized, distributions, goes, privacy, sample, size

We develop both theory and algorithms to analyze privatized data in unbounded differential privacy (DP), where even the sample size is considered a sensitive quantity that requires privacy protection. We show that the distance between the sampling distributions under unbounded DP and bounded DP goes to zero as the sample size $n$ goes to infinity, provided that the noise used to privatize $n$ scales at an appropriate rate; we also establish that Approximate Bayesian Computation (ABC)-type posterior distributions converge under similar assumptions. We further give asymptotic results in the regime where the privacy budget for $n$ goes to infinity, establishing similarity of sampling distributions as well as showing that the MLE in the unbounded setting converges to the bounded-DP MLE. To facilitate valid, finite-sample Bayesian inference on privatized data under unbounded DP, we propose a reversible jump MCMC algorithm which extends the data augmentation MCMC of Ju et al. (2022). We also propose a Monte Carlo EM algorithm to compute the MLE from privatized data in both bounded and unbounded DP. We apply our methodology to analyze a linear regression model as well as a 2019 American Time Use Survey Microdata File which we model using a Dirichlet distribution.
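To make the unbounded-DP setting concrete, the sketch below shows one way a data holder might release both a noisy sample size and a noisy summary statistic. It is an illustration only, not the mechanism used in the paper: the Laplace mechanism, the clamping bounds `lower` and `upper`, the budget split `eps_n` / `eps_sum`, and the function names are all assumptions introduced here.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) variate as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_unbounded(data, lower, upper, eps_n, eps_sum, rng):
    """Release a noisy count and a noisy clamped sum (hypothetical helper).

    Under add/remove-one-record neighbors, the count has sensitivity 1 and
    the clamped sum has sensitivity max(|lower|, |upper|). By basic
    composition the total privacy cost is eps_n + eps_sum.
    """
    # Clamp each record so the sum has bounded sensitivity.
    clamped = [min(max(x, lower), upper) for x in data]
    sens_sum = max(abs(lower), abs(upper))
    noisy_n = len(clamped) + laplace_noise(1.0 / eps_n, rng)
    noisy_sum = sum(clamped) + laplace_noise(sens_sum / eps_sum, rng)
    return noisy_n, noisy_sum
```

An analyst who sees only `noisy_n` and `noisy_sum` does not know the true $n$, which is why inference in this setting has to treat the sample size as an unknown quantity. As `eps_n` grows, `noisy_n` concentrates around the true count, which loosely mirrors the regime in which the abstract says the unbounded-DP and bounded-DP results become similar.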


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Large-Sample Bayesian Approximations for Privatized Data

    stat.ME 2026-04 unverdicted novelty 6.0

    A two-step approximate Bayesian sampler for privatized data is shown to be asymptotically valid under mild assumptions, with conservative frequentist properties in simulations and an application to 2022 American Commu...