Within-Dataset Disclosure Risk for Differential Privacy
Pith reviewed 2026-05-24 06:39 UTC · model grok-4.3
The pith
A relative disclosure risk indicator shows how the privacy parameter ε affects disclosure risk for individuals inside a specific dataset.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We first derive a relative disclosure risk indicator (RDR) that indicates the impact of choosing ε on the within-dataset individuals' disclosure risk. We then design an algorithm to find ε based on controllers' privacy preferences expressed as a function of the within-dataset individuals' RDRs, and an alternative algorithm that finds and releases ε while satisfying DP. Lastly, we propose a solution that bounds the total privacy leakage when using the algorithm to answer multiple queries without requiring controllers to set the total privacy budget.
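The page does not describe how the paper's algorithm searches for ε. The following is a minimal sketch, not the authors' method. It assumes a hypothetical function `risks_at(eps)` that returns one risk value per within-dataset individual, with every value non-decreasing in ε. It also assumes the controller's preference has the form "no individual's risk exceeds `threshold`". Under that monotonicity assumption, the largest acceptable ε can be found by bisection:

```python
import math
from typing import Callable, Sequence


def find_epsilon(
    risks_at: Callable[[float], Sequence[float]],
    threshold: float,
    eps_lo: float = 1e-6,
    eps_hi: float = 20.0,
    tol: float = 1e-9,
) -> float:
    """Return (within tol) the largest eps in [eps_lo, eps_hi] whose worst risk is <= threshold.

    Assumes every per-individual risk is non-decreasing in eps, so the acceptable
    eps values form an interval starting at eps_lo and bisection is valid.
    """

    def acceptable(eps: float) -> bool:
        # Hypothetical controller preference: the most exposed individual stays under the threshold.
        return max(risks_at(eps)) <= threshold

    if not acceptable(eps_lo):
        raise ValueError("no epsilon in the search range meets the preference")
    if acceptable(eps_hi):
        return eps_hi
    lo, hi = eps_lo, eps_hi  # invariant: acceptable(lo) and not acceptable(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if acceptable(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Hypothetical stand-in risks exp(w * eps), one weight per individual.
    weights = [0.2, 0.5, 1.0]
    eps = find_epsilon(lambda e: [math.exp(w * e) for w in weights], threshold=2.0)
    print(round(eps, 4))  # prints 0.6931 (ln 2)
```

The stand-in risks are illustrative only. This version reads the data freely, so the ε it returns is not itself protected. That gap is what the DP-releasing variant described below is meant to close.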
What carries the argument
The relative disclosure risk indicator (RDR), a quantity that measures how a chosen ε value changes disclosure risk specifically for the individuals present in one dataset.
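The page does not reproduce the RDR formula, so the sketch below is not the paper's indicator. Instead, it shows the textbook worst-case Bayesian reading of ε-DP, which is the all-datasets view the RDR is positioned against. Suppose dataset D contains a target's record and its neighbor D′ differs only in that record, and an adversary assigns prior probability p to D. After seeing the output of any ε-DP mechanism, the adversary's posterior is at most p·e^ε / (p·e^ε + 1 - p). The posterior-to-prior ratio is therefore at most e^ε. The function names are illustrative.

```python
import math


def posterior_upper_bound(prior: float, eps: float) -> float:
    """Worst-case posterior that the target's record is present, after one eps-DP release.

    Follows from Bayes' rule and Pr[o | D] <= e^eps * Pr[o | D'] for neighbors D, D'.
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    if eps < 0:
        raise ValueError("eps must be non-negative")
    boosted = math.exp(eps) * prior
    return boosted / (boosted + (1.0 - prior))


def relative_increase_bound(prior: float, eps: float) -> float:
    """Worst-case ratio posterior / prior; never exceeds e^eps."""
    if prior <= 0.0:
        raise ValueError("prior must be positive for a ratio")
    return posterior_upper_bound(prior, eps) / prior


if __name__ == "__main__":
    for eps in (0.1, 1.0, 3.0):
        print(eps, round(posterior_upper_bound(0.5, eps), 3))
```

With a 50/50 prior, ε = 1 caps the posterior at roughly 0.73, while ε = 3 allows roughly 0.95. Because this bound must hold for every possible dataset, it says nothing specific about the individuals actually present. That is the gap a within-dataset indicator aims to fill.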
If this is right
- Controllers can state privacy goals directly in terms of RDR values computed from their own dataset rather than abstract worst-case bounds.
- An algorithm produces an ε that respects those stated RDR preferences.
- A separate algorithm selects ε and releases the selected value itself under a DP guarantee.
- Multiple queries can be answered while keeping total leakage bounded without the controller declaring a global privacy budget.
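The page does not say how the paper bounds total leakage without a preset budget. One standard idea in that direction is a privacy odometer, which tracks the running loss rather than enforcing a cap. The minimal sketch below makes two assumptions. First, it uses basic sequential composition for pure ε-DP, so the total is the sum of the per-release ε values. Second, it charges each query twice: once for a private ε-selection step and once for answering with the selected ε. Fully adaptive choice of per-query ε needs the more careful odometer and filter analyses in the literature, which this class does not implement.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PureDPOdometer:
    """Running total of pure-DP loss under basic sequential composition (sum of epsilons)."""

    charges: List[Tuple[str, float]] = field(default_factory=list)

    def charge(self, label: str, eps: float) -> float:
        """Record one eps-DP release and return the updated running total."""
        if eps < 0:
            raise ValueError("epsilon must be non-negative")
        self.charges.append((label, eps))
        return self.total()

    def total(self) -> float:
        return sum(eps for _, eps in self.charges)


if __name__ == "__main__":
    odo = PureDPOdometer()
    # Hypothetical per-query costs: a private selection step plus the answer itself.
    for query, (eps_select, eps_answer) in enumerate([(0.1, 0.5), (0.1, 0.8)], start=1):
        odo.charge(f"q{query}: select", eps_select)
        odo.charge(f"q{query}: answer", eps_answer)
    print(round(odo.total(), 6))
```

Keeping labeled charges, rather than only a running sum, lets a controller see afterwards which queries and which selection steps consumed the leakage.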
Where Pith is reading between the lines
- The approach could let organizations move from a single global ε to per-dataset choices that reflect the actual records they hold.
- Similar risk indicators might be developed for other privacy definitions to give controllers concrete selection criteria.
- The user-study evidence suggests the RDR could be incorporated into privacy-management tools used by non-expert controllers.
Load-bearing premise
The derived RDR faithfully captures the disclosure risk that applies to the particular individuals whose data appear in the dataset under study.
What would settle it
A direct measurement of actual re-identification success rates on the dataset that fails to match the ordering or magnitude of RDR values computed for its members.
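One way to run this kind of check is a simulated membership attack. The minimal sketch below is a generic experiment, not the paper's evaluation, and the query, target, and counts are hypothetical. A count query with sensitivity 1 is released with Laplace noise of scale 1/ε. An attacker with a 50/50 prior guesses whether a target individual is in the data by thresholding the noisy count at the midpoint, which is the optimal test for this mechanism. The attacker's empirical accuracy can then be compared against two quantities:

- the closed form 1 - e^(-ε/2)/2 for this particular attack;
- the ceiling e^ε / (1 + e^ε) that any ε-DP mechanism imposes on such an attacker.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def attack_accuracy(eps: float, true_count: int, trials: int, seed: int = 0) -> float:
    """Empirical accuracy of a midpoint-threshold membership attack on a noisy count.

    Without the target the count is true_count; with the target it is true_count + 1.
    Each trial flips a fair coin for membership, releases count + Laplace(1/eps) noise,
    and the attacker guesses "member" when the release exceeds true_count + 0.5.
    """
    rng = random.Random(seed)
    scale = 1.0 / eps  # sensitivity 1
    midpoint = true_count + 0.5
    correct = 0
    for _ in range(trials):
        member = rng.random() < 0.5
        released = true_count + (1 if member else 0) + laplace_noise(scale, rng)
        correct += (released > midpoint) == member
    return correct / trials


def exact_threshold_accuracy(eps: float) -> float:
    # Each hypothesis is misclassified when the noise crosses the midpoint: prob exp(-eps/2) / 2.
    return 1.0 - 0.5 * math.exp(-eps / 2.0)


def dp_accuracy_ceiling(eps: float) -> float:
    # Bound on any attacker's accuracy with a 50/50 prior against an eps-DP release.
    return math.exp(eps) / (1.0 + math.exp(eps))


if __name__ == "__main__":
    for eps in (0.5, 1.0, 2.0):
        empirical = attack_accuracy(eps, true_count=40, trials=100_000)
        print(eps, round(empirical, 3),
              round(exact_threshold_accuracy(eps), 3),
              round(dp_accuracy_ceiling(eps), 3))
```

Suppose a within-dataset indicator ranked individuals or ε values differently from measurements like these. That would be the kind of mismatch described above.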
Original abstract
Differential privacy (DP) enables private data analysis. In a typical DP deployment, controllers manage individuals' sensitive data and are responsible for answering analysts' queries while protecting individuals' privacy. They do so by choosing the privacy parameter $\epsilon$, which controls the degree of privacy for all individuals in all possible datasets. However, it is challenging for controllers to choose $\epsilon$ because of the difficulty of interpreting the privacy implications of such a choice on the within-dataset individuals. To address this challenge, we first derive a relative disclosure risk indicator (RDR) that indicates the impact of choosing $\epsilon$ on the within-dataset individuals' disclosure risk. We then design an algorithm to find $\epsilon$ based on controllers' privacy preferences expressed as a function of the within-dataset individuals' RDRs, and an alternative algorithm that finds and releases $\epsilon$ while satisfying DP. Lastly, we propose a solution that bounds the total privacy leakage when using the algorithm to answer multiple queries without requiring controllers to set the total privacy budget. We evaluate our contributions through an IRB-approved user study that shows the RDR is useful for helping controllers choose $\epsilon$, and experimental evaluations showing our algorithms are efficient and scalable.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to derive a Relative Disclosure Risk (RDR) indicator that quantifies the effect of the differential privacy parameter ε on the disclosure risk for individuals present in a specific dataset. It then introduces two algorithms for selecting ε based on RDR values reflecting controller preferences, one of which also releases the selected ε under DP, and a method to bound cumulative privacy leakage across multiple queries. The contributions are evaluated via an IRB-approved user study demonstrating the RDR's usefulness in ε selection and experiments showing algorithmic efficiency and scalability.
Significance. If the RDR is shown to faithfully track individual disclosure risks induced by the mechanism, the work could offer a practical approach for data controllers to interpret and choose ε in real deployments, addressing a key usability challenge in differential privacy. The DP-satisfying algorithm and the multi-query leakage bound represent potentially valuable technical contributions, provided they are rigorously established. The user study provides evidence of perceived utility but does not substitute for validation of the risk measure itself.
major comments (3)
- [RDR Derivation (likely §3 or equivalent)] The derivation of the RDR must be explicitly shown to correspond to the posterior disclosure probabilities for the specific individuals in the dataset under the chosen mechanism; without this, the claim that it indicates within-dataset disclosure risk (as opposed to a worst-case or average-case proxy) remains unverified and is central to the paper's motivation and algorithms.
- [Evaluation (user study section)] The IRB-approved user study demonstrates perceived usefulness of the RDR for choosing ε, but does not include any validation that the RDR values correctly reflect actual disclosure risks for the individuals; this weakens the support for the claim that controllers can use it to manage within-dataset risk.
- [DP-satisfying algorithm (likely §4.2)] Details are needed on how the alternative algorithm that finds and releases ε while satisfying DP incorporates the RDR without introducing additional privacy leakage or circularity in the privacy guarantee.
minor comments (1)
- The abstract would benefit from a brief mention of the key equations or properties of the RDR to allow readers to assess the derivation at a high level.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below, providing clarifications from the manuscript and indicating revisions where the presentation can be strengthened.
Point-by-point responses
- Referee: The derivation of the RDR must be explicitly shown to correspond to the posterior disclosure probabilities for the specific individuals in the dataset under the chosen mechanism; without this, the claim that it indicates within-dataset disclosure risk (as opposed to a worst-case or average-case proxy) remains unverified and is central to the paper's motivation and algorithms.
  Authors: Section 3 derives RDR directly from the posterior probability of an individual's sensitive value given the mechanism output and the observed dataset, expressing RDR as the ratio of posteriors with versus without the DP mechanism. We will revise to add an explicit lemma and step-by-step mapping from the Bayes posterior to each term in the RDR formula, making the correspondence to within-dataset individual posteriors unambiguous rather than implicit.
  Revision: yes.
- Referee: The IRB-approved user study demonstrates perceived usefulness of the RDR for choosing ε, but does not include any validation that the RDR values correctly reflect actual disclosure risks for the individuals; this weakens the support for the claim that controllers can use it to manage within-dataset risk.
  Authors: The study evaluates perceived usefulness and controller decision-making with RDR, not empirical validation against ground-truth risks (which cannot be measured without violating privacy). We will revise the evaluation section and abstract to explicitly state the study's scope as usability evidence and to note that the risk correspondence rests on the Section 3 derivation rather than the study.
  Revision: partial.
- Referee: Details are needed on how the alternative algorithm that finds and releases ε while satisfying DP incorporates the RDR without introducing additional privacy leakage or circularity in the privacy guarantee.
  Authors: The algorithm in Section 4.2 feeds RDR values into a DP selection mechanism (e.g., exponential mechanism) whose privacy guarantee is independent of the RDR computation; the released ε satisfies DP by construction and the overall leakage remains bounded by the chosen ε. We will expand the section with a formal argument showing absence of circularity and no extra leakage beyond the DP guarantee of the selection step.
  Revision: yes.
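The rebuttal names the exponential mechanism only as an example. The minimal sketch below shows that mechanism choosing ε from a finite candidate grid. The score function, its sensitivity bound, and the grid are hypothetical. The selection step is eps_select-DP only if changing one individual's record changes every candidate's score by at most `sensitivity`. That property is exactly what would have to be proven for an RDR-based score. Under basic composition, releasing the selected ε and then answering a query with it costs eps_select plus the selected ε.

```python
import math
import random
from typing import Callable, List, Sequence


def selection_probabilities(
    candidates: Sequence[float],
    score: Callable[[float], float],
    sensitivity: float,
    eps_select: float,
) -> List[float]:
    """Exponential-mechanism probabilities, proportional to exp(eps_select * u / (2 * sensitivity))."""
    if sensitivity <= 0 or eps_select <= 0:
        raise ValueError("sensitivity and eps_select must be positive")
    logits = [eps_select * score(c) / (2.0 * sensitivity) for c in candidates]
    top = max(logits)  # shift by the max logit for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(
    candidates: Sequence[float],
    score: Callable[[float], float],
    sensitivity: float,
    eps_select: float,
    rng: random.Random,
) -> float:
    """Draw one candidate epsilon with the exponential-mechanism probabilities."""
    probs = selection_probabilities(candidates, score, sensitivity, eps_select)
    return rng.choices(list(candidates), weights=probs, k=1)[0]


if __name__ == "__main__":
    grid = [0.25, 0.5, 1.0, 2.0, 4.0]

    def toy_score(c: float) -> float:
        # Hypothetical stand-in for a data-dependent score with sensitivity at most 1.
        return -abs(c - 1.0)

    print(exponential_mechanism(grid, toy_score, 1.0, 0.5, random.Random(7)))
```

A finite grid keeps the mechanism simple to analyze. The normalizing constant is an explicit sum, so the e^eps_select bound on how much any candidate's probability can shift between neighboring datasets follows directly.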
Circularity Check
No significant circularity; RDR derivation is self-contained from DP definition
The paper presents the RDR as derived directly from the differential privacy definition to quantify ε's effect on within-dataset disclosure risk for specific individuals. No equations or steps are shown to reduce by construction to fitted inputs, self-referential definitions, or load-bearing self-citations. The algorithms and bounds follow from this derivation and standard DP properties without renaming known results or smuggling ansatzes. The derivation chain remains independent of the target claims, consistent with a self-contained analysis.
Axiom & Free-Parameter Ledger
axioms (1)
- Differential privacy definition (standard math): for any two neighboring datasets D and D′ that differ in one record, and any set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S].
invented entities (1)
- Relative Disclosure Risk indicator (RDR): no independent evidence
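The differential privacy definition in the axiom ledger above can be illustrated with the Laplace mechanism. For a count with sensitivity 1, that mechanism adds noise of scale 1/ε. The short check below, a sketch using only the standard library, evaluates the log-density ratio between neighboring counts c and c + 1 over a grid of outputs. It confirms the ratio never exceeds ε in absolute value. This is a numerical illustration, not a proof, and the count and grid are hypothetical.

```python
import math


def laplace_log_density(x: float, mu: float, scale: float) -> float:
    """Log of the Laplace(mu, scale) density at x."""
    return -abs(x - mu) / scale - math.log(2.0 * scale)


def max_abs_log_ratio(eps: float, count: int, outputs) -> float:
    """Largest |log p(x | count) - log p(x | count + 1)| over the given outputs."""
    scale = 1.0 / eps  # Laplace scale = sensitivity / eps, with sensitivity 1
    return max(
        abs(laplace_log_density(x, count, scale) - laplace_log_density(x, count + 1, scale))
        for x in outputs
    )


if __name__ == "__main__":
    grid = [i / 10 for i in range(-100, 201)]  # outputs from -10.0 to 20.0
    for eps in (0.1, 0.5, 1.0):
        print(eps, max_abs_log_ratio(eps, count=5, outputs=grid) <= eps + 1e-12)
```

Working with log-densities avoids underflow for outputs far from either count. The bound is attained for every output at or below c (and at or above c + 1), which is why ε is the tight constant here.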