Interpreting the Error of Differentially Private Median Queries through Randomization Intervals

Karl Knopf; Shufan Zhang; Thomas Humphries; Tim Li; Xi He

arxiv: 2604.07581 · v1 · submitted 2026-04-08 · 💻 cs.CR · cs.DB


Pith reviewed 2026-05-10 16:54 UTC · model grok-4.3


keywords differentially private median · randomization interval · post-processing · privacy-utility tradeoff · error interpretation · noise bounds · data-dependent mechanisms


The pith

PostRI computes a randomization interval only after a differentially private median has been released, so the median itself keeps higher utility.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that differentially private median queries can return both the statistic and a useful bound on the noise-induced error without forcing a degradation in the median's accuracy. Prior methods achieved narrow randomization intervals only by adding extra noise or otherwise weakening the released median itself. PostRI instead releases the median under standard DP and then derives the interval afterward. A sympathetic reader cares because medians are a basic building block for many private data analyses, and reducing the forced accuracy loss makes DP more practical for real workloads.

Core claim

PostRI enables the release of a differentially private median followed by a post-hoc computation of a randomization interval that bounds the error introduced by the DP noise mechanism. Because the interval is derived after the median release, the median itself can be computed with substantially less noise than in earlier approaches that had to entangle the two steps. The result is a median whose utility is 14 to 850 percent higher than that of related work, while the accompanying interval remains narrow.
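The core claim turns on the median's noise depending on the input. As a minimal sketch of that dependence, assuming the exponential mechanism of McSherry and Talwar over a fixed integer candidate grid (one standard DP median mechanism, not PostRI, whose construction this page does not describe): the utility u(y) = −|#{x ≤ y} − n/2| changes by at most 1 when one record is substituted, so sampling with weights exp(εu/2) is ε-DP. All function names are hypothetical. `oracle_error_halfwidth` returns the smallest t with P(|release − true median| ≤ t) ≥ 1 − α, and it has to read the raw data to do so.

```python
import numpy as np


def em_median_probs(data, candidates, epsilon):
    """Output distribution of an exponential-mechanism median over a grid.

    Assumed utility u(y) = -|#{x <= y} - n/2| changes by at most 1 when one
    record is substituted, so weights exp(epsilon * u / 2) give epsilon-DP.
    """
    data = np.sort(np.asarray(data, dtype=float))
    n = data.size
    below = np.searchsorted(data, candidates, side="right")  # #{x <= y}
    logits = -epsilon * np.abs(below - n / 2.0) / 2.0
    weights = np.exp(logits - logits.max())  # shift for numerical stability
    return weights / weights.sum()


def em_median(data, candidates, epsilon, rng):
    """Release one DP median sampled from the candidate grid."""
    return rng.choice(candidates, p=em_median_probs(data, candidates, epsilon))


def oracle_error_halfwidth(data, candidates, epsilon, alpha):
    """Smallest t with P(|release - true median| <= t) >= 1 - alpha.

    Hypothetical helper: it reads the raw data, because the error
    distribution of this mechanism depends on the input.
    """
    probs = em_median_probs(data, candidates, epsilon)
    errors = np.abs(np.asarray(candidates, dtype=float) - np.median(data))
    order = np.argsort(errors, kind="stable")
    coverage = np.cumsum(probs[order])  # cumulative mass, smallest errors first
    idx = np.searchsorted(coverage, 1.0 - alpha)  # first index reaching 1 - alpha
    return float(errors[order][min(idx, len(order) - 1)])


# Usage: two datasets with the same true median (50) but different spread.
grid = np.arange(0, 101)
tight = 50 + np.arange(-500, 501) / 100.0  # 1,001 points in [45, 55]
wide = np.arange(0, 1001) / 10.0           # 1,001 points in [0, 100]
release = em_median(wide, grid, 0.1, np.random.default_rng(0))
print(oracle_error_halfwidth(tight, grid, 0.1, 0.05),
      oracle_error_halfwidth(wide, grid, 0.1, 0.05))
```

At the same ε, the tightly packed dataset yields a smaller half-width than the spread-out one. In this sense an exact RI for the median is a property of the data and not only of the released value.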

What carries the argument

PostRI, a post-release procedure that constructs a randomization interval for the already-released differentially private median without additional privacy cost.

If this is right

Median estimates under differential privacy can be made closer to the non-private value while still supplying an interpretable error bound.
Analysts no longer face an explicit tradeoff between median accuracy and the ability to understand the scale of the added noise.
The same separation of release and interval computation may extend to other statistics whose noise scale depends on the input data.
Libraries implementing differential privacy can add automatic post-release interval support for median queries without changing the underlying privacy mechanism.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If PostRI works for medians, similar post-processing might be developed for other order statistics or for quantiles that also have data-dependent noise.
Widespread adoption would let data curators publish more accurate private medians for applications such as income or health statistics while still giving users concrete error ranges.
The method could be tested on real-world datasets to measure how often the post-computed intervals are narrow enough to be useful in practice.

Load-bearing premise

Computing and releasing the randomization interval after the median has already been released does not create new privacy leakage or invalidate the interval guarantees.
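Stated formally, the premise depends on which of two standard DP facts applies. The statement below uses generic placeholders, not PostRI's actual interval procedure, which this page does not specify: M is the median mechanism, g is any (possibly randomized) map that sees only the release, and h is a step that re-reads the data D.

```latex
% Post-processing: the interval step sees only the release M(D).
M \text{ is } (\varepsilon,\delta)\text{-DP}
  \;\Longrightarrow\;
  D \mapsto \bigl(M(D),\, g(M(D))\bigr) \text{ is } (\varepsilon,\delta)\text{-DP}

% Adaptive sequential composition: the interval step re-reads D.
M \text{ is } (\varepsilon_1,\delta_1)\text{-DP and }
  h(\cdot, m) \text{ is } (\varepsilon_2,\delta_2)\text{-DP for every } m
  \;\Longrightarrow\;
  D \mapsto \bigl(M(D),\, h(D, M(D))\bigr) \text{ is }
  (\varepsilon_1+\varepsilon_2,\ \delta_1+\delta_2)\text{-DP}
```

If h reads D without any DP guarantee of its own, neither bound applies. This is the gap that the referee's §4 comment asks the authors to close.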

What would settle it

An attack or simulation in which an adversary, given only the released median and the later randomization interval, recovers more information about the private dataset than the original differential privacy budget permits.
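One way to run that simulation in the simplest discrete setting is to compute exact output distributions on a pair of neighbouring datasets and take the largest log-ratio. For an ε-DP release, that ratio must stay at or below ε. The sketch below is a minimal version that assumes the same illustrative exponential-mechanism median (not PostRI) with substitution neighbours. All names are hypothetical. The sketch also shows that appending any exact, un-noised data-dependent value (here, a hypothetical count of records below 1.0) makes the loss unbounded.

```python
import numpy as np


def em_median_probs(data, candidates, epsilon):
    """Exponential-mechanism median pmf; u(y) = -|#{x <= y} - n/2| (sensitivity 1)."""
    data = np.sort(np.asarray(data, dtype=float))
    below = np.searchsorted(data, candidates, side="right")
    logits = -epsilon * np.abs(below - data.size / 2.0) / 2.0
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()


def max_privacy_loss(p, q):
    """max over outputs o of |ln p(o) - ln q(o)|; infinite if the supports differ."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if np.any((p > 0) != (q > 0)):
        return float("inf")
    mask = p > 0
    return float(np.max(np.abs(np.log(p[mask]) - np.log(q[mask]))))


def joint_pmf(probs, side_value, side_values):
    """pmf of (DP median, side_value) when side_value is released exactly."""
    joint = np.zeros((len(probs), len(side_values)))
    joint[:, side_values.index(side_value)] = probs
    return joint.ravel()


def exact_count(x):
    """Hypothetical un-noised, data-dependent side value."""
    return int((x < 1.0).sum())


# Neighbouring datasets: substitute the record 0.0 with 100.0.
grid = np.arange(0, 101)
d = np.arange(0, 1001) / 10.0
d_prime = d.copy()
d_prime[0] = 100.0
eps = 0.5
p, q = em_median_probs(d, grid, eps), em_median_probs(d_prime, grid, eps)
print("median alone:", max_privacy_loss(p, q), "<= eps =", eps)

values = [exact_count(d_prime), exact_count(d)]
print("median + exact count:",
      max_privacy_loss(joint_pmf(p, exact_count(d), values),
                       joint_pmf(q, exact_count(d_prime), values)))
```

A real audit would need the actual PostRI output distribution and a search over many neighbouring pairs. This toy only shows the shape of the test.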

Figures

Figures reproduced from arXiv: 2604.07581 by Karl Knopf, Shufan Zhang, Thomas Humphries, Tim Li, Xi He.

**Figure 2.** Average median error and RI length vs. vary…

**Figure 3.** Average median error and RI length vs. vary…

Original abstract

It can be difficult for practitioners to interpret the quality of differentially private (DP) statistics due to the added noise. One method to help analysts understand the amount of error introduced by DP is to return a Randomization Interval (RI) along with the statistic. An RI is a type of confidence interval that bounds the error introduced by DP. For queries where the noise distribution depends on the input, such as the median, prior work degrades the quality of the median itself to obtain a high-quality RI. In this work, we propose PostRI, a solution to compute an RI after the median has been estimated. PostRI enables a median estimation with 14%-850% higher utility than related work, while maintaining a narrow RI.
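The median is hard because its noise depends on the input. By contrast, a query whose noise is input-independent admits an RI computed from public quantities alone. The sketch below is minimal and assumes the Laplace mechanism with scale b = Δ/ε. The function name and parameters are illustrative and do not come from the paper. Since P(|noise| > t) = e^(−t/b), the half-width t = b·ln(1/α) gives coverage of exactly 1 − α. Because the function sees only the release and public parameters, it is pure post-processing.

```python
import math


def laplace_ri(released, sensitivity, epsilon, alpha):
    """(1 - alpha) randomization interval for a Laplace-mechanism release.

    Noise ~ Laplace(0, b) with b = sensitivity / epsilon, and
    P(|noise| > t) = exp(-t / b), so t = b * ln(1 / alpha).
    Only public inputs are used, so no extra privacy budget is spent.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    b = sensitivity / epsilon
    half_width = b * math.log(1.0 / alpha)
    return released - half_width, released + half_width


print(laplace_ri(1234.0, sensitivity=1.0, epsilon=0.5, alpha=0.05))
```

With Δ = 1, ε = 0.5, and α = 0.05, the half-width is 2·ln 20, about 5.99, regardless of the underlying data.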

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

PostRI lets you release a DP median first and compute a randomization interval afterward, avoiding the accuracy trade-off that earlier methods required.


The main takeaway is that this paper shows how to add a randomization interval to a differentially private median release after the fact, without having to reduce the accuracy of the median itself. What is new is the PostRI construction for computing the interval after estimation when the noise depends on the input data. Earlier methods had to degrade the median to achieve a usable interval, but PostRI separates the two steps. This leads to reported utility improvements ranging from 14% to 850% over related work, while the interval stays narrow.

The paper lays out the problem clearly and backs its claims with experiments that compare against baselines. The gains are substantial enough to matter for practical use of DP medians in databases.

One soft spot is the privacy guarantee for the joint release of the median and the interval. Since the interval computation can depend on the raw data, standard post-processing theorems may not directly apply. The paper should include a detailed argument or proof that the combined output satisfies differential privacy without spending extra budget and without invalidating the interval's guarantees. If that holds up, the rest follows. The citation pattern looks standard, building on existing DP literature without self-referential issues.

This is useful for practitioners and researchers in differential privacy who want to make private query results more interpretable for end users. Readers dealing with median queries or similar statistics in private settings will get the most out of it. The work is clearly reasoned and addresses a real usability issue, so it deserves a serious referee. I would recommend sending this to peer review.

Referee Report

2 major / 2 minor

Summary. The paper proposes PostRI, a post-processing method to compute a randomization interval (RI) after first releasing a differentially private median estimate. Unlike prior work that degrades median utility to obtain a high-quality RI for input-dependent noise, PostRI claims to deliver 14%-850% higher utility for the median while preserving a narrow RI that bounds the DP-induced error.

Significance. If the joint privacy guarantee and coverage properties hold, the result would be a useful practical advance for making DP medians more interpretable without utility sacrifice. The approach correctly exploits the fact that the initial median release consumes the privacy budget, but its value hinges on whether the subsequent RI step can be shown to add no extra leakage when the noise distribution depends on the data.

major comments (2)

§4 (Privacy Analysis of PostRI): The claim that the joint (median, RI) output satisfies the original (ε,δ)-DP guarantee without additional budget is load-bearing. The RI computation is data-dependent and occurs after the median release; please supply the explicit composition argument or reduction showing that re-accessing the input for the RI does not violate post-processing or inflate the effective privacy loss. Standard post-processing applies only to functions of the already-released output.
§5.3 (Utility Experiments): The reported 14%-850% utility gains are central to the contribution. Clarify the exact baselines, privacy budgets, datasets, and RI-width controls used for each end of the range; without these details it is unclear whether the gains are robust or arise only under specific parameter regimes.

minor comments (2)

Abstract: The utility improvement range is stated without reference to the privacy parameter or mechanism; a single sentence on the DP setting would improve context.
Notation: The symbols for the lower and upper bounds of the RI and for the data-dependent noise scale should be defined once and used consistently across sections and figures.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript to incorporate the requested clarifications on privacy analysis and experimental details.


Referee: §4 (Privacy Analysis of PostRI): The claim that the joint (median, RI) output satisfies the original (ε,δ)-DP guarantee without additional budget is load-bearing. The RI computation is data-dependent and occurs after the median release; please supply the explicit composition argument or reduction showing that re-accessing the input for the RI does not violate post-processing or inflate the effective privacy loss. Standard post-processing applies only to functions of the already-released output.

Authors: We agree that an explicit argument is necessary to substantiate the joint privacy claim. The manuscript currently invokes the post-processing theorem after the median release consumes the full budget, but does not detail how the data-dependent RI step avoids additional leakage. We will revise Section 4 to include a formal reduction: the RI is computed from the released median value together with publicly known parameters (noise distribution family, sensitivity bounds, and the fixed privacy parameters), without further queries to the private dataset. This reduction shows that the joint output is a (possibly randomized) function of the already-released DP median alone, preserving the original (ε,δ) guarantee. The revised section will contain the full argument. revision: yes
Referee: §5.3 (Utility Experiments): The reported 14%-850% utility gains are central to the contribution. Clarify the exact baselines, privacy budgets, datasets, and RI-width controls used for each end of the range; without these details it is unclear whether the gains are robust or arise only under specific parameter regimes.

Authors: We acknowledge that the range of reported gains requires precise contextualization. We will expand Section 5.3 with a table that enumerates, for each reported percentage, the exact baseline method, the privacy budget ε (and δ if applicable), the dataset (synthetic and real-world instances), the number of repetitions, and the RI-width control (e.g., fixed absolute width or quantile-based). This will demonstrate that the improvements hold across the tested regimes rather than in isolated settings. The revision will be made. revision: yes

Circularity Check

0 steps flagged

No significant circularity; PostRI derivation is self-contained


The paper introduces PostRI as a method to compute a randomization interval after releasing a DP median estimate, claiming improved utility over prior approaches that degrade the median quality. The abstract and described approach rely on standard differential privacy post-processing and composition properties applied to an existing median mechanism, with utility gains presented via direct empirical comparison rather than any fitted parameter renamed as a prediction or self-referential definition. No load-bearing step in the provided description reduces by construction to its own inputs, self-citation chains, or ansatz smuggling; the central claims remain independent of the target result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, the method appears to rest on standard differential privacy definitions and properties of the Laplace or exponential mechanism for medians. No free parameters, ad-hoc axioms, or invented entities are identifiable from the provided text.

pith-pipeline@v0.9.0 · 5424 in / 1030 out tokens · 35023 ms · 2026-05-10T16:54:43.651781+00:00


