
arxiv: 2604.07581 · v1 · submitted 2026-04-08 · 💻 cs.CR · cs.DB

Interpreting the Error of Differentially Private Median Queries through Randomization Intervals

Pith reviewed 2026-05-10 16:54 UTC · model grok-4.3

classification 💻 cs.CR cs.DB
keywords differentially private median · randomization interval · post-processing · privacy-utility tradeoff · error interpretation · noise bounds · data-dependent mechanisms

The pith

PostRI computes a randomization interval after releasing a differentially private median to preserve higher utility.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that differentially private median queries can return both the statistic and a useful bound on the noise-induced error without forcing a degradation in the median's accuracy. Prior methods achieved narrow randomization intervals only by adding extra noise or otherwise weakening the released median itself. PostRI instead releases the median under standard DP and then derives the interval afterward. A sympathetic reader cares because medians are a basic building block for many private data analyses, and reducing the forced accuracy loss makes DP more practical for real workloads.

Core claim

PostRI enables the release of a differentially private median followed by a post-hoc computation of a randomization interval that bounds the error introduced by the DP noise mechanism. Because the interval is derived after the median release, the median itself can be computed with substantially less noise than in earlier approaches that had to entangle the two steps. The result is a median whose utility is 14 to 850 percent higher than related work while the accompanying interval remains narrow.

What carries the argument

PostRI, a post-release procedure that constructs a randomization interval for the already-released differentially private median without additional privacy cost.

If this is right

  • Median estimates under differential privacy can be made closer to the non-private value while still supplying an interpretable error bound.
  • Analysts no longer face an explicit tradeoff between median accuracy and the ability to understand the scale of the added noise.
  • The same separation of release and interval computation may extend to other statistics whose noise scale depends on the input data.
  • Libraries implementing differential privacy can add automatic post-release interval support for median queries without changing the underlying privacy mechanism.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If PostRI works for medians, similar post-processing might be developed for other order statistics or for quantiles that also have data-dependent noise.
  • Widespread adoption would let data curators publish more accurate private medians for applications such as income or health statistics while still giving users concrete error ranges.
  • The method could be tested on real-world datasets to measure how often the post-computed intervals are narrow enough to be useful in practice.

Load-bearing premise

Computing and releasing the randomization interval after the median has already been released does not create new privacy leakage or invalidate the interval guarantees.

What would settle it

An attack or simulation in which an adversary, given only the released median and the later randomization interval, recovers more information about the private dataset than the original differential privacy budget permits.

Figures

Figures reproduced from arXiv: 2604.07581 by Karl Knopf, Shufan Zhang, Thomas Humphries, Tim Li, Xi He.

Figure 1: Average median error and RI length vs. vary
Figure 2: Average median error and RI length vs. vary
Figure 3: Average median error and RI length vs. vary
Abstract

It can be difficult for practitioners to interpret the quality of differentially private (DP) statistics due to the added noise. One method to help analysts understand the amount of error introduced by DP is to return a Randomization Interval (RI) along with the statistic. An RI is a type of confidence interval that bounds the error introduced by DP. For queries where the noise distribution depends on the input, such as the median, prior work degrades the quality of the median itself to obtain a high-quality RI. In this work, we propose PostRI, a solution to compute an RI after the median has been estimated. PostRI enables a median estimation with 14%-850% higher utility than related work, while maintaining a narrow RI.
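To make the idea of a randomization interval concrete, here is a minimal sketch for the easy case the abstract contrasts with the median: a query whose noise does not depend on the input. It assumes a Laplace mechanism with a publicly known scale; the function names `laplace_release` and `laplace_ri` and all parameter values are illustrative and are not taken from the paper. Because P(|Lap(b)| > t) = exp(-t/b), a half-width of b·ln(1/α) covers the un-noised value with probability 1 − α.

```python
import math
import random


def laplace_release(true_value, scale, rng):
    """Return true_value plus Laplace(0, scale) noise.

    The difference of two i.i.d. exponentials with mean `scale`
    is Laplace-distributed with that scale.
    """
    return true_value + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_ri(released, scale, alpha):
    """Interval containing the un-noised value with probability 1 - alpha.

    Uses only the released value and the public noise scale.
    """
    half_width = scale * math.log(1.0 / alpha)
    return released - half_width, released + half_width


# Empirical coverage check (all values are hypothetical).
rng = random.Random(0)
true_count, scale, alpha = 100.0, 2.0, 0.05
trials = 20000
hits = 0
for _ in range(trials):
    lo, hi = laplace_ri(laplace_release(true_count, scale, rng), scale, alpha)
    hits += lo <= true_count <= hi
coverage = hits / trials
print(f"empirical coverage: {coverage:.3f} (target {1 - alpha:.2f})")
```

Here the interval is a function of the released value and public parameters only, so it costs no extra privacy budget. For the median, the noise distribution depends on the data, which is the difficulty the abstract describes.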

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes PostRI, a post-processing method to compute a randomization interval (RI) after first releasing a differentially private median estimate. Unlike prior work that degrades median utility to obtain a high-quality RI for input-dependent noise, PostRI claims to deliver 14%-850% higher utility for the median while preserving a narrow RI that bounds the DP-induced error.

Significance. If the joint privacy guarantee and coverage properties hold, the result would be a useful practical advance for making DP medians more interpretable without utility sacrifice. The approach correctly exploits the fact that the initial median release consumes the privacy budget, but its value hinges on whether the subsequent RI step can be shown to add no extra leakage when the noise distribution depends on the data.

major comments (2)
  1. §4 (Privacy Analysis of PostRI): The claim that the joint (median, RI) output satisfies the original (ε,δ)-DP guarantee without additional budget is load-bearing. The RI computation is data-dependent and occurs after the median release; please supply the explicit composition argument or reduction showing that re-accessing the input for the RI does not violate post-processing or inflate the effective privacy loss. Standard post-processing applies only to functions of the already-released output.
  2. §5.3 (Utility Experiments): The reported 14%-850% utility gains are central to the contribution. Clarify the exact baselines, privacy budgets, datasets, and RI-width controls used for each end of the range; without these details it is unclear whether the gains are robust or arise only under specific parameter regimes.
minor comments (2)
  1. Abstract: The utility improvement range is stated without reference to the privacy parameter or mechanism; a single sentence on the DP setting would improve context.
  2. Notation: The symbols for the lower and upper bounds of the RI and for the data-dependent noise scale should be defined once and used consistently across sections and figures.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript to incorporate the requested clarifications on privacy analysis and experimental details.

  1. Referee: §4 (Privacy Analysis of PostRI): The claim that the joint (median, RI) output satisfies the original (ε,δ)-DP guarantee without additional budget is load-bearing. The RI computation is data-dependent and occurs after the median release; please supply the explicit composition argument or reduction showing that re-accessing the input for the RI does not violate post-processing or inflate the effective privacy loss. Standard post-processing applies only to functions of the already-released output.

    Authors: We agree that an explicit argument is necessary to substantiate the joint privacy claim. The manuscript currently invokes the post-processing theorem after the median release consumes the full budget, but does not detail how the data-dependent RI step avoids additional leakage. We will revise Section 4 to include a formal reduction: the RI is computed from the released median value together with publicly known parameters (noise distribution family, sensitivity bounds, and the fixed privacy parameters), without further queries to the private dataset. This reduction shows that the joint output is a (possibly randomized) function of the already-released DP median alone, preserving the original (ε,δ) guarantee. The revised section will contain the full argument. revision: yes

  2. Referee: §5.3 (Utility Experiments): The reported 14%-850% utility gains are central to the contribution. Clarify the exact baselines, privacy budgets, datasets, and RI-width controls used for each end of the range; without these details it is unclear whether the gains are robust or arise only under specific parameter regimes.

    Authors: We acknowledge that the range of reported gains requires precise contextualization. We will expand Section 5.3 with a table that enumerates, for each reported percentage, the exact baseline method, the privacy budget ε (and δ if applicable), the dataset (synthetic and real-world instances), the number of repetitions, and the RI-width control (e.g., fixed absolute width or quantile-based). This will demonstrate that the improvements hold across the tested regimes rather than in isolated settings. The revision will be made. revision: yes

Circularity Check

0 steps flagged

No significant circularity; PostRI derivation is self-contained


The paper introduces PostRI as a method to compute a randomization interval after releasing a DP median estimate, claiming improved utility over prior approaches that degrade the median quality. The abstract and described approach rely on standard differential privacy post-processing and composition properties applied to an existing median mechanism, with utility gains presented via direct empirical comparison rather than any fitted parameter renamed as a prediction or self-referential definition. No load-bearing step in the provided description reduces by construction to its own inputs, self-citation chains, or ansatz smuggling; the central claims remain independent of the target result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, the method appears to rest on standard differential privacy definitions and properties of the Laplace or exponential mechanism for medians. No free parameters, ad-hoc axioms, or invented entities are identifiable from the provided text.
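As an illustration of the standard machinery mentioned above, the following is a minimal sketch of an exponential-mechanism median over a finite candidate grid, together with an exact check of the ε-DP definition on one pair of neighboring datasets. It is not the mechanism from the paper, which this page does not describe. The score function, the replace-one-record notion of neighbors, the candidate grid, and all function names are assumptions made for the example.

```python
import math
import random


def median_scores(data, candidates):
    """Score each candidate by minus half the imbalance between the
    number of points strictly below and strictly above it.

    Replacing one record changes any score by at most 1.
    """
    scores = []
    for r in candidates:
        below = sum(1 for x in data if x < r)
        above = sum(1 for x in data if x > r)
        scores.append(-abs(below - above) / 2.0)
    return scores


def exp_mech_probs(data, candidates, eps):
    """Exact output distribution: P(r) proportional to exp(eps * score / 2)."""
    scores = median_scores(data, candidates)
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(eps * (s - top) / 2.0) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def dp_median(data, candidates, eps, rng):
    """Sample one candidate from the exponential-mechanism distribution."""
    probs = exp_mech_probs(data, candidates, eps)
    return rng.choices(candidates, weights=probs, k=1)[0]


def max_privacy_loss(p, q):
    """Largest absolute log-ratio between two output distributions."""
    return max(abs(math.log(a / b)) for a, b in zip(p, q))


# Hypothetical data: the neighbor replaces the record 9 with 0.
candidates = list(range(0, 11))
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
neighbor = [1, 2, 3, 4, 5, 6, 7, 8, 0]
eps = 1.0

p = exp_mech_probs(data, candidates, eps)
q = exp_mech_probs(neighbor, candidates, eps)
loss = max_privacy_loss(p, q)
print(f"privacy loss on this pair: {loss:.3f} (budget {eps})")
print("one private median:", dp_median(data, candidates, eps, random.Random(0)))
```

The check covers a single neighboring pair and only the median release. Applying the same comparison to a joint (median, RI) output would require the interval procedure itself, which is not specified here.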


