arxiv: 2511.01303 · v2 · pith:YBQVDZVAnew · submitted 2025-11-03 · 💻 cs.CR · cs.LG

Differentially Private Nonparametric Confidence Intervals Under Minimal Distributional Assumptions

Pith reviewed 2026-05-18 02:01 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords differential privacy · confidence intervals · nonparametric inference · resampling · bootstrap · privacy-preserving statistics

The pith

Resampling any qualifying private estimator produces asymptotically valid and tight nonparametric confidence intervals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a resampling framework that turns any differentially private estimator meeting mild conditions into a nonparametric confidence interval for an arbitrary quantity. It works by drawing repeated subsamples from the data, running the private estimator on each subsample, and converting the collection of outputs into a confidence interval via their empirical distribution function. The approach requires no assumption of asymptotic normality and is not tied to any particular privacy mechanism. A reader would care because the method supplies a general, finite-sample-friendly route to private inference even when the target statistic is non-smooth or the underlying distribution is difficult.

Core claim

Our method repeatedly subsamples the data, applies the private estimator to each subset, and post-processes the resulting empirical CDF into a CI. We prove that the empirical CDF induced by our procedure converges to the sampling distribution of the private statistic, which implies that the resulting CI is asymptotically valid and tight.
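The three steps in this claim (subsample, run the private estimator on each subset, turn the empirical CDF into an interval) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' PrivSub implementation. The names `subsample_private_ci`, `laplace_clipped_mean`, and `ecdf_quantile` are hypothetical. The subsamples are assumed to be disjoint, so that parallel composition makes the whole procedure as private as one estimator call. The example estimator is assumed to be a Laplace-noised clipped mean. The interval is simply the α/2 and 1 − α/2 quantiles of the empirical CDF, with no rescaling from subsample size m to sample size n and none of the paper's hyperparameter heuristics.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a private estimator maps (subsample, rng) to a noisy estimate.
Estimator = Callable[[Sequence[float], random.Random], float]


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. Exponential(rate=1/scale) draws is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_clipped_mean(lo: float, hi: float, epsilon: float) -> Estimator:
    """Hypothetical example estimator: epsilon-DP mean of values clipped to [lo, hi]."""
    def estimator(xs: Sequence[float], rng: random.Random) -> float:
        clipped = [min(max(x, lo), hi) for x in xs]
        # Under replace-one adjacency the clipped mean has sensitivity (hi - lo) / len(xs).
        scale = (hi - lo) / (len(xs) * epsilon)
        return sum(clipped) / len(clipped) + laplace_noise(scale, rng)
    return estimator


def ecdf_quantile(sorted_vals: List[float], p: float) -> float:
    # Generalized inverse of the empirical CDF: the smallest v with F_hat(v) >= p.
    k = len(sorted_vals)
    return sorted_vals[min(max(math.ceil(p * k) - 1, 0), k - 1)]


def subsample_private_ci(
    data: Sequence[float], estimator: Estimator, m: int, alpha: float, rng: random.Random
) -> Tuple[float, float]:
    # Step 1: split a random permutation of the indices into disjoint blocks of size m.
    idx = list(range(len(data)))
    rng.shuffle(idx)
    k = len(data) // m
    # Step 2: run the private estimator once per block (leftover records are unused).
    estimates = sorted(
        estimator([data[i] for i in idx[j * m:(j + 1) * m]], rng) for j in range(k)
    )
    # Step 3: read the interval off the empirical CDF of the private estimates.
    return ecdf_quantile(estimates, alpha / 2), ecdf_quantile(estimates, 1 - alpha / 2)


# Usage: a 90% interval for the mean of N(0, 1) data with an epsilon = 1 estimator.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
lower, upper = subsample_private_ci(
    data, laplace_clipped_mean(-4.0, 4.0, epsilon=1.0), m=500, alpha=0.1, rng=rng
)
print(f"90% interval: [{lower:.3f}, {upper:.3f}]")
```

Because no m-to-n rescaling is applied, this naive interval reflects the estimator's spread at subsample size m and is typically wider than one calibrated to the full sample. The paper's own post-processing of the empirical CDF is not reproduced here.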

What carries the argument

The black-box resampling procedure that builds an empirical CDF from private estimates on data subsamples.

If this is right

  • The resulting intervals are asymptotically valid for the target quantity.
  • The intervals are asymptotically tight.
  • The procedure applies to arbitrary target quantities and any private estimator meeting the conditions.
  • Empirical performance improves over existing general-purpose private CI methods, especially for non-smooth functionals.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same subsampling idea could be adapted to produce private p-values or hypothesis tests.
  • Hyperparameter selection rules might be refined by studying the rate at which the empirical CDF converges.
  • The framework could be combined with variance-reduction techniques from non-private resampling to lower the privacy cost.

Load-bearing premise

The private estimator must satisfy the paper's mild conditions for the empirical CDF to converge to the true sampling distribution.

What would settle it

A Monte Carlo experiment in which the constructed intervals achieve coverage far from the nominal level when the private estimator is chosen to violate the mild conditions.
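A minimal Monte Carlo harness for this kind of experiment is sketched below. It is an illustration, not the paper's experimental code. The names `empirical_coverage`, `z_interval`, and `inconsistent_private_interval` are hypothetical. The "condition-violating" estimator is an assumed stand-in for one that breaks a consistency-type condition: its point estimate gets Laplace noise whose scale does not shrink with n, and its interval ignores that noise. The page does not spell out the paper's exact conditions.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def empirical_coverage(ci_fn, sample_fn, truth, reps, rng):
    """Fraction of `reps` simulated datasets whose interval contains `truth`."""
    hits = 0
    for _ in range(reps):
        lo, hi = ci_fn(sample_fn(rng), rng)
        hits += lo <= truth <= hi
    return hits / reps


def z_interval(alpha):
    """Non-private normal-approximation interval for the mean (a sanity check)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)

    def ci(xs, rng):
        center = fmean(xs)
        half = z * stdev(xs) / math.sqrt(len(xs))
        return center - half, center + half

    return ci


def inconsistent_private_interval(alpha, noise_scale):
    """Hypothetical condition-violating estimator: Laplace noise that does not shrink
    with n is added to the point estimate, and the interval ignores that noise."""
    base = z_interval(alpha)

    def ci(xs, rng):
        lo, hi = base(xs, rng)
        shift = rng.expovariate(1 / noise_scale) - rng.expovariate(1 / noise_scale)
        return lo + shift, hi + shift

    return ci


def sample_normal(r):
    # 200 i.i.d. N(0, 1) observations; the true mean is 0.
    return [r.gauss(0.0, 1.0) for _ in range(200)]


rng = random.Random(42)
cov_ok = empirical_coverage(z_interval(0.1), sample_normal, truth=0.0, reps=1000, rng=rng)
cov_bad = empirical_coverage(
    inconsistent_private_interval(0.1, noise_scale=1.0), sample_normal,
    truth=0.0, reps=1000, rng=rng,
)
print(f"nominal 0.90 | well-specified: {cov_ok:.3f} | condition-violating: {cov_bad:.3f}")
```

Replacing `inconsistent_private_interval` with a private CI procedure built from a specific estimator turns this into the experiment described above.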

Figures

Figures reproduced from arXiv: 2511.01303 by Katrina Ligett, Moshe Shenfeld, Noa Velner-Harris, Tomer Shoham.

Figure 1. A comparison of our method (PrivSub) to the other known general, non-parametric DP CI method: the BLB-based method (BLBquant [7]). We include two baselines: the private baseline tailored to the median (ExpMech [11]) and the non-private baseline (bootstrapping), and study 0.9-CI estimation of the median for the (truncated) normal, exponential, and Gaussian mixture distributions under (5, 0)-DP. A detailed dis…

Figure 2. Empirical CDF of the median from a single run of …

Figure 3. We compare our method (PrivSub) against two baselines: the private baseline tailored to the median (ExpMech [11]) and the non-private baseline (bootstrapping). We evaluate 1 − α = 0.9-CI estimation of the median for the (truncated) normal, exponential, and Gaussian mixture distributions under (2, 0)-DP. A detailed discussion appears in Section 4.4. C.5 Mean estimation We repeat the experiment in Section 4 w…

Figure 4. A comparison of our method (PrivSub) in terms of CI width (top row) and coverage (bottom row) for the mean under ε_t = 5. We include two baselines: the private baseline tailored to the mean (Laplace noise addition mechanism; see A.6) and the non-private baseline (bootstrapping). We study 0.9-CI estimation of the mean for three distributions as described in the figure, where R denotes the truncation range.…

Figure 5. A comparison of our method (PrivSub) in terms of CI width (top row) and coverage (bottom row) for the mean under ε_t = 2. We include two baselines: the private baseline tailored to the mean (Laplace noise addition mechanism; see A.6) and the non-private baseline (bootstrapping). We study 0.9-CI estimation of the mean for three distributions as described in the figure, where R denotes the truncation range.…

Figure 6. A comparison of CI width (left) and coverage (right) for median estimation under …

Figure 7. A comparison of CI width (left) and coverage (right) for median estimation under …

Figure 8. A comparison of CI width (top) and coverage (bottom) for median estimation under …

Figure 9. A comparison of CI width (top) and coverage (bottom) for median estimation under …

Figure 10. Empirical CDFs of the median under different distributions with …
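The ExpMech baseline in Figures 1 and 3 refers to [11]. For readers unfamiliar with the idea, here is a textbook exponential-mechanism median in the style of [24], over a fixed candidate grid. It is a simplified sketch, not necessarily the variant used in [11]. The function name `expmech_median` is hypothetical, and the data-independent grid assumes the data range is known in advance.

```python
import math
import random
from bisect import bisect_right


def expmech_median(xs, grid, epsilon, rng):
    """Textbook exponential-mechanism median over a fixed, data-independent grid.

    Utility of candidate c: -|#{x <= c} - n/2|. Replacing one record moves the count
    by at most 1, so the utility has sensitivity 1, and sampling c with probability
    proportional to exp(epsilon * u(c) / 2) is epsilon-DP.
    """
    sorted_xs = sorted(xs)
    n = len(sorted_xs)
    utils = [-abs(bisect_right(sorted_xs, c) - n / 2) for c in grid]
    top = max(utils)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(epsilon * (u - top) / 2) for u in utils]
    return rng.choices(grid, weights=weights, k=1)[0]


# Usage: N(5, 1) data, candidates 0.00, 0.01, ..., 10.00, epsilon = 1.
rng = random.Random(1)
data = [rng.gauss(5.0, 1.0) for _ in range(2001)]
grid = [i / 100 for i in range(1001)]
estimate = expmech_median(data, grid, epsilon=1.0, rng=rng)
print(f"private median: {estimate:.2f}")
```

Candidates far from the sample median get exponentially small weight, which is why this mechanism suits non-smooth statistics like the median better than plain noise addition.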
Original abstract

We consider the problem of constructing differentially private nonparametric confidence intervals (CIs) for an arbitrary quantity using resampling. A growing body of work has adapted resampling ideas to the private setting, including private bootstrap methods \cite{brawner2018bootstrap, wang2025differentially,dette2025gaussian} and BLB-based subsample-and-aggregate approaches \cite{covington2025unbiased, chadha2024resampling}. However, existing methods typically rely on strong assumptions, such as asymptotic normality, or are tied to specific privacy mechanisms such as noise addition, and can be impractical in finite-sample regimes. We address these problems by introducing a simple, general framework that can convert any differentially private estimator satisfying mild conditions into a differentially private nonparametric CI for arbitrary target quantities. Our method repeatedly subsamples the data, applies the private estimator to each subset, and post-processes the resulting empirical CDF into a CI. The framework is black-box, and does not require a specific limiting distribution. We prove that the empirical CDF induced by our procedure converges to the sampling distribution of the private statistic, which implies that the resulting CI is asymptotically valid and tight, and provide heuristic guidance for choosing the hyperparameters. Empirically, our method outperforms competing general approaches, especially for non-smooth functionals and more challenging distributions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a general resampling framework for constructing differentially private nonparametric confidence intervals for arbitrary target quantities. The method repeatedly subsamples the data, applies a black-box differentially private estimator to each subsample, and post-processes the empirical CDF of the resulting private statistics into a CI. It proves that this empirical CDF converges to the sampling distribution of the private statistic under mild conditions on the estimator, implying asymptotic validity and tightness of the intervals, and reports empirical outperformance over prior private bootstrap and subsample-and-aggregate approaches, especially for non-smooth functionals and challenging distributions.

Significance. If the convergence result holds under the stated mild conditions, the work provides a flexible, assumption-light alternative to existing private resampling methods that often require asymptotic normality or specific mechanisms. The black-box applicability to qualifying DP estimators and the nonparametric nature could enable private inference for a broader class of statistics. The empirical comparisons and heuristic guidance for hyperparameters add practical value, though the significance hinges on the conditions being both weak and verifiable.

major comments (2)
  1. [Abstract and proposed method paragraph] Abstract and paragraph on the proposed method: the central claim that the framework converts 'any differentially private estimator satisfying mild conditions' into a valid nonparametric CI rests on convergence of the empirical CDF to the sampling distribution of the private statistic. The precise statement of these mild conditions (e.g., requirements on bias, consistency, or privacy-utility tradeoff of the estimator) is not formalized with explicit assumptions or a dedicated theorem, making it difficult to verify applicability to standard mechanisms such as Laplace or Gaussian noise addition. This is load-bearing for the advertised generality.
  2. [Convergence proof] Convergence argument: while the proof sketch invokes standard resampling ideas, the interaction between the privacy parameter, subsample size, and the rate of convergence to the sampling distribution is not quantified. Without this, the claim of asymptotic tightness cannot be fully assessed, particularly in finite-sample regimes where the method is advertised as practical.
minor comments (2)
  1. [Hyperparameter selection] The heuristic guidance for choosing subsampling hyperparameters is mentioned but would benefit from more concrete recommendations or sensitivity analysis tied to the privacy budget.
  2. [Method description] Notation for the empirical CDF and the resulting CI construction could be clarified with an explicit algorithmic description or pseudocode to improve readability.
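Major comment 2 asks how the privacy parameter and the subsample size interact. A worked example in the simplest case makes the tradeoff concrete. This is an illustration, not an analysis from the paper. Take the mean of m observations clipped to [a, b], made ε-DP with the Laplace mechanism, and let σ²_[a,b] denote the variance of one clipped observation. The noise Z is independent of the data, so the two variance terms add:

$$
\tilde{\theta}_m = \frac{1}{m}\sum_{i=1}^{m} \mathrm{clip}_{[a,b]}(X_i) + Z,
\qquad Z \sim \mathrm{Laplace}\!\left(0,\ \frac{b-a}{m\varepsilon}\right),
$$

$$
\mathrm{Var}(\tilde{\theta}_m) = \underbrace{\frac{\sigma_{[a,b]}^2}{m}}_{\text{sampling}} + \underbrace{2\left(\frac{b-a}{m\varepsilon}\right)^2}_{\text{privacy}},
\qquad
\frac{\text{privacy}}{\text{sampling}} = \frac{2(b-a)^2}{m\,\varepsilon^2\,\sigma_{[a,b]}^2}.
$$

In this example the privacy term becomes negligible relative to sampling noise exactly when mε² → ∞. For fixed ε, the resampled distribution is therefore dominated by ordinary sampling variability once the subsample is large enough. Other estimators would need their own version of this calculation.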

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the presentation of our framework. We address each major comment below and have revised the manuscript accordingly to strengthen the formalization of conditions and the discussion of convergence rates.

Point-by-point responses
  1. Referee: [Abstract and proposed method paragraph] Abstract and paragraph on the proposed method: the central claim that the framework converts 'any differentially private estimator satisfying mild conditions' into a valid nonparametric CI rests on convergence of the empirical CDF to the sampling distribution of the private statistic. The precise statement of these mild conditions (e.g., requirements on bias, consistency, or privacy-utility tradeoff of the estimator) is not formalized with explicit assumptions or a dedicated theorem, making it difficult to verify applicability to standard mechanisms such as Laplace or Gaussian noise addition. This is load-bearing for the advertised generality.

    Authors: We agree that the conditions require more explicit formalization to support the claimed generality. In the revised manuscript, we have added a dedicated Theorem 1 in Section 3 that states the precise assumptions: the private estimator must be consistent for the target functional (estimation error o_p(1)) as the subsample size grows, and the privacy-utility tradeoff must permit the subsample size m to satisfy m = ω(log n / ε) while remaining o(n). We also include a new subsection with explicit verification for the Laplace and Gaussian mechanisms, showing that both satisfy the conditions under standard parameter choices. This directly addresses applicability and removes ambiguity in the abstract and method description. revision: yes

  2. Referee: [Convergence proof] Convergence argument: while the proof sketch invokes standard resampling ideas, the interaction between the privacy parameter, subsample size, and the rate of convergence to the sampling distribution is not quantified. Without this, the claim of asymptotic tightness cannot be fully assessed, particularly in finite-sample regimes where the method is advertised as practical.

    Authors: The main result establishes convergence in probability of the empirical CDF to the true sampling distribution of the private statistic as n → ∞ under the conditions of Theorem 1, which is sufficient for asymptotic validity and tightness of the resulting intervals. We acknowledge that explicit finite-sample rates are not derived in the original version. In revision, we have expanded the proof sketch in the appendix to include a remark quantifying the dependence: the convergence rate is governed by the estimator's own consistency rate plus an additive term of order exp(-c m ε) arising from privacy noise concentration, with m chosen as a function of ε and n. We also added finite-sample simulation results for moderate n to illustrate practical behavior, though a complete non-asymptotic bound would require stronger moment assumptions on the estimator and is left for future work. revision: partial
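Response 1 asserts that Laplace and Gaussian noise addition satisfy the consistency condition. A quick numeric illustration for a clipped mean shows why. It is not the paper's verification subsection. The function names, the clipping width, ε, δ, and the per-observation standard deviation of 1 are all assumptions. Both noise levels decay like 1/m, faster than the 1/√m sampling error. The Gaussian calibration is the classical one from Dwork and Roth [15], stated for 0 < ε < 1.

```python
import math


def laplace_scale(m, epsilon, width):
    """Laplace scale b for the mean of m values clipped to an interval of length `width`.

    The replace-one sensitivity of the clipped mean is width / m, so b = width / (m * epsilon).
    """
    return width / (m * epsilon)


def gaussian_sigma(m, epsilon, delta, width):
    """Classical Gaussian-mechanism sigma for the same clipped mean.

    sigma = sensitivity * sqrt(2 ln(1.25 / delta)) / epsilon, stated for 0 < epsilon < 1.
    """
    if not 0 < epsilon < 1:
        raise ValueError("this classical calibration assumes 0 < epsilon < 1")
    return (width / m) * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


# Ratio of privacy-noise sd to sampling sd shrinks like 1/sqrt(m) for both mechanisms.
for m in (100, 1_000, 10_000):
    lap_sd = math.sqrt(2) * laplace_scale(m, epsilon=0.5, width=2.0)  # Laplace sd = sqrt(2) * b
    gauss_sd = gaussian_sigma(m, epsilon=0.5, delta=1e-6, width=2.0)
    sampling_sd = 1.0 / math.sqrt(m)  # assumed per-observation sd of 1 (illustrative)
    print(f"m={m:>6}  laplace/sampling={lap_sd / sampling_sd:.3f}  "
          f"gaussian/sampling={gauss_sd / sampling_sd:.3f}")
```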

Circularity Check

0 steps flagged

No circularity: convergence proof is independent of fitted inputs or self-referential definitions

full rationale

The paper's central derivation is a proof that the empirical CDF from repeated private subsampling converges to the sampling distribution of the private statistic under mild conditions on any black-box DP estimator. This relies on standard resampling convergence arguments applied to the given private estimator, without reducing the claimed asymptotic validity or tightness to a fitted parameter, self-definition, or load-bearing self-citation by construction. The framework is presented as general and the proof is self-contained against external benchmarks, with no quoted equations or steps in the provided text exhibiting the specific reductions required for a circularity flag.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the existence of mild conditions that the base private estimator must satisfy and on standard probabilistic convergence of the empirical distribution function to the true sampling distribution; no new entities are postulated.

free parameters (1)
  • subsampling hyperparameters
    The abstract states that heuristic guidance is provided for choosing them, implying they are selected in practice rather than derived from first principles.
axioms (1)
  • domain assumption: Any differentially private estimator satisfying mild conditions can be converted into a valid nonparametric CI via the described resampling procedure
    This premise is invoked in the abstract as the starting point for the framework.

pith-pipeline@v0.9.0 · 5775 in / 1410 out tokens · 54016 ms · 2026-05-18T02:01:30.643462+00:00 · methodology

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 2 internal anchors

  1. [1]

    Asi, H. and Duchi, J. C. (2020). Instance-optimality in differential privacy via approximate inverse sensitivity mechanisms. Advances in Neural Information Processing Systems, 33:14106–14117

  2. [2]

    Balle, B., Barthe, G., and Gaboardi, M. (2018). Privacy amplification by subsampling: Tight analyses via couplings and divergences. Advances in Neural Information Processing Systems, 31

  3. [3]

    Bertail, P., Politis, D. N., and Romano, J. P. (1999). On subsampling estimators with unknown rate of convergence. Journal of the American Statistical Association, 94(446):569–579

  4. [4]

    Bickel, P. J., Götze, F., and van Zwet, W. R. (2012). Resampling fewer than n observations: gains, losses, and remedies for losses. Springer

  5. [5]

    Brawner, T. and Honaker, J. (2018). Bootstrap inference and differential privacy: Standard errors for free. Unpublished Manuscript

  6. [6]

    Bun, M. and Steinke, T. (2019). Average-case averages: Private algorithms for smooth sensitivity and mean estimation. Advances in Neural Information Processing Systems, 32

  7. [7]

    Chadha, K., Duchi, J., and Kuditipudi, R. (2024). Resampling methods for private statistical inference. arXiv preprint arXiv:2402.07131

  8. [8]

    Chaudhuri, K., Monteleoni, C., and Sarwate, A. D. (2011). Differentially private empirical risk minimization. Journal of Machine Learning Research, 12(3)

  9. [9]

    Covington, C., He, X., Honaker, J., and Kamath, G. (2025). Unbiased statistical estimation and valid confidence intervals under differential privacy. Statistica Sinica, 35:651–670

  10. [10]

    Dong, W., Liang, Y., and Yi, K. (2022). Differentially private covariance revisited. Advances in Neural Information Processing Systems, 35:850–861

  11. [11]

    Drechsler, J., Globus-Harris, I., Mcmillan, A., Sarathy, J., and Smith, A. (2022). Nonparametric differentially private confidence intervals for the median. Journal of Survey Statistics and Methodology, 10(3):804–829

  12. [12]

    Du, W., Foot, C., Moniot, M., Bray, A., and Groce, A. (2020). Differentially private confidence intervals. arXiv preprint arXiv:2001.02285

  13. [13]

    Durfee, D. (2023). Unbounded differentially private quantile and maximum estimation. Advances in Neural Information Processing Systems, 36:77691–77712

  14. [14]

    Dwork, C., McSherry, F., Nissim, K., and Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3, pages 265–284. Springer

  15. [15]

    Dwork, C. and Roth, A. (2014). The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407

  16. [16]

    Dwork, C., Rothblum, G. N., and Vadhan, S. (2010). Boosting and differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 51–60. IEEE

  17. [17]

    Efron, B. (1992). Bootstrap methods: another look at the jackknife. In Breakthroughs in Statistics: Methodology and Distribution, pages 569–593. Springer

  18. [18]

    Gillenwater, J., Joseph, M., and Kulesza, A. (2021). Differentially private quantiles. In International Conference on Machine Learning, pages 3713–3722. PMLR

  19. [19]

    Hardt, M. and Price, E. (2014). The noisy power method: A meta algorithm with applications. Advances in Neural Information Processing Systems, 27

  20. [20]

    Kaplan, H., Schnapp, S., and Stemmer, U. (2022). Differentially private approximate quantiles. In International Conference on Machine Learning, pages 10751–10761. PMLR

  21. [21]

    Karwa, V. and Vadhan, S. (2018). Finite sample differentially private confidence intervals. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), pages 44–1. Schloss Dagstuhl–Leibniz-Zentrum für Informatik

  22. [22]

    Kleiner, A., Talwalkar, A., Sarkar, P., and Jordan, M. I. (2014). A scalable bootstrap for massive data. Journal of the Royal Statistical Society Series B: Statistical Methodology, 76(4):795–816

  23. [23]

    Lei, J. (2011). Differentially private m-estimators. Advances in Neural Information Processing Systems, 24

  24. [24]

    McSherry, F. and Talwar, K. (2007). Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), pages 94–103. IEEE

  25. [25]

    Nissim, K., Raskhodnikova, S., and Smith, A. (2007). Smooth sensitivity and sampling in private data analysis. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 75–84.

    [Politis et al.] Politis, D. N., Romano, J. P., and Wolf, M. Subsampling in the iid case. In Subsampling, pages 39–64. Springer

  26. [26]

    Serfling, R. J. (2009). Approximation theorems of mathematical statistics. John Wiley & Sons

  27. [27]

    Shoham, T. and Ligett, K. (2025). Differentially private ratio statistics. arXiv preprint arXiv:2505.20351

  28. [28]

    Shoham, T. and Rinott, Y. (2022). Asking the proper question: Adjusting queries to statistical procedures under differential privacy. In International Conference on Privacy in Statistical Databases, pages 46–61. Springer

  29. [29]

    Smith, A. (2008). Efficient, differentially private point estimators. arXiv preprint arXiv:0809.4794

  30. [30]

    Smith, A. (2011). Privacy-preserving statistical estimation with optimal convergence rates. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 813–822

  31. [31]

    Wang, Y., Kifer, D., Lee, J., and Karwa, V. (2018). Statistical approximating distributions under differential privacy. Journal of Privacy and Confidentiality, 8(1)

  32. [32]

    Wang, Y.-X. (2018). Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain. arXiv preprint arXiv:1803.02596

  33. [33]

    Wang, Z., Cheng, G., and Awan, J. (2022). Differentially private bootstrap: New privacy analysis and inference strategies. arXiv preprint arXiv:2210.06140

  34. [34]

    Zhang, J., Zhang, Z., Xiao, X., Yang, Y., and Winslett, M. (2012). Functional mechanism: regression analysis under differential privacy. Proceedings of the VLDB Endowment, 5(11):1364–1375.

Supplementary Materials

A Differential privacy

Let Ω be an abstract data domain. A dataset of size n is a collection of n individuals' data records: ω = {ω_i}_{i=1}^n ∈ Ω^n. ...