Equivalence Testing Under Privacy Constraints
Pith reviewed 2026-05-10 17:52 UTC · model grok-4.3
The pith
A simulation-calibrated procedure performs equivalence tests for means and proportions while satisfying differential privacy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DP-TOST performs differentially private equivalence testing of means and proportions by injecting noise into the usual two one-sided tests (TOST) statistics and calibrating the rejection thresholds by simulation. The finite-sample type I error thereby stays at the nominal alpha, while power converges to the non-private benchmark as the privacy budget or the sample size increases.
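Stated in symbols, the claim has three parts. The notation here is assumed for illustration and is not taken from the paper: θ is the true difference in means or proportions, Δ is the equivalence margin (written Δ to avoid a clash with the privacy parameter δ), and M is the randomized private procedure applied to neighbouring datasets D and D'.

```latex
% Equivalence hypotheses: the null hypothesis is non-equivalence.
\[ H_0:\ |\theta| \ge \Delta \qquad \text{versus} \qquad H_1:\ |\theta| < \Delta \]

% Finite-sample type I error control at level \alpha.
\[ \sup_{|\theta| \ge \Delta} \Pr_\theta\bigl(\text{reject } H_0\bigr) \le \alpha \]

% (\varepsilon, \delta)-differential privacy, for every measurable set S.
\[ \Pr\bigl(M(D) \in S\bigr) \le e^{\varepsilon}\, \Pr\bigl(M(D') \in S\bigr) + \delta \]
```

The second line is the statement that simulation-based calibration is meant to deliver, and the third is the standard definition of differential privacy that the noise injection is meant to satisfy.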
What carries the argument
DP-TOST is a simulation-based calibration procedure that adds differential-privacy noise to the equivalence test statistic and determines critical values from Monte Carlo draws under the null.
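The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Laplace noise on clipped sample means, normally distributed data with a known, public standard deviation `sigma`, public clipping bounds `lo` and `hi`, and a symmetric rejection region. All function names are hypothetical. If `sigma` were estimated from the sensitive data, that estimate would also need to be privatized.

```python
import numpy as np


def dp_mean_diff(x, y, lo, hi, eps, rng):
    """Difference of clipped sample means, each released with Laplace noise."""
    x = np.clip(x, lo, hi)
    y = np.clip(y, lo, hi)
    # Changing one record moves a clipped mean of n values by at most (hi - lo) / n.
    noise_x = rng.laplace(0.0, (hi - lo) / (len(x) * eps))
    noise_y = rng.laplace(0.0, (hi - lo) / (len(y) * eps))
    return (x.mean() + noise_x) - (y.mean() + noise_y)


def calibrate_critical_value(n_x, n_y, margin, sigma, lo, hi, eps, alpha, n_sim, rng):
    """Monte Carlo alpha-quantile of |D| when the true difference equals the margin."""
    stats = np.empty(n_sim)
    for b in range(n_sim):
        # Assumed null model: normal data with known sigma, difference on the boundary.
        x = rng.normal(margin, sigma, n_x)
        y = rng.normal(0.0, sigma, n_y)
        stats[b] = abs(dp_mean_diff(x, y, lo, hi, eps, rng))
    return float(np.quantile(stats, alpha))


def dp_tost_means(x, y, margin, sigma, lo, hi, eps, alpha=0.05, n_sim=2000, seed=0):
    """Declare equivalence when the noisy difference is closer to 0 than the critical value."""
    rng = np.random.default_rng(seed)
    d = dp_mean_diff(x, y, lo, hi, eps, rng)
    c = calibrate_critical_value(len(x), len(y), margin, sigma, lo, hi, eps, alpha, n_sim, rng)
    return bool(abs(d) < c), float(d), c
```

Because the calibration loop replays the whole pipeline, including clipping and noise, the critical value reflects the noisy statistic rather than the classical t-based one. Only the noisy difference `d` touches the sensitive data; the calibration uses simulated data alone.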
If this is right
- Equivalence testing for means and proportions can be performed on sensitive data without exceeding the nominal type I error rate.
- Power of the private test increases toward the power of the corresponding non-private test as the privacy budget grows or as sample size increases.
- The same simulation-calibration approach works uniformly for both continuous means and binary proportions.
- The framework supplies a practical tool for privacy-preserving analyses in domains that require both statistical equivalence decisions and individual-level confidentiality.
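The power claim can be explored with a small simulation for proportions. The sketch below is illustrative only and rests on assumptions not confirmed by the source: Laplace noise with scale 1/(n·ε) on each sample proportion, equal group sizes, and an assumed reference proportion `p_ref` as the nuisance value used for calibration. The "non-private" entry (ε = ∞) is the same simulation-calibrated test with zero noise, not the classical TOST. Function names are hypothetical.

```python
import numpy as np


def noisy_prop_diff(n, p_x, p_y, eps, rng, size):
    """Draw `size` Laplace-noised differences of two sample proportions (n per group)."""
    px_hat = rng.binomial(n, p_x, size) / n
    py_hat = rng.binomial(n, p_y, size) / n
    scale = 1.0 / (n * eps)  # one record changes a proportion by at most 1/n
    noise = rng.laplace(0.0, scale, size) - rng.laplace(0.0, scale, size)
    return px_hat - py_hat + noise


def power_by_budget(n, margin, p_ref, eps_grid, alpha=0.05, n_sim=20000, seed=0):
    """Estimated power at zero true difference, one entry per privacy budget."""
    rng = np.random.default_rng(seed)
    power = {}
    for eps in eps_grid:
        # Calibrate at the boundary null p_x - p_y = margin (p_ref is an assumed nuisance value).
        boundary = np.abs(noisy_prop_diff(n, p_ref + margin, p_ref, eps, rng, n_sim))
        c = np.quantile(boundary, alpha)
        # Power: rejection rate when the two proportions are truly equal.
        equal = np.abs(noisy_prop_diff(n, p_ref, p_ref, eps, rng, n_sim))
        power[eps] = float(np.mean(equal < c))
    return power
```

Calling `power_by_budget(n=400, margin=0.15, p_ref=0.5, eps_grid=[0.05, 10.0, np.inf])` returns one estimated power per budget, which makes it easy to see how quickly the private test approaches the zero-noise version in this toy setting.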
Where Pith is reading between the lines
- The same noise-injection-plus-simulation strategy could be adapted to other two-sided or one-sided tests that currently lack private versions.
- Multi-site studies could adopt DP-TOST to compare treatment effects across institutions while releasing only the noisy summary statistics.
- The computational cost of the required simulations grows with the desired calibration precision, creating a practical trade-off between accuracy and run time that users must manage.
Load-bearing premise
Simulation-based calibration must produce a sufficiently accurate approximation to the finite-sample distribution of the noisy test statistic.
What would settle it
Repeated simulations at the boundary of the null hypothesis in which the empirical rejection rate exceeds the nominal alpha by more than Monte Carlo error, for some fixed privacy budget and sample size, would falsify the type I error claim.
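A check of this kind might look like the following sketch. It is a hypothetical harness, not the paper's simulation design: it assumes Laplace noise on clipped means, normal data with known `sigma`, and equal group sizes. It calibrates a critical value from one batch of boundary-null draws and then measures the rejection rate on an independent batch.

```python
import numpy as np


def noisy_mean_diff(n, theta, sigma, lo, hi, eps, rng, size):
    """Draw `size` noisy differences of clipped means for two groups of n normal values."""
    x_bar = np.clip(rng.normal(theta, sigma, (size, n)), lo, hi).mean(axis=1)
    y_bar = np.clip(rng.normal(0.0, sigma, (size, n)), lo, hi).mean(axis=1)
    scale = (hi - lo) / (n * eps)  # Laplace scale for one clipped mean
    noise = rng.laplace(0.0, scale, size) - rng.laplace(0.0, scale, size)
    return x_bar - y_bar + noise


def boundary_rejection_rate(n, margin, sigma, lo, hi, eps, alpha=0.05,
                            n_cal=20000, n_rep=20000, seed=0):
    """Empirical rejection rate at the boundary null, with a binomial standard error."""
    rng = np.random.default_rng(seed)
    # Step 1: calibrate the critical value from one set of boundary-null draws.
    cal = np.abs(noisy_mean_diff(n, margin, sigma, lo, hi, eps, rng, n_cal))
    c = np.quantile(cal, alpha)
    # Step 2: apply it to independent boundary-null draws and count rejections.
    fresh = np.abs(noisy_mean_diff(n, margin, sigma, lo, hi, eps, rng, n_rep))
    rate = float(np.mean(fresh < c))
    se = float(np.sqrt(alpha * (1 - alpha) / n_rep))  # ignores the error in c itself
    return rate, se, (rate - alpha) / se
```

In this harness the calibration model matches the data-generating model, so a rate near alpha is expected by construction. A more demanding version would generate the evaluation data from a distribution that differs from the calibration assumptions, for example a different variance or a skewed shape, and look for rejection rates several standard errors above alpha.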
Original abstract
Protecting individual privacy is essential across research domains, from socio-economic surveys to big-tech user data. This need is particularly acute in healthcare, where analyses often involve sensitive patient information. A typical example is comparing treatment efficacy across hospitals or ensuring consistency in diagnostic laboratory calibrations, both requiring privacy-preserving statistical procedures. However, standard equivalence testing procedures for differences in proportions or means, commonly used to assess average equivalence, can inadvertently disclose sensitive information. To address this problem, we develop differentially private equivalence testing procedures that rely on simulation-based calibration, as the finite-sample distribution is analytically intractable. Our approach introduces a unified framework, termed DP-TOST, for conducting differentially private equivalence testing of both means and proportions. Through numerical simulations and real-world applications, we demonstrate that the proposed method maintains type-I error control at the nominal level and achieves power comparable to its non-private counterpart as the privacy budget and/or sample size increases, while ensuring strong privacy guarantees. These findings establish a reliable and practical framework for privacy-preserving equivalence testing in high-stakes fields such as healthcare, among others.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes DP-TOST, a unified framework for differentially private equivalence testing of means and proportions. It relies on simulation-based calibration to approximate the finite-sample null distribution of the noisy test statistic under differential privacy mechanisms (since the distribution is analytically intractable), and claims that the procedure maintains type-I error at the nominal level while achieving power comparable to the non-private TOST as the privacy budget or sample size grows, with demonstrations via numerical simulations and real-world applications.
Significance. If the simulation calibration can be shown to deliver reliable type-I error control, the work supplies a practical extension of equivalence testing to privacy-constrained settings, which is relevant for healthcare and other sensitive domains. It correctly builds on standard DP primitives rather than introducing ad-hoc mechanisms, and the empirical results suggest the power loss is modest for moderate privacy budgets. The absence of analytical error bounds on the Monte Carlo step, however, leaves the central guarantee on an empirical footing.
major comments (1)
- The claim of nominal type-I error control (abstract and methods) rests entirely on Monte Carlo approximation of the null distribution of the DP-perturbed test statistic. The manuscript provides no simulation count, Monte Carlo standard-error estimates, or convergence diagnostics for the calibrated critical values or p-value thresholds. This is load-bearing because the privacy noise renders the distribution intractable, and without reported precision on the approximation the control is only empirically suggested, especially near the equivalence boundary or for small ε and n where noise dominates sampling variability.
minor comments (2)
- The abstract states that the simulations demonstrate type-I error control but does not specify the range of ε, δ, n, or equivalence margins examined; adding a brief table or sentence summarizing the simulation grid would improve reproducibility.
- Notation for the privacy mechanism (e.g., Laplace or Gaussian noise scale) and the exact form of the calibrated rejection region should be stated explicitly in the main text rather than deferred to supplementary material.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. We appreciate the recognition of DP-TOST as a practical extension of equivalence testing to privacy-constrained settings. We agree that the Monte Carlo calibration step requires more explicit reporting to support the type-I error claims and have prepared revisions accordingly.
Point-by-point responses
- Referee: The claim of nominal type-I error control (abstract and methods) rests entirely on Monte Carlo approximation of the null distribution of the DP-perturbed test statistic. The manuscript provides no simulation count, Monte Carlo standard-error estimates, or convergence diagnostics for the calibrated critical values or p-value thresholds. This is load-bearing because the privacy noise renders the distribution intractable, and without reported precision on the approximation the control is only empirically suggested, especially near the equivalence boundary or for small ε and n where noise dominates sampling variability.
Authors: We agree that the manuscript should provide greater transparency on the Monte Carlo procedure used to calibrate critical values and p-value thresholds. In the revised manuscript we will explicitly state the number of Monte Carlo replications employed for each calibration, report Monte Carlo standard-error estimates for the resulting quantiles, and include convergence diagnostics (e.g., stability checks across increasing replication counts). These additions will be placed in the Methods section and highlighted in the simulation results, with particular attention to the regimes of small ε and n. While analytical error bounds on the Monte Carlo approximation remain intractable given the privacy-induced noise, the expanded reporting will make the empirical foundation of the type-I error control fully reproducible and quantifiable.
Revision: yes
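The diagnostics promised in this response could be produced along the following lines. This is a hedged sketch, not the authors' procedure: it estimates the Monte Carlo standard error of a calibrated quantile with a nonparametric bootstrap over the simulated draws, and repeats the estimate for increasing replication counts. `toy_null_sampler` is a hypothetical stand-in for draws of the noisy statistic at the boundary null.

```python
import numpy as np


def quantile_with_mc_se(draws, alpha, n_boot=200, seed=0):
    """Empirical alpha-quantile of the draws and a bootstrap estimate of its Monte Carlo SE."""
    rng = np.random.default_rng(seed)
    draws = np.asarray(draws, dtype=float)
    q_hat = float(np.quantile(draws, alpha))
    # Resample the draws with replacement and recompute the quantile each time.
    idx = rng.integers(0, draws.size, size=(n_boot, draws.size))
    boot_q = np.quantile(draws[idx], alpha, axis=1)
    return q_hat, float(boot_q.std(ddof=1))


def stability_check(sampler, alpha, sizes, seed=0):
    """Quantile and Monte Carlo SE for increasing replication counts."""
    rng = np.random.default_rng(seed)
    rows = []
    for n_sim in sizes:
        q_hat, se = quantile_with_mc_se(sampler(rng, n_sim), alpha, seed=seed)
        rows.append((n_sim, q_hat, se))
    return rows


def toy_null_sampler(rng, n_sim):
    """Hypothetical stand-in for boundary-null draws of a noisy |statistic|."""
    return np.abs(0.5 + rng.normal(0.0, 0.1, n_sim) + rng.laplace(0.0, 0.05, n_sim))
```

Running `stability_check(toy_null_sampler, alpha=0.05, sizes=[200, 2000, 20000])` yields one row per replication count. A critical value that keeps moving by more than its reported standard error as the count grows would indicate that the calibration has not yet converged.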
Circularity Check
No significant circularity detected
full rationale
The paper constructs DP-TOST by combining standard differential privacy mechanisms (e.g., noise addition) with simulation-based calibration to handle the analytically intractable finite-sample null distribution of the test statistic. This is a standard, non-circular technique for Monte Carlo testing rather than a self-definitional loop or a fitted parameter renamed as a prediction. Type-I error control is asserted via numerical simulations, but the core derivation relies on established DP primitives and does not reduce to its own inputs by construction, self-citation chains, or imported uniqueness results. The approach is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the finite-sample distribution of the test statistic is analytically intractable under differential privacy.