pith. machine review for the scientific record.

arxiv: 2512.18632 · v2 · submitted 2025-12-21 · 💻 cs.CR

Recognition: 1 theorem link

· Lean Theorem

Multi-user Pufferfish Privacy

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 20:46 UTC · model grok-4.3

classification 💻 cs.CR
keywords multi-user pufferfish privacy · Laplace noise calibration · Wasserstein metric · statistical indistinguishability · tabular data · binary random variables

The pith

Sufficient conditions from Wasserstein metrics enable Laplace noise calibration for Pufferfish privacy in multi-user aggregated queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives sufficient conditions for calibrating the Laplace noise added to aggregated queries over a multi-user system so that Pufferfish privacy holds. The conditions guarantee statistical indistinguishability for four sets of secret pairs when a user changes their reported data, leaves the system, or is replaced by another user with different randomness. They are obtained with the Kantorovich method, using the Wasserstein metric of order 1, and apply to attaining privacy when a class of users is added to or removed from tabular data. Privacy for an individual is conditioned only on that user's own statistics, and for binary variables the conditions can be relaxed to allow less noise.
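To fix the interface concretely, the sketch below computes an order-1 Wasserstein distance between the aggregate query's output distributions under a secret pair and turns it into a candidate Laplace scale. It is a minimal illustration, assuming a one-dimensional numeric query, hypothetical distributions, and the placeholder rule θ = W1/ε; the paper's actual sufficient conditions (the theorems and lemmas referenced in the figures) are not reproduced here.

```python
# Illustrative sketch only: turn a Wasserstein-1 distance between two
# query-output distributions into a candidate Laplace scale. The rule
# theta = W1 / eps stands in for the paper's derived sufficient conditions.
import numpy as np
from scipy.stats import wasserstein_distance


def w1_laplace_scale(support_p, probs_p, support_q, probs_q, eps):
    """W1 between two discrete laws of the aggregate answer, and a
    candidate Laplace scale for a target privacy budget eps."""
    w1 = wasserstein_distance(support_p, support_q, probs_p, probs_q)
    return w1, w1 / eps


# Hypothetical aggregate (sum-query) laws under the two secrets of a pair.
support = np.array([0.0, 1.0, 2.0, 3.0])
probs_under_si = np.array([0.10, 0.40, 0.40, 0.10])
probs_under_sj = np.array([0.05, 0.25, 0.45, 0.25])

w1, theta = w1_laplace_scale(support, probs_under_si, support, probs_under_sj, eps=1.0)
print(f"W1 = {w1:.3f}, candidate Laplace scale theta = {theta:.3f}")
```

In the paper's setting the calibration has to hold simultaneously over all four secret-pair sets and all user-change scenarios; the sketch shows one pair only.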

Core claim

Sufficient conditions are derived, via the Kantorovich method with the Wasserstein metric of order 1, for attaining statistical indistinguishability on four sets of secret pairs in all scenarios of user data change. Applied to user addition or removal in tabular data, they show that indistinguishability for an individual is conditioned on that user's statistics only. For binary (Bernoulli distributed) random variables, the conditions can be relaxed to reduce noise and improve utility.
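The binary relaxation is easiest to see through the order-1 distance itself: for two Bernoulli laws on {0, 1}, W1 is simply |p − q|, which can be far below the worst-case sensitivity of a single binary report. The comparison below is an editorial illustration with hypothetical probabilities, not the paper's relaxed condition.

```python
# Editorial illustration: distribution-aware vs. worst-case Laplace scales
# for a single binary (Bernoulli) report contributing to a sum query.
def bernoulli_w1(p: float, q: float) -> float:
    # On support {0, 1} the optimal transport moves |p - q| mass a distance 1.
    return abs(p - q)

eps = 1.0
p, q = 0.60, 0.55                        # hypothetical secret-pair probabilities
worst_case_scale = 1.0 / eps             # one report can change the sum by 1
distribution_aware_scale = bernoulli_w1(p, q) / eps
print(worst_case_scale, distribution_aware_scale)   # 1.0 vs. 0.05
```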

What carries the argument

Laplace noise calibration via the Kantorovich method and Wasserstein metric of order 1 to enforce Pufferfish privacy on secret pairs.

If this is right

  • The noise calibration works for scenarios where users leave and are replaced.
  • Privacy protection applies when adding or removing users from tabular data.
  • Indistinguishability depends solely on the statistics of the user in question.
  • Relaxed conditions for binary data reduce required noise levels.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This calibration technique might apply to other privacy definitions using similar distance metrics.
  • In practice, it could lead to more efficient privacy mechanisms in dynamic user databases.
  • Further relaxation might be possible for other discrete distributions beyond binary.

Load-bearing premise

The random variables reported by users have distributions that allow the Wasserstein metric to directly bound the privacy loss for the considered secret pairs.
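For context, the established Wasserstein mechanism of Song et al. (reference [33]) is the template this premise generalizes: Laplace noise scaled by the largest order-∞ distance over secret pairs and priors yields ε-Pufferfish privacy. The display below restates that known result in generic notation; the paper under review instead seeks conditions under which the order-1 (Kantorovich) distance suffices.

```latex
% Known W_infty mechanism (Song et al. 2017, [33]), restated for context.
% mu_{f(D) | s, theta} is the law of the query answer given secret s and
% prior theta; S is the set of secret pairs, Theta the set of priors.
\[
  W \;=\; \sup_{(s_i,s_j)\in\mathbb{S},\;\theta\in\Theta}
          W_{\infty}\!\left(\mu_{f(D)\mid s_i,\theta},\, \mu_{f(D)\mid s_j,\theta}\right),
  \qquad
  M(D) \;=\; f(D) + \mathrm{Lap}\!\left(\tfrac{W}{\epsilon}\right)
\]
% M then satisfies epsilon-Pufferfish privacy with respect to (S, Theta).
```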

What would settle it

If the Wasserstein distance between the distributions induced by a secret pair fails to bound the privacy parameter once Laplace noise at the prescribed scale is added, the sufficient conditions do not hold.
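One concrete form of such a check, for the one-dimensional Laplace mechanism, is the pointwise density-ratio bound: if the aggregate answers under the two secrets differ by at most Δ, a scale θ ≥ Δ/ε keeps the log-ratio within ε. The grid check below verifies only that shift bound (with hypothetical Δ and ε), not the full distributional Pufferfish statement over secret pairs.

```python
# Minimal numeric check (illustration only): for Laplace noise of scale theta,
# shifting the centre by Delta changes the log-density by at most Delta/theta,
# so theta >= Delta/eps keeps the ratio within exp(eps). Normalising constants
# cancel in the ratio, so they are omitted.
import numpy as np

def max_abs_log_ratio(delta, theta, grid):
    log_p = -np.abs(grid) / theta           # Laplace log-density centred at 0
    log_q = -np.abs(grid - delta) / theta   # Laplace log-density centred at delta
    return np.max(np.abs(log_p - log_q))

eps, delta = 1.0, 0.4                       # hypothetical budget and answer shift
theta = delta / eps                         # candidate calibration
grid = np.linspace(-20.0, 20.0, 200_001)
print(max_abs_log_ratio(delta, theta, grid) <= eps + 1e-9)   # expected: True
```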

Figures

Figures reproduced from arXiv: 2512.18632 by Ni Ding, Songpei Lu, Wenjing Yang, Zijian Zhang.

Figure 1. Assume the first three users in Table 2 and con…
Figure 2. Assume the first three users in Table 2 and consider…
Figure 4. For the experiment in Figure 3, we use Brent method…
Figure 5. Two values of θ produced by (13) and (14) in Theorem 3 for achieving ε-pufferfish privacy on S_{P,⊥} = {(P_i, ⊥_i)} on three data sets in the UCI machine learning repository, where P_i(·) = Pr(education = ·|race = 'white'), P_i(·) = Pr(romantic = ·|higher = 'yes') and P_i(·) = Pr(marital = ·|loan = 'yes') for datasets adult, student performance and bank marketing, respectively.
Figure 6. For the first three users in Table 2, consider adding…
Figure 8. For the same setting in Figure 7, we apply Lemma 2…
Figure 9. Two values of θ produced by Lemma 1 and Lemma 2 for achieving ε-pufferfish privacy on S_{P,Q} = {(P_i, Q_i)} on three datasets in the UCI machine learning repository. For adult, P_i(·) = Pr(relationship = ·|race = 'Asian-Pac-Islander') and Q_i(·) = Pr(relationship = ·|race = 'Amer-Indian-Eskimo'); for student performance, P_i(·) = Pr(romance = ·|freetime = 5) and Q_i(·) = Pr(romance = ·|freetime = 2); for bank mar…
read the original abstract

This paper studies how to achieve individual indistinguishability by pufferfish privacy in aggregated query to a multi-user system. It is assumed that each user reports realization of a random variable. We study how to calibrate Laplace noise, added to the query answer, to attain pufferfish privacy when user changes his/her reported data value, leaves the system and is replaced by another use with different randomness. Sufficient conditions are derived for all scenarios for attaining statistical indistinguishability on four sets of secret pairs. They are derived using the existing Kantorovich method (Wasserstain metric of order $1$). These results can be applied to attain indistinguishability when a certain class of users is added or removed from a tabular data. It is revealed that attaining indifference in individual's data is conditioned on the statistics of this user only. For binary (Bernoulli distributed) random variables, the derived sufficient conditions can be further relaxed to reduce the noise and improve data utility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper studies achieving individual indistinguishability via Pufferfish privacy for aggregated queries in multi-user systems. Assuming each user reports a realization of a random variable, it calibrates Laplace noise to attain privacy when a user changes data, leaves, or is replaced. Sufficient conditions are derived using the Kantorovich method (Wasserstein metric of order 1) for four sets of secret pairs. These apply to adding/removing users from tabular data, with relaxation for Bernoulli variables to reduce noise.

Significance. If the derived conditions hold and the independence assumptions are verified, this provides a practical method for calibrating noise in dynamic multi-user aggregates to achieve Pufferfish privacy based solely on individual user statistics. It extends existing Wasserstein-based approaches to user replacement scenarios and offers utility improvements for binary data, which is relevant for applications in collaborative sensing or data sharing where user sets change.

major comments (2)
  1. Replacement scenario: The claim that statistical indifference depends only on the statistics of the replaced user requires that reports are independent and the aggregate query is linear (e.g., sum). The manuscript does not explicitly verify that the Kantorovich transport plan remains valid under the multi-user joint distribution, which could make the Wasserstein-1 distance depend on the full joint law rather than the marginal alone.
  2. Sufficient conditions section: The abstract states that conditions are derived for all scenarios using the existing Kantorovich method, but no equation or derivation is shown to confirm that the resulting bounds support the indistinguishability claims without gaps when reports may be correlated.
minor comments (1)
  1. The abstract contains a typo: 'Wasserstain' should be 'Wasserstein'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and constructive feedback. We address each major comment below and indicate the revisions planned for the manuscript.

read point-by-point responses
  1. Referee: Replacement scenario: The claim that statistical indifference depends only on the statistics of the replaced user requires that reports are independent and the aggregate query is linear (e.g., sum). The manuscript does not explicitly verify that the Kantorovich transport plan remains valid under the multi-user joint distribution, which could make the Wasserstein-1 distance depend on the full joint law rather than the marginal alone.

    Authors: We agree that the replacement scenario analysis relies on the assumptions of independent user reports and a linear aggregate query (e.g., summation). Under these conditions, coupling the replaced user's two report distributions optimally and reusing the remaining users' draws bounds the Wasserstein-1 distance between the two aggregate laws by the Wasserstein-1 distance between that user's report distributions, so the Kantorovich transport plan can be constructed from individual statistics alone. We will add an explicit statement of these assumptions together with a brief verification of the transport plan reduction in the revised manuscript (an illustrative simulation of this reduction is sketched after these responses). revision: yes

  2. Referee: Sufficient conditions section: The abstract states that conditions are derived for all scenarios using the existing Kantorovich method, but no equation or derivation is shown to confirm that the resulting bounds support the indistinguishability claims without gaps when reports may be correlated.

    Authors: The sufficient conditions and Kantorovich derivations appear in Section 3 for the four secret-pair sets across all scenarios. These derivations implicitly rely on report independence, and we did not explicitly address correlated reports or include equations confirming the bounds in that case. We will revise the section to insert the intermediate equations showing the reduction to marginals under independence and to state clearly that the results assume independence (with a note that correlations fall outside the current scope). revision: yes
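To illustrate the reduction invoked in point 1 above: with independent reports and a sum query, reusing the other users' draws while optimally coupling the replaced user's two laws bounds the Wasserstein-1 distance between the aggregate laws by the distance between that user's marginals. The Monte Carlo sketch below, with hypothetical Bernoulli reports, is an editorial illustration of that coupling argument, not the authors' verification.

```python
# Editorial illustration (not the authors' derivation): for independent user
# reports and a sum query, the W1 distance between the aggregate laws obtained
# by swapping user 1's distribution P_1 for Q_1 is bounded by W1(P_1, Q_1),
# so marginal-based calibration is conservative for the aggregate.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
n_samples = 200_000

# Hypothetical binary reports: users 2-5 keep their distributions fixed,
# user 1 follows Bernoulli(0.6) under one secret and Bernoulli(0.2) under the other.
others = rng.binomial(1, [0.3, 0.7, 0.5, 0.4], size=(n_samples, 4)).sum(axis=1)
agg_P = others + rng.binomial(1, 0.6, size=n_samples)
agg_Q = others + rng.binomial(1, 0.2, size=n_samples)

w1_aggregate = wasserstein_distance(agg_P, agg_Q)   # empirical, from samples
w1_marginal = abs(0.6 - 0.2)                        # W1 between the two Bernoulli laws
print(w1_aggregate, w1_marginal)  # roughly equal here; the coupling bound gives <=
```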

Circularity Check

0 steps flagged

No circularity: applies external Kantorovich/Wasserstein-1 method to derive conditions for new multi-user scenarios

full rationale

The paper derives sufficient conditions for pufferfish privacy under user addition/removal/replacement by directly invoking the existing Kantorovich duality for the Wasserstein-1 metric on the aggregate query output. This is an application of a standard external tool to four specific secret-pair sets; the resulting bounds are not obtained by fitting parameters to the target indistinguishability statement itself, nor by renaming a known result, nor by load-bearing self-citation. The derivation chain therefore remains self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

Review is abstract-only; the ledger is therefore incomplete. The model assumes users report realizations of random variables whose distributions permit Wasserstein calibration. Beyond the Laplace noise scale inferred below, no free parameters or invented entities are explicitly named in the abstract.

free parameters (1)
  • Laplace noise scale
    Calibrated via the derived sufficient conditions; exact functional form not visible in abstract.
axioms (2)
  • domain assumption Each user reports a realization of a random variable
    Explicitly stated as the system model in the abstract.
  • domain assumption Privacy is defined over four specific sets of secret pairs
    The indistinguishability targets are taken as given for the derivation.

pith-pipeline@v0.9.0 · 5458 in / 1457 out tokens · 42482 ms · 2026-05-16T20:46:30.797886+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. α-Wasserstein Mechanism for Rényi Pufferfish Privacy

    cs.CR 2026-05 unverdicted novelty 7.0

    The α-Wasserstein mechanism calibrates noise for exact Rényi Pufferfish Privacy by bounding the Wasserstein metric, generalizing the W_∞ pufferfish mechanism and Rényi differential privacy results.

  2. Rényi Pufferfish Privacy with Gaussian-based Priors: From Single Gaussian to Mixture Model

    cs.CR 2026-04 unverdicted novelty 6.0

    Gaussian mechanisms for Rényi Pufferfish Privacy under Gaussian and mixture priors deliver exact divergence derivations, closed-form sufficient conditions, and 48.9% less noise than additive baselines on statistical a...

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · cited by 2 Pith papers · 2 internal anchors

  1. [1] Mário S. Alvim, Kostas Chatzikokolakis, Catuscia Palamidessi, and Geoffrey Smith. 2012. Measuring Information Leakage Using Generalized Gain Functions. In 2012 IEEE 25th Computer Security Foundations Symposium. IEEE, 265–279. https://doi.org/10.1109/csf.2012.26
  2. [2] Arthur Asuncion and David Newman. 2007. UCI machine learning repository. https://archive.ics.uci.edu/ml/index.php
  3. [3] R. P. Brent. 1971. An algorithm with guaranteed convergence for finding a zero of a function. Comput. J. 14, 4 (April 1971), 422–425. https://doi.org/10.1093/comjnl/14.4.422
  4. [4] Thierry Champion, Luigi De Pascale, and Petri Juutinen. 2008. The ∞-Wasserstein Distance: Local Solutions and Existence of Optimal Transport Maps. SIAM Journal on Mathematical Analysis 40, 1 (2008), 1–20.
  5. [5] Michael R. Clarkson, Andrew C. Myers, and Fred B. Schneider. 2009. Quantifying information flow with beliefs. Journal of Computer Security 17, 5 (Oct. 2009), 655–701. https://doi.org/10.3233/jcs-2009-0353
  6. [6] Henry Corrigan-Gibbs and Dan Boneh. 2017. Prio: Private, Robust, and Scalable Computation of Aggregate Statistics. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 259–282. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/corrigan-gibbs
  7. [7] Paul Cuff and Lanqing Yu. 2016. Differential Privacy as a Mutual Information Constraint. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS ’16). ACM, 43–54. https://doi.org/10.1145/2976749.2978308
  8. [8] Luigi De Pascale and Jean Louet. 2019. A study of the dual problem of the one-dimensional L∞-optimal transport problem with applications. Journal of Functional Analysis 276, 11 (2019), 3304–3324.
  9. [9] Ni Ding. 2022. Kantorovich Mechanism for Pufferfish Privacy. In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (Proceedings of Machine Learning Research), Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera (Eds.), Vol. 151. PMLR, 5084–5103. https://proceedings.mlr.press/v151/ding22b.html
  10. [10] Ni Ding. 2024. Approximation of Pufferfish Privacy for Gaussian Priors. IEEE Transactions on Information Forensics and Security 19 (2024), 5630–5640. https://doi.org/10.1109/tifs.2024.3402104
  11. [11] Ni Ding, Yucheng Liu, and Farhad Farokhi. 2021. A Linear Reduction Method for Local Differential Privacy and Log-lift. In 2021 IEEE International Symposium on Information Theory (ISIT). Melbourne, 551–556.
  12. [12] Jinshuo Dong, Aaron Roth, and Weijie J. Su. 2022. Gaussian Differential Privacy. Journal of the Royal Statistical Society Series B: Statistical Methodology 84, 1 (Feb. 2022), 3–37. https://doi.org/10.1111/rssb.12454
  13. [13] Flávio du Pin Calmon and Nadia Fawaz. 2012. Privacy against statistical inference. In 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton). 1401–1408. https://doi.org/10.1109/Allerton.2012.6483382
  14. [14] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. 2013. Local Privacy and Statistical Minimax Rates. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science. 429–438. https://doi.org/10.1109/FOCS.2013.53
  15. [15] Cynthia Dwork. 2006. Differential Privacy. In Automata, Languages and Programming, Michele Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 1–12.
  16. [16] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography, Shai Halevi and Tal Rabin (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 265–284.
  17. [17] Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9, 3-4 (2014), 211–407.
  18. [18] I. Issa, A. B. Wagner, and S. Kamath. 2020. An Operational Approach to Information Leakage. IEEE Transactions on Information Theory 66, 3 (March 2020), 1625–1657.
  19. [19] Daniel Kifer and Ashwin Machanavajjhala. 2011. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (SIGMOD/PODS ’11). ACM, 193–204. https://doi.org/10.1145/1989323.1989345
  20. [20] Daniel Kifer and Ashwin Machanavajjhala. 2012. A Rigorous and Customizable Framework for Privacy. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (Scottsdale, Arizona, USA) (PODS ’12). Association for Computing Machinery, New York, NY, USA, 77–88.
  21. [21] Daniel Kifer and Ashwin Machanavajjhala. 2014. Pufferfish: A Framework for Mathematical Privacy Definitions. ACM Transactions on Database Systems 39, 1, Article 3 (Jan. 2014), 36 pages. https://doi.org/10.1145/2514689
  22. [22] Jakub Konečný, H Brendan McMahan, Daniel Ramage, and Peter Richtárik. 2016. Federated optimization: Distributed machine learning for on-device intelligence. arXiv preprint arXiv:1610.02527 (2016).
  23. [23] Jakub Konečný, H Brendan McMahan, Felix X Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. 2016. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 (2016).
  24. [24] Gerty J. L. M. Lensvelt-Mulders, Joop J. Hox, Peter G. M. van der Heijden, and Cora J. M. Maas. 2005. Meta-Analysis of Randomized Response Research: Thirty-Five Years of Validation. Sociological Methods Research 33, 3 (Feb. 2005), 319–348. https://doi.org/10.1177/0049124104268664
  25. [25] Jiachun Liao, Oliver Kosut, Lalitha Sankar, and Flavio du Pin Calmon. 2019. Tunable Measures for Information Leakage and Applications to Privacy-Utility Tradeoffs. IEEE Transactions on Information Theory 65, 12 (Dec. 2019), 8043–8066. https://doi.org/10.1109/tit.2019.2935768
  26. [26] Ali Makhdoumi, Salman Salamatian, Nadia Fawaz, and Muriel Médard. 2014. From the Information Bottleneck to the Privacy Funnel. In 2014 IEEE Information Theory Workshop (ITW 2014). 501–505. https://doi.org/10.1109/ITW.2014.6970882
  27. [27] I. Mironov. 2017. Rényi Differential Privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF). 263–275.
  28. [28] Theshani Nuradha and Ziv Goldfeld. 2023. Pufferfish Privacy: An Information-Theoretic Study. IEEE Transactions on Information Theory 69, 11 (Nov. 2023), 7336–7356. https://doi.org/10.1109/tit.2023.3296288
  29. [29]–[30] Clément Pierquin, Aurélien Bellet, Marc Tommasi, and Matthieu Boussard. Rényi Pufferfish Privacy: General Additive Noise Mechanisms and Privacy Amplification by Iteration via Shift Reduction Lemmas. In International Conference on Machine Learning (ICML 2024). Vienna, Austria. https://inria.hal.science/hal-04363020
  31. [31] Filippo Santambrogio. 2015. Optimal transport for applied mathematicians. Birkhäuser, NY 55, 58-63 (2015), 94.
  32. [32] Geoffrey Smith. 2009. On the Foundations of Quantitative Information Flow. Springer Berlin Heidelberg, 288–302. https://doi.org/10.1007/978-3-642-00596-1_21
  33. [33] Shuang Song, Yizhen Wang, and Kamalika Chaudhuri. 2017. Pufferfish Privacy Mechanisms for Correlated Data. In Proceedings of the 2017 ACM International Conference on Management of Data (Chicago, Illinois, USA). New York, NY, USA, 1291–1306.
  34. [34] Endre Süli and David F Mayers. 2003. An introduction to numerical analysis. Cambridge University Press.
  35. [35] Cédric Villani. 2009. Optimal transport: old and new. Vol. 338. Springer.
  36. [36] Larry Wasserman and Shuheng Zhou. 2010. A Statistical Framework for Differential Privacy. J. Amer. Statist. Assoc. 105, 489 (March 2010), 375–389. https://doi.org/10.1198/jasa.2009.tm08651
  37. [37] Wanrong Zhang, Olga Ohrimenko, and Rachel Cummings. 2020. Attribute Privacy: Framework and Mechanisms. arXiv preprint arXiv:2009.04013 (2020).
  38. [38] Wanrong Zhang, Olga Ohrimenko, and Rachel Cummings. 2022. Attribute Privacy: Framework and Mechanisms. In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). ACM, 757–766. https://doi.org/10.1145/3531146.3533139