pith. machine review for the scientific record.

arxiv: 2512.18632 · v2 · submitted 2025-12-21 · 💻 cs.CR

Recognition: 1 theorem link

· Lean Theorem

Multi-user Pufferfish Privacy

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 20:46 UTC · model grok-4.3

classification 💻 cs.CR
keywords multi-user pufferfish privacy · Laplace noise calibration · Wasserstein metric · statistical indistinguishability · tabular data · binary random variables

The pith

Sufficient conditions from Wasserstein metrics enable Laplace noise calibration for Pufferfish privacy in multi-user aggregated queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives sufficient conditions for calibrating the Laplace noise added to aggregated queries over a multi-user system so that Pufferfish privacy holds. The conditions guarantee statistical indistinguishability for four sets of secret pairs when a user changes their reported data, leaves the system, or is replaced by another user with different randomness. They are obtained with the Kantorovich method, using the Wasserstein metric of order 1, and apply to attaining privacy when a class of users is added to or removed from tabular data. Privacy for an individual is conditioned only on that user's own statistics, and for binary variables the conditions can be relaxed to allow less noise.
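To fix the interface concretely, the sketch below computes an order-1 Wasserstein distance between the aggregate query's output distributions under a secret pair and turns it into a candidate Laplace scale. It is a minimal illustration, assuming a one-dimensional numeric query, hypothetical distributions, and the placeholder rule θ = W1/ε; the paper's actual sufficient conditions (the theorems and lemmas referenced in the figures) are not reproduced here.

```python
# Illustrative sketch only: turn a Wasserstein-1 distance between two
# query-output distributions into a candidate Laplace scale. The rule
# theta = W1 / eps stands in for the paper's derived sufficient conditions.
import numpy as np
from scipy.stats import wasserstein_distance


def w1_laplace_scale(support_p, probs_p, support_q, probs_q, eps):
    """W1 between two discrete laws of the aggregate answer, and a
    candidate Laplace scale for a target privacy budget eps."""
    w1 = wasserstein_distance(support_p, support_q, probs_p, probs_q)
    return w1, w1 / eps


# Hypothetical aggregate (sum-query) laws under the two secrets of a pair.
support = np.array([0.0, 1.0, 2.0, 3.0])
probs_under_si = np.array([0.10, 0.40, 0.40, 0.10])
probs_under_sj = np.array([0.05, 0.25, 0.45, 0.25])

w1, theta = w1_laplace_scale(support, probs_under_si, support, probs_under_sj, eps=1.0)
print(f"W1 = {w1:.3f}, candidate Laplace scale theta = {theta:.3f}")
```

In the paper's setting the calibration has to hold simultaneously over all four secret-pair sets and all user-change scenarios; the sketch shows one pair only.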

Core claim

Sufficient conditions are derived, via the Kantorovich method with the Wasserstein metric of order 1, for attaining statistical indistinguishability on four sets of secret pairs in all scenarios of user data change. Applied to user addition or removal in tabular data, they show that indistinguishability for an individual is conditioned on that user's statistics only. For binary (Bernoulli distributed) random variables, the conditions can be relaxed to reduce noise and improve utility.
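The binary relaxation is easiest to see through the order-1 distance itself: for two Bernoulli laws on {0, 1}, W1 is simply |p − q|, which can be far below the worst-case sensitivity of a single binary report. The comparison below is an editorial illustration with hypothetical probabilities, not the paper's relaxed condition.

```python
# Editorial illustration: distribution-aware vs. worst-case Laplace scales
# for a single binary (Bernoulli) report contributing to a sum query.
def bernoulli_w1(p: float, q: float) -> float:
    # On support {0, 1} the optimal transport moves |p - q| mass a distance 1.
    return abs(p - q)

eps = 1.0
p, q = 0.60, 0.55                        # hypothetical secret-pair probabilities
worst_case_scale = 1.0 / eps             # one report can change the sum by 1
distribution_aware_scale = bernoulli_w1(p, q) / eps
print(worst_case_scale, distribution_aware_scale)   # 1.0 vs. 0.05
```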

What carries the argument

Laplace noise calibration via the Kantorovich method and Wasserstein metric of order 1 to enforce Pufferfish privacy on secret pairs.

If this is right

  • The noise calibration works for scenarios where users leave and are replaced.
  • Privacy protection applies when adding or removing users from tabular data.
  • Indistinguishability depends solely on the statistics of the user in question.
  • Relaxed conditions for binary data reduce required noise levels.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This calibration technique might apply to other privacy definitions using similar distance metrics.
  • In practice, it could lead to more efficient privacy mechanisms in dynamic user databases.
  • Further relaxation might be possible for other discrete distributions beyond binary.

Load-bearing premise

The random variables reported by users have distributions that allow the Wasserstein metric to directly bound the privacy loss for the considered secret pairs.
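For context, the established Wasserstein mechanism of Song et al. (reference [33]) is the template this premise generalizes: Laplace noise scaled by the largest order-∞ distance over secret pairs and priors yields ε-Pufferfish privacy. The display below restates that known result in generic notation; the paper under review instead seeks conditions under which the order-1 (Kantorovich) distance suffices.

```latex
% Known W_infty mechanism (Song et al. 2017, [33]), restated for context.
% mu_{f(D) | s, theta} is the law of the query answer given secret s and
% prior theta; S is the set of secret pairs, Theta the set of priors.
\[
  W \;=\; \sup_{(s_i,s_j)\in\mathbb{S},\;\theta\in\Theta}
          W_{\infty}\!\left(\mu_{f(D)\mid s_i,\theta},\, \mu_{f(D)\mid s_j,\theta}\right),
  \qquad
  M(D) \;=\; f(D) + \mathrm{Lap}\!\left(\tfrac{W}{\epsilon}\right)
\]
% M then satisfies epsilon-Pufferfish privacy with respect to (S, Theta).
```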

What would settle it

If the Wasserstein distance between the distributions induced by a secret pair fails to bound the privacy parameter once Laplace noise at the prescribed scale is added, the sufficient conditions do not hold.
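One concrete form of such a check, for the one-dimensional Laplace mechanism, is the pointwise density-ratio bound: if the aggregate answers under the two secrets differ by at most Δ, a scale θ ≥ Δ/ε keeps the log-ratio within ε. The grid check below verifies only that shift bound (with hypothetical Δ and ε), not the full distributional Pufferfish statement over secret pairs.

```python
# Minimal numeric check (illustration only): for Laplace noise of scale theta,
# shifting the centre by Delta changes the log-density by at most Delta/theta,
# so theta >= Delta/eps keeps the ratio within exp(eps). Normalising constants
# cancel in the ratio, so they are omitted.
import numpy as np

def max_abs_log_ratio(delta, theta, grid):
    log_p = -np.abs(grid) / theta           # Laplace log-density centred at 0
    log_q = -np.abs(grid - delta) / theta   # Laplace log-density centred at delta
    return np.max(np.abs(log_p - log_q))

eps, delta = 1.0, 0.4                       # hypothetical budget and answer shift
theta = delta / eps                         # candidate calibration
grid = np.linspace(-20.0, 20.0, 200_001)
print(max_abs_log_ratio(delta, theta, grid) <= eps + 1e-9)   # expected: True
```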

Figures

Figures reproduced from arXiv: 2512.18632 by Ni Ding, Songpei Lu, Wenjing Yang, Zijian Zhang.

Figure 1. Assume the first three users in Table 2 and con…
Figure 2. Assume the first three users in Table 2 and consider…
Figure 4. For the experiment in Figure 3, we use Brent method…
Figure 5. Two values of θ produced by (13) and (14) in Theorem 3 for achieving ε-pufferfish privacy on S_{P,⊥} = {(P_i, ⊥_i)} on three data sets in the UCI machine learning repository, where P_i(·) = Pr(education = ·|race = 'white'), P_i(·) = Pr(romantic = ·|higher = 'yes') and P_i(·) = Pr(marital = ·|loan = 'yes') for datasets adult, student performance and bank marketing, respectively.
Figure 6. For the first three users in Table 2, consider adding…
Figure 8. For the same setting in Figure 7, we apply Lemma 2…
Figure 9. Two values of θ produced by Lemma 1 and Lemma 2 for achieving ε-pufferfish privacy on S_{P,Q} = {(P_i, Q_i)} on three datasets in the UCI machine learning repository. For adult, P_i(·) = Pr(relationship = ·|race = 'Asian-Pac-Islander') and Q_i(·) = Pr(relationship = ·|race = 'Amer-Indian-Eskimo'); for student performance, P_i(·) = Pr(romance = ·|freetime = 5) and Q_i(·) = Pr(romance = ·|freetime = 2); for bank mar…
read the original abstract

This paper studies how to achieve individual indistinguishability by pufferfish privacy in aggregated query to a multi-user system. It is assumed that each user reports realization of a random variable. We study how to calibrate Laplace noise, added to the query answer, to attain pufferfish privacy when user changes his/her reported data value, leaves the system and is replaced by another use with different randomness. Sufficient conditions are derived for all scenarios for attaining statistical indistinguishability on four sets of secret pairs. They are derived using the existing Kantorovich method (Wasserstain metric of order $1$). These results can be applied to attain indistinguishability when a certain class of users is added or removed from a tabular data. It is revealed that attaining indifference in individual's data is conditioned on the statistics of this user only. For binary (Bernoulli distributed) random variables, the derived sufficient conditions can be further relaxed to reduce the noise and improve data utility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper studies achieving individual indistinguishability via Pufferfish privacy for aggregated queries in multi-user systems. Assuming each user reports a realization of a random variable, it calibrates Laplace noise to attain privacy when a user changes data, leaves, or is replaced. Sufficient conditions are derived using the Kantorovich method (Wasserstein metric of order 1) for four sets of secret pairs. These apply to adding/removing users from tabular data, with relaxation for Bernoulli variables to reduce noise.

Significance. If the derived conditions hold and the independence assumptions are verified, this provides a practical method for calibrating noise in dynamic multi-user aggregates to achieve Pufferfish privacy based solely on individual user statistics. It extends existing Wasserstein-based approaches to user replacement scenarios and offers utility improvements for binary data, which is relevant for applications in collaborative sensing or data sharing where user sets change.

major comments (2)
  1. Replacement scenario: The claim that statistical indifference depends only on the statistics of the replaced user requires that reports are independent and the aggregate query is linear (e.g., sum). The manuscript does not explicitly verify that the Kantorovich transport plan remains valid under the multi-user joint distribution, which could make the Wasserstein-1 distance depend on the full joint law rather than the marginal alone.
  2. Sufficient conditions section: The abstract states that conditions are derived for all scenarios using the existing Kantorovich method, but no equation or derivation is shown to confirm that the resulting bounds support the indistinguishability claims without gaps when reports may be correlated.
minor comments (1)
  1. The abstract contains a typo: 'Wasserstain' should be 'Wasserstein'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and constructive feedback. We address each major comment below and indicate the revisions planned for the manuscript.

read point-by-point responses
  1. Referee: Replacement scenario: The claim that statistical indifference depends only on the statistics of the replaced user requires that reports are independent and the aggregate query is linear (e.g., sum). The manuscript does not explicitly verify that the Kantorovich transport plan remains valid under the multi-user joint distribution, which could make the Wasserstein-1 distance depend on the full joint law rather than the marginal alone.

    Authors: We agree that the replacement scenario analysis relies on the assumptions of independent user reports and a linear aggregate query (e.g., summation). Under these conditions, coupling the replaced user's two report distributions optimally and reusing the remaining users' draws bounds the Wasserstein-1 distance between the two aggregate laws by the Wasserstein-1 distance between that user's report distributions, so the Kantorovich transport plan can be constructed from individual statistics alone. We will add an explicit statement of these assumptions together with a brief verification of the transport plan reduction in the revised manuscript (an illustrative simulation of this reduction is sketched after these responses). revision: yes

  2. Referee: Sufficient conditions section: The abstract states that conditions are derived for all scenarios using the existing Kantorovich method, but no equation or derivation is shown to confirm that the resulting bounds support the indistinguishability claims without gaps when reports may be correlated.

    Authors: The sufficient conditions and Kantorovich derivations appear in Section 3 for the four secret-pair sets across all scenarios. These derivations implicitly rely on report independence, and we did not explicitly address correlated reports or include equations confirming the bounds in that case. We will revise the section to insert the intermediate equations showing the reduction to marginals under independence and to state clearly that the results assume independence (with a note that correlations fall outside the current scope). revision: yes
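To illustrate the reduction invoked in point 1 above: with independent reports and a sum query, reusing the other users' draws while optimally coupling the replaced user's two laws bounds the Wasserstein-1 distance between the aggregate laws by the distance between that user's marginals. The Monte Carlo sketch below, with hypothetical Bernoulli reports, is an editorial illustration of that coupling argument, not the authors' verification.

```python
# Editorial illustration (not the authors' derivation): for independent user
# reports and a sum query, the W1 distance between the aggregate laws obtained
# by swapping user 1's distribution P_1 for Q_1 is bounded by W1(P_1, Q_1),
# so marginal-based calibration is conservative for the aggregate.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
n_samples = 200_000

# Hypothetical binary reports: users 2-5 keep their distributions fixed,
# user 1 follows Bernoulli(0.6) under one secret and Bernoulli(0.2) under the other.
others = rng.binomial(1, [0.3, 0.7, 0.5, 0.4], size=(n_samples, 4)).sum(axis=1)
agg_P = others + rng.binomial(1, 0.6, size=n_samples)
agg_Q = others + rng.binomial(1, 0.2, size=n_samples)

w1_aggregate = wasserstein_distance(agg_P, agg_Q)   # empirical, from samples
w1_marginal = abs(0.6 - 0.2)                        # W1 between the two Bernoulli laws
print(w1_aggregate, w1_marginal)  # roughly equal here; the coupling bound gives <=
```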

Circularity Check

0 steps flagged

No circularity: applies external Kantorovich/Wasserstein-1 method to derive conditions for new multi-user scenarios

full rationale

The paper derives sufficient conditions for pufferfish privacy under user addition/removal/replacement by directly invoking the existing Kantorovich duality for the Wasserstein-1 metric on the aggregate query output. This is an application of a standard external tool to four specific secret-pair sets; the resulting bounds are not obtained by fitting parameters to the target indistinguishability statement itself, nor by renaming a known result, nor by load-bearing self-citation. The derivation chain therefore remains self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

Review is abstract-only; the ledger is therefore incomplete. The model assumes users report realizations of random variables whose distributions permit Wasserstein calibration. Beyond the Laplace noise scale inferred below, no free parameters or invented entities are explicitly named in the abstract.

free parameters (1)
  • Laplace noise scale
    Calibrated via the derived sufficient conditions; exact functional form not visible in abstract.
axioms (2)
  • domain assumption Each user reports a realization of a random variable
    Explicitly stated as the system model in the abstract.
  • domain assumption Privacy is defined over four specific sets of secret pairs
    The indistinguishability targets are taken as given for the derivation.

pith-pipeline@v0.9.0 · 5458 in / 1457 out tokens · 42482 ms · 2026-05-16T20:46:30.797886+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. α-Wasserstein Mechanism for Rényi Pufferfish Privacy

    cs.CR 2026-05 unverdicted novelty 7.0

    The α-Wasserstein mechanism calibrates noise for exact Rényi Pufferfish Privacy by bounding the Wasserstein metric, generalizing the W_∞ pufferfish mechanism and Rényi differential privacy results.

  2. Rényi Pufferfish Privacy with Gaussian-based Priors: From Single Gaussian to Mixture Model

    cs.CR 2026-04 unverdicted novelty 6.0

    Gaussian mechanisms for Rényi Pufferfish Privacy under Gaussian and mixture priors deliver exact divergence derivations, closed-form sufficient conditions, and 48.9% less noise than additive baselines on statistical a...

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · cited by 2 Pith papers · 2 internal anchors

  1. [1] Mário S. Alvim, Kostas Chatzikokolakis, Catuscia Palamidessi, and Geoffrey Smith. 2012. Measuring Information Leakage Using Generalized Gain Functions. In 2012 IEEE 25th Computer Security Foundations Symposium. IEEE, 265–279. https://doi.org/10.1109/csf.2012.26
  2. [2] Arthur Asuncion and David Newman. 2007. UCI machine learning repository. https://archive.ics.uci.edu/ml/index.php
  3. [3] R. P. Brent. 1971. An algorithm with guaranteed convergence for finding a zero of a function. Comput. J. 14, 4 (April 1971), 422–425. https://doi.org/10.1093/comjnl/14.4.422
  4. [4] Thierry Champion, Luigi De Pascale, and Petri Juutinen. 2008. The ∞-Wasserstein Distance: Local Solutions and Existence of Optimal Transport Maps. SIAM Journal on Mathematical Analysis 40, 1 (2008), 1–20.
  5. [5] Michael R. Clarkson, Andrew C. Myers, and Fred B. Schneider. 2009. Quantifying information flow with beliefs. Journal of Computer Security 17, 5 (Oct. 2009), 655–701. https://doi.org/10.3233/jcs-2009-0353
  6. [6] Henry Corrigan-Gibbs and Dan Boneh. 2017. Prio: Private, Robust, and Scalable Computation of Aggregate Statistics. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 259–282. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/corrigan-gibbs
  7. [7] Paul Cuff and Lanqing Yu. 2016. Differential Privacy as a Mutual Information Constraint. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS ’16). ACM, 43–54. https://doi.org/10.1145/2976749.2978308
  8. [8] Luigi De Pascale and Jean Louet. 2019. A study of the dual problem of the one-dimensional L∞-optimal transport problem with applications. Journal of Functional Analysis 276, 11 (2019), 3304–3324.
  9. [9] Ni Ding. 2022. Kantorovich Mechanism for Pufferfish Privacy. In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (Proceedings of Machine Learning Research), Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera (Eds.), Vol. 151. PMLR, 5084–5103. https://proceedings.mlr.press/v151/ding22b.html
  10. [10] Ni Ding. 2024. Approximation of Pufferfish Privacy for Gaussian Priors. IEEE Transactions on Information Forensics and Security 19 (2024), 5630–5640. https://doi.org/10.1109/tifs.2024.3402104
  11. [11] Ni Ding, Yucheng Liu, and Farhad Farokhi. 2021. A Linear Reduction Method for Local Differential Privacy and Log-lift. In 2021 IEEE International Symposium on Information Theory (ISIT). Melbourne, 551–556.
  12. [12] Jinshuo Dong, Aaron Roth, and Weijie J. Su. 2022. Gaussian Differential Privacy. Journal of the Royal Statistical Society Series B: Statistical Methodology 84, 1 (Feb. 2022), 3–37. https://doi.org/10.1111/rssb.12454
  13. [13] Flávio du Pin Calmon and Nadia Fawaz. 2012. Privacy against statistical inference. In 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton). 1401–1408. https://doi.org/10.1109/Allerton.2012.6483382
  14. [14] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. 2013. Local Privacy and Statistical Minimax Rates. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science. 429–438. https://doi.org/10.1109/FOCS.2013.53
  15. [15] Cynthia Dwork. 2006. Differential Privacy. In Automata, Languages and Programming, Michele Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 1–12.
  16. [16] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography, Shai Halevi and Tal Rabin (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 265–284.
  17. [17] Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9, 3-4 (2014), 211–407.
  18. [18] I. Issa, A. B. Wagner, and S. Kamath. 2020. An Operational Approach to Information Leakage. IEEE Transactions on Information Theory 66, 3 (March 2020), 1625–1657.
  19. [19] Daniel Kifer and Ashwin Machanavajjhala. 2011. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (SIGMOD/PODS ’11). ACM, 193–204. https://doi.org/10.1145/1989323.1989345
  20. [20] Daniel Kifer and Ashwin Machanavajjhala. 2012. A Rigorous and Customizable Framework for Privacy. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (Scottsdale, Arizona, USA) (PODS ’12). Association for Computing Machinery, New York, NY, USA, 77–88.
  21. [21] Daniel Kifer and Ashwin Machanavajjhala. 2014. Pufferfish: A Framework for Mathematical Privacy Definitions. ACM Transactions on Database Systems 39, 1, Article 3 (Jan. 2014), 36 pages. https://doi.org/10.1145/2514689
  22. [22] Jakub Konečný, H Brendan McMahan, Daniel Ramage, and Peter Richtárik. 2016. Federated optimization: Distributed machine learning for on-device intelligence. arXiv preprint arXiv:1610.02527 (2016).
  23. [23] Jakub Konečný, H Brendan McMahan, Felix X Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. 2016. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 (2016).
  24. [24] Gerty J. L. M. Lensvelt-Mulders, Joop J. Hox, Peter G. M. van der Heijden, and Cora J. M. Maas. 2005. Meta-Analysis of Randomized Response Research: Thirty-Five Years of Validation. Sociological Methods Research 33, 3 (Feb. 2005), 319–348. https://doi.org/10.1177/0049124104268664
  25. [25] Jiachun Liao, Oliver Kosut, Lalitha Sankar, and Flavio du Pin Calmon. 2019. Tunable Measures for Information Leakage and Applications to Privacy-Utility Tradeoffs. IEEE Transactions on Information Theory 65, 12 (Dec. 2019), 8043–8066. https://doi.org/10.1109/tit.2019.2935768
  26. [26] Ali Makhdoumi, Salman Salamatian, Nadia Fawaz, and Muriel Médard. 2014. From the Information Bottleneck to the Privacy Funnel. In 2014 IEEE Information Theory Workshop (ITW 2014). 501–505. https://doi.org/10.1109/ITW.2014.6970882
  27. [27] I. Mironov. 2017. Rényi Differential Privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF). 263–275.
  28. [28] Theshani Nuradha and Ziv Goldfeld. 2023. Pufferfish Privacy: An Information-Theoretic Study. IEEE Transactions on Information Theory 69, 11 (Nov. 2023), 7336–7356. https://doi.org/10.1109/tit.2023.3296288
  29. [29]–[30] Clément Pierquin, Aurélien Bellet, Marc Tommasi, and Matthieu Boussard. Rényi Pufferfish Privacy: General Additive Noise Mechanisms and Privacy Amplification by Iteration via Shift Reduction Lemmas. In International Conference on Machine Learning (ICML 2024). Vienna, Austria. https://inria.hal.science/hal-04363020
  31. [31] Filippo Santambrogio. 2015. Optimal transport for applied mathematicians. Birkhäuser, NY 55, 58-63 (2015), 94.
  32. [32] Geoffrey Smith. 2009. On the Foundations of Quantitative Information Flow. Springer Berlin Heidelberg, 288–302. https://doi.org/10.1007/978-3-642-00596-1_21
  33. [33] Shuang Song, Yizhen Wang, and Kamalika Chaudhuri. 2017. Pufferfish Privacy Mechanisms for Correlated Data. In Proceedings of the 2017 ACM International Conference on Management of Data (Chicago, Illinois, USA). New York, NY, USA, 1291–1306.
  34. [34] Endre Süli and David F Mayers. 2003. An introduction to numerical analysis. Cambridge University Press.
  35. [35] Cédric Villani. 2009. Optimal transport: old and new. Vol. 338. Springer.
  36. [36] Larry Wasserman and Shuheng Zhou. 2010. A Statistical Framework for Differential Privacy. J. Amer. Statist. Assoc. 105, 489 (March 2010), 375–389. https://doi.org/10.1198/jasa.2009.tm08651
  37. [37] Wanrong Zhang, Olga Ohrimenko, and Rachel Cummings. 2020. Attribute Privacy: Framework and Mechanisms. arXiv preprint arXiv:2009.04013 (2020).
  38. [38] Wanrong Zhang, Olga Ohrimenko, and Rachel Cummings. 2022. Attribute Privacy: Framework and Mechanisms. In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). ACM, 757–766. https://doi.org/10.1145/3531146.3533139