Data Sharing with Endogenous Choices over Differential Privacy Levels
Pith reviewed 2026-05-21 14:26 UTC · model grok-4.3
The pith
A partially decentralized mechanism for data sharing under differential privacy achieves efficiency within constant factors by having a central designer fix the privacy noise level while players decide on participation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In large-population games of data sharing with differential privacy, fully decentralized mechanisms in which each player endogenously chooses both participation and local privacy noise level produce equilibria that are highly inefficient in social welfare and estimator accuracy relative to the socially optimal benchmark. A partially decentralized mechanism, where players retain participation agency but a central designer chooses a fixed privacy noise level for everyone, closes this efficiency gap down to constant factors across all privacy-cost regimes.
What carries the argument
The partially decentralized mechanism with central selection of a uniform privacy noise level, which coordinates the privacy-utility trade-off for the coalition while preserving individual participation choices to mitigate strategic externalities.
Load-bearing premise
The model assumes that privacy choices induce a fundamental trade-off between individual privacy costs and data utility or statistical accuracy for the coalition, and that these choices generate strategic externalities across players in a large-population game.
What would settle it
A calculation or simulation in which the ratio of the socially optimal welfare to the welfare under the partially decentralized mechanism grows without bound as population size increases, in any single privacy-cost regime, would falsify the constant-factor claim.
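A check of this kind could be sketched numerically as follows. This is a toy model, not the paper's: the saturating benefit function `accuracy`, the linear privacy cost `c_i * eps`, the uniform cost distribution, and the reading of `eps` as a privacy budget (larger means less noise) are all assumptions introduced for illustration. The script reports planner welfare divided by welfare under a designer-fixed `eps`, so a bounded value in this toy setting neither confirms nor refutes the paper's result.

```python
import random


def accuracy(k, eps):
    """Toy per-participant benefit, saturating in k * eps**2 (an assumption)."""
    x = k * eps * eps
    return x / (1.0 + x)


def equilibrium_size(costs_sorted, eps):
    """Largest k for which the k cheapest players all weakly prefer to stay."""
    for k in range(len(costs_sorted), 0, -1):
        # The marginal (k-th cheapest) participant must not want to leave.
        if accuracy(k, eps) >= costs_sorted[k - 1] * eps:
            return k
    return 0


def welfare(costs_sorted, k, eps):
    """Total surplus when the k cheapest players share at common level eps."""
    return k * accuracy(k, eps) - eps * sum(costs_sorted[:k])


def welfare_ratio(n, eps_grid, seed=0):
    """Planner's welfare divided by the designer-fixes-eps welfare (toy model)."""
    rng = random.Random(seed)
    costs = sorted(rng.uniform(0.0, 1.0) for _ in range(n))
    # Designer fixes eps; players settle into the largest stable coalition.
    partial = max(welfare(costs, equilibrium_size(costs, e), e) for e in eps_grid)
    # Planner chooses both eps and the coalition size directly.
    optimal = max(welfare(costs, k, e) for e in eps_grid for k in range(n + 1))
    return optimal / partial


if __name__ == "__main__":
    grid = [0.1 * j for j in range(1, 31)]
    for n in (50, 100, 200, 400):
        print(n, round(welfare_ratio(n, grid), 4))
```

Swapping in the paper's actual accuracy and cost functions for each regime, and a cost distribution of interest, would turn this into the falsification test described above.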
Original abstract
Motivated by the rapid push to decentralize sharing of data, we study whether large-scale data sharing coalitions can form in a decentralized manner under differential privacy when players have heterogeneous privacy preferences. We first consider a fully decentralized data-sharing mechanism in which each player decides whether to participate and how much privacy noise to add locally to their sensitive data before sharing. Privacy choices induce a fundamental trade-off: higher privacy lowers individual privacy costs but reduces data utility and statistical accuracy for the coalition. These choices generate externalities across players, making both participation and privacy levels strategic. Our goal is to understand which coalitions are stable, how privacy choices shape equilibrium outcomes, and how fully decentralized data-sharing compares to a centralized, socially optimal benchmark when the number of players is large. We provide a comprehensive analysis across multiple privacy-cost regimes corresponding to different attack/observation models in differential privacy, showing that full decentralization is highly inefficient in terms of both social welfare and estimator accuracy. Surprisingly, we find that a simple partially decentralized mechanism (where players still retain participation agency, but a central designer chooses a fixed privacy noise level for everyone) closes this efficiency gap down to constant factors across all privacy-cost regimes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper models large-population data-sharing games under differential privacy where agents have heterogeneous privacy costs. In the fully decentralized setting, each agent chooses both participation and a local privacy parameter ε_i, generating strategic externalities that lead to inefficient equilibrium participation and noise levels relative to the social planner's benchmark. The central contribution is the analysis of a partially decentralized mechanism in which a designer selects a single noise level ε for the entire coalition while agents retain only the binary participation decision; the authors claim that this mechanism recovers constant-factor approximations to both social welfare and statistical accuracy across all privacy-cost regimes corresponding to different DP attack models.
Significance. If the constant-factor closure result holds under the stated heterogeneity, the work supplies a clean theoretical justification for limited central coordination in privacy-sensitive data markets. It isolates the source of inefficiency to the endogenous choice of noise levels and shows that removing that degree of freedom while preserving participation autonomy suffices for bounded approximation ratios, which is a useful design insight for platforms that must accommodate voluntary data contribution.
major comments (1)
- [analysis of partially decentralized mechanism] The skeptic's concern about distribution-independent constant factors appears to be a load-bearing issue for the main claim. In the analysis of the partially decentralized mechanism, the equilibrium participation mass is determined by the threshold at which an agent's privacy cost equals its share of the coalition's accuracy (which depends on total participation and the fixed ε). If the cost distribution has unbounded support or the designer must select ε without exact knowledge of the distribution, the resulting participation rate can deviate arbitrarily from the socially optimal mass, potentially making the welfare and accuracy ratios unbounded rather than constant. This needs an explicit robustness statement or a worst-case bound over cost distributions.
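The threshold condition this comment describes can be written as a fixed point. The notation below is hypothetical and not taken from the paper: F is the CDF of the privacy-cost type c, A(m, ε) is the per-agent accuracy benefit at participation mass m, and κ(ε) is the privacy cost per unit of type at the fixed level ε.

```latex
% Hypothetical notation (assumed for illustration, not the paper's):
%   F                  : CDF of the privacy-cost type c
%   A(m, \varepsilon)  : per-agent accuracy benefit at participation mass m
%   \kappa(\varepsilon): privacy cost per unit of type at the fixed level
\begin{align*}
  c^{*}\,\kappa(\varepsilon) &= A\bigl(m^{*}, \varepsilon\bigr)
    && \text{(the marginal participant is indifferent)} \\
  m^{*} &= F\bigl(c^{*}\bigr)
    && \text{(everyone with } c \le c^{*} \text{ participates)} \\
  \Longrightarrow\quad
  m^{*} &= F\!\left(\frac{A(m^{*}, \varepsilon)}{\kappa(\varepsilon)}\right)
    && \text{(fixed point in } m^{*}\text{)}
\end{align*}
```

In this form, the concern is how far the solution m* can move when the F used by the designer to choose ε differs from the true F, or when F has a heavy upper tail.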
minor comments (1)
- Clarify the exact functional form of the statistical accuracy as a function of participation mass and ε; the current description leaves open whether the accuracy term is derived from a specific estimator or is a generic decreasing function of ε.
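One concrete form the accuracy term could take, offered only as an illustration of what this comment asks for, is the inverse mean squared error of a sample mean over locally noised reports. The estimator, the function names, and the default `data_variance` of 0.25 (the largest variance of a value in [0, 1]) are assumptions, not details from the paper. The sketch is parameterized by the Laplace noise scale; under the standard Laplace mechanism that scale equals the sensitivity divided by the privacy budget.

```python
def mean_estimator_mse(n, noise_scale, data_variance=0.25):
    """MSE of the average of n locally noised reports (hypothetical model).

    Each participant adds independent Laplace noise with scale `noise_scale`
    to their value; a Laplace(b) variable has variance 2 * b**2.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if noise_scale < 0:
        raise ValueError("noise_scale must be non-negative")
    sampling_term = data_variance / n          # error from the finite sample
    privacy_term = 2.0 * noise_scale ** 2 / n  # error from the added noise
    return sampling_term + privacy_term


def accuracy(n, noise_scale, data_variance=0.25):
    """One possible accuracy measure: inverse MSE (higher is better)."""
    return 1.0 / mean_estimator_mse(n, noise_scale, data_variance)
```

Under this form, accuracy grows linearly in the number of participants and falls with the square of the noise scale, which is the kind of explicit dependence the comment requests.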
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback. Below we respond point-by-point to the major comment on the robustness of the constant-factor guarantees for the partially decentralized mechanism.
Point-by-point responses
Referee: [analysis of partially decentralized mechanism] The skeptic's concern about distribution-independent constant factors appears to be a load-bearing issue for the main claim. In the analysis of the partially decentralized mechanism, the equilibrium participation mass is determined by the threshold at which an agent's privacy cost equals its share of the coalition's accuracy (which depends on total participation and the fixed ε). If the cost distribution has unbounded support or the designer must select ε without exact knowledge of the distribution, the resulting participation rate can deviate arbitrarily from the socially optimal mass, potentially making the welfare and accuracy ratios unbounded rather than constant. This needs an explicit robustness statement or a worst-case bound over cost distributions.
Authors: We thank the referee for highlighting this subtlety. In the model, the designer is assumed to know the cost distribution when choosing the common ε; this is the natural information structure in a mechanism-design setting where the designer optimizes the shared noise level. Under this assumption, the equilibrium participation threshold is exactly the point at which individual cost equals marginal accuracy benefit, and the constant-factor bounds on welfare and accuracy are derived in the large-population limit and hold uniformly across all privacy-cost regimes. These constants are independent of the particular functional form or parameters of the cost distribution (they arise from the concavity/convexity properties of the accuracy function and the worst-case regime-specific privacy-utility trade-offs). For distributions with unbounded support, the participation mass still converges to a value whose ratio to the planner’s optimum remains bounded, because accuracy is Lipschitz in total mass. Nevertheless, to make the assumption and the distribution-independence explicit, we will add a short clarifying subsection (and a corresponding remark in the main theorem statement) that states the known-distribution assumption and notes that the approximation ratios are invariant to the specific distribution. We will also include a brief discussion of robustness to small distribution misspecification, showing that the ratios degrade continuously rather than becoming unbounded.
Revision: yes
Circularity Check
No circularity; derivation is self-contained from explicit game model
full rationale
The paper constructs an explicit large-population game with heterogeneous privacy costs, participation decisions, and accuracy externalities defined via differential privacy parameters. Equilibria under full decentralization and the partially decentralized mechanism are derived directly from best-response conditions and welfare comparisons to a centralized benchmark within the same model. No step reduces a claimed result to a fitted parameter, self-definition, or self-citation chain; all outcomes follow from the stated assumptions and equations without circular reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Players have heterogeneous privacy preferences and make strategic participation and noise-level decisions that create externalities on coalition utility.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (J is the unique calibrated reciprocal cost)
unclear: Relation between the paper passage and the cited Recognition theorem.
We study coalition formation for data sharing under differential privacy when agents have heterogeneous privacy costs. Each agent ... decides whether to participate ... and how much noise to add ... Privacy choices induce a fundamental trade-off ... These choices generate externalities ...
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction
unclear: Relation between the paper passage and the cited Recognition theorem.
f(|S|) = |S|^α where α ∈ [−1,1] ... three regimes ... local information-theoretic DP, privacy amplification in federated learning, fully adversarial models
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Limits of Personalizing Differential Privacy Budgets: For mean estimation, a simple thresholding operator on privacy budgets matches the performance of fully personalized differential privacy mechanisms up to constant factors.