Data Sharing with Endogenous Choices over Differential Privacy Levels

Annuo Zhao; Diptangshu Sen; Juba Ziani; Kate Donahue; Raef Bassily

arxiv: 2602.09357 · v2 · pith:Z7IPPZEHnew · submitted 2026-02-10 · 💻 cs.GT · cs.CR


Pith reviewed 2026-05-21 14:26 UTC · model grok-4.3


keywords: differential privacy, data sharing, mechanism design, game theory, endogenous choices, privacy regimes, social welfare, estimator accuracy


The pith

A partially decentralized mechanism for data sharing under differential privacy achieves efficiency within constant factors by having a central designer fix the privacy noise level while players decide on participation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

When individuals with heterogeneous privacy preferences decide both whether to join a data-sharing coalition and how much local noise to add under differential privacy, fully decentralized choices produce equilibria with low participation and excessive noise, yielding poor social welfare and inaccurate aggregate estimates. The paper shows this inefficiency persists across multiple privacy-cost regimes in large populations when compared to a centralized social optimum. It then establishes that a simple partially decentralized mechanism, in which a central designer sets one uniform privacy noise level while players retain the right to opt in or out, narrows the gap in welfare and accuracy down to constant factors. This matters for building viable large-scale data coalitions because it shows how limited coordination on privacy standards can avoid the collapse that full autonomy tends to produce.

Core claim

In large-population games of data sharing with differential privacy, fully decentralized mechanisms in which each player endogenously chooses both participation and local privacy noise level produce equilibria that are highly inefficient in social welfare and estimator accuracy relative to the socially optimal benchmark. A partially decentralized mechanism, where players retain participation agency but a central designer chooses a fixed privacy noise level for everyone, closes this efficiency gap down to constant factors across all privacy-cost regimes.

What carries the argument

The partially decentralized mechanism with central selection of a uniform privacy noise level, which coordinates the privacy-utility trade-off for the coalition while preserving individual participation choices to mitigate strategic externalities.

Load-bearing premise

The model assumes that privacy choices induce a fundamental trade-off between individual privacy costs and data utility or statistical accuracy for the coalition, and that these choices generate strategic externalities across players in a large-population game.
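The accuracy side of this trade-off can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: data are uniform on [0, 1], every participant uses the same ε and adds Laplace noise of scale 1/ε locally, and the coalition estimates the mean by a plain average. The function names (`laplace`, `noisy_mean`, `empirical_mse`) are hypothetical.

```python
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_mean(values, eps: float, rng: random.Random) -> float:
    """Average of reports after each participant adds Laplace(1/eps) noise."""
    return sum(x + laplace(1.0 / eps, rng) for x in values) / len(values)


def empirical_mse(n: int, eps: float, trials: int = 400, seed: int = 0) -> float:
    """Monte Carlo squared error of noisy_mean around the true mean 0.5."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Assumption: toy data, uniform on [0, 1], so the true mean is 0.5.
        values = [rng.random() for _ in range(n)]
        total += (noisy_mean(values, eps, rng) - 0.5) ** 2
    return total / trials


if __name__ == "__main__":
    for eps in (0.5, 2.0):
        for n in (50, 200):
            print(f"eps={eps} n={n} mse={empirical_mse(n, eps):.5f}")
```

Under these assumptions, a smaller ε (more privacy) raises the estimator's error, while a larger coalition lowers it, which is the externality the premise describes.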

What would settle it

A calculation or simulation in which the ratio of social welfare under the partially decentralized mechanism to the welfare of the socially optimal benchmark grows without bound as population size increases in any single privacy-cost regime would falsify the constant-factor claim.

Original abstract

Motivated by the rapid push to decentralize sharing of data, we study whether large-scale data sharing coalitions can form in a decentralized manner under differential privacy when players have heterogeneous privacy preferences. We first consider a fully decentralized data-sharing mechanism in which each player decides whether to participate and how much privacy noise to add locally to their sensitive data before sharing. Privacy choices induce a fundamental trade-off: higher privacy lowers individual privacy costs but reduces data utility and statistical accuracy for the coalition. These choices generate externalities across players, making both participation and privacy levels strategic. Our goal is to understand which coalitions are stable, how privacy choices shape equilibrium outcomes, and how fully decentralized data-sharing compares to a centralized, socially optimal benchmark when the number of players is large. We provide a comprehensive analysis across multiple privacy-cost regimes corresponding to different attack/observation models in differential privacy, showing that full decentralization is highly inefficient in terms of both social welfare and estimator accuracy. Surprisingly, we find that a simple partially decentralized mechanism (where players still retain participation agency, but a central designer chooses a fixed privacy noise level for everyone) closes this efficiency gap down to constant factors across all privacy-cost regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Full decentralization is inefficient but a partially decentralized mechanism with central fixed noise achieves constant-factor efficiency, though the constants may depend on cost distribution assumptions.


The punchline is that full decentralization in this data-sharing game leads to poor participation and accuracy, but switching to a partially decentralized mechanism, where a central party sets a single privacy noise level while players decide whether to join, gets the outcomes within constant factors of the best possible across the privacy-cost regimes they study.

The paper sets up a large-population game where each player has a private cost for privacy loss. Players choose whether to participate and, in the fully decentralized version, how much noise to add to their data. The coalition's statistical accuracy depends on the number of participants and the noise levels chosen. This creates externalities because one player's choice affects the accuracy everyone gets. The authors characterize the equilibria and compare them to the socially optimal benchmark that maximizes total welfare.

What is new is the side-by-side analysis of the fully decentralized case against the partially decentralized one with endogenous participation but fixed central noise. They do this for several regimes that correspond to different ways of modeling the privacy cost, such as different attack models in differential privacy. The result that the partial mechanism closes the efficiency gap to constants is the main claim. The work is clear on the model and the practical motivation from decentralized data sharing and federated learning. It gives credit to prior work on DP mechanisms and extends it by making the privacy levels endogenous.

The soft spots are around the details of the proofs. The abstract states the constant-factor results but does not include any sketch of how they bound the ratios or what properties of the cost distribution are used. The stress-test note raises a fair point: if costs are heterogeneous, the participation decision can have a sharp threshold where the player's cost equals their share of the accuracy. A fixed ε chosen without knowing the exact distribution could then lead to participation rates that are far from optimal, potentially making the welfare ratio unbounded rather than constant. If the paper proves the constants hold for arbitrary distributions or shows how the designer can pick ε based on observable information, that would address this. Otherwise, it might be a limitation for real-world application where distributions are unknown.

This paper is aimed at researchers in algorithmic game theory and privacy, particularly those working on mechanism design for data coalitions. A reader who wants to understand the value of limited centralization in privacy settings would find it useful. It deserves serious peer review because the modeling is clean, the question is relevant, and the claims are specific enough to be checked, even if the current version might need more on the derivations.

Referee Report

1 major / 1 minor

Summary. The paper models large-population data-sharing games under differential privacy where agents have heterogeneous privacy costs. In the fully decentralized setting, each agent chooses both participation and a local privacy parameter ε_i, generating strategic externalities that lead to inefficient equilibrium participation and noise levels relative to the social planner's benchmark. The central contribution is the analysis of a partially decentralized mechanism in which a designer selects a single noise level ε for the entire coalition while agents retain only the binary participation decision; the authors claim that this mechanism recovers constant-factor approximations to both social welfare and statistical accuracy across all privacy-cost regimes corresponding to different DP attack models.

Significance. If the constant-factor closure result holds under the stated heterogeneity, the work supplies a clean theoretical justification for limited central coordination in privacy-sensitive data markets. It isolates the source of inefficiency to the endogenous choice of noise levels and shows that removing that degree of freedom while preserving participation autonomy suffices for bounded approximation ratios, which is a useful design insight for platforms that must accommodate voluntary data contribution.

major comments (1)

[analysis of partially decentralized mechanism] The skeptic's concern about distribution-independent constant factors appears to be a load-bearing issue for the main claim. In the analysis of the partially decentralized mechanism, the equilibrium participation mass is determined by the threshold at which an agent's privacy cost equals its share of the coalition's accuracy (which depends on total participation and the fixed ε). If the cost distribution has unbounded support or the designer must select ε without exact knowledge of the distribution, the resulting participation rate can deviate arbitrarily from the socially optimal mass, potentially making the welfare and accuracy ratios unbounded rather than constant. This needs an explicit robustness statement or a worst-case bound over cost distributions.
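The threshold logic in this comment can be sketched as a fixed-point computation. Everything concrete here is an assumption made for illustration and not the paper's model: privacy cost is taken to be linear in ε (an agent with cost type c pays c·ε), and `toy_benefit` and `uniform_cdf` are hypothetical primitives. The iteration starts from full participation and, for a benefit that is continuous and nondecreasing in the participation mass, moves down to the largest self-consistent mass.

```python
from typing import Callable


def equilibrium_participation(
    eps: float,
    benefit: Callable[[float, float], float],
    cost_cdf: Callable[[float], float],
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> float:
    """Iterate m -> F(benefit(m, eps) / eps) downward from full participation."""
    m = 1.0
    for _ in range(max_iter):
        # An agent with cost type c joins iff c * eps <= benefit(m, eps),
        # so the joining mass is the cost CDF evaluated at benefit / eps.
        m_next = cost_cdf(benefit(m, eps) / eps)
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    return m


# Hypothetical primitives, chosen only to make the sketch runnable.
def toy_benefit(m: float, eps: float) -> float:
    """Accuracy benefit that rises with participation mass and with eps."""
    return m * eps**2 / (1.0 + m * eps**2)


def uniform_cdf(c: float) -> float:
    """CDF of cost types spread uniformly on [0, 1]."""
    return min(max(c, 0.0), 1.0)


if __name__ == "__main__":
    for eps in (0.5, 1.5, 2.0, 4.0):
        m = equilibrium_participation(eps, toy_benefit, uniform_cdf)
        print(f"eps={eps}: participation mass ~ {m:.4f}")
```

In this toy instance the self-consistent mass works out to (ε − 1)/ε² when ε > 1 and collapses to zero otherwise, so a poorly chosen ε can empty the coalition. Whether the same sensitivity arises in the paper's model depends on its actual cost and accuracy functions.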

minor comments (1)

Clarify the exact functional form of the statistical accuracy as a function of participation mass and ε; the current description leaves open whether the accuracy term is derived from a specific estimator or is a generic decreasing function of ε.
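As one illustrative candidate for such a functional form (an assumption, not confirmed to be the paper's estimator): suppose each participant i in coalition S holds an independent value x_i in [0, 1] with variance σ², reports it with added Laplace noise of scale 1/ε_i, and the coalition averages the reports. The estimator's variance is then:

```latex
\[
\hat{\mu}_S = \frac{1}{|S|} \sum_{i \in S} \left( x_i + Z_i \right),
\qquad Z_i \sim \mathrm{Lap}(1/\varepsilon_i) \text{ independent},
\]
\[
\mathrm{Var}\!\left(\hat{\mu}_S\right)
= \frac{1}{|S|^2} \sum_{i \in S} \left( \sigma^2 + \frac{2}{\varepsilon_i^2} \right)
= \frac{\sigma^2 + 2/\varepsilon^2}{|S|}
\quad \text{when } \varepsilon_i = \varepsilon \text{ for all } i \in S.
\]
```

The second equality uses the fact that a Laplace variable with scale b has variance 2b². Under this form, accuracy improves with coalition size and degrades as ε shrinks; a generic decreasing function of ε would not pin down these rates.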

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback. Below we respond point-by-point to the major comment on the robustness of the constant-factor guarantees for the partially decentralized mechanism.


Referee: [analysis of partially decentralized mechanism] The skeptic's concern about distribution-independent constant factors appears to be a load-bearing issue for the main claim. In the analysis of the partially decentralized mechanism, the equilibrium participation mass is determined by the threshold at which an agent's privacy cost equals its share of the coalition's accuracy (which depends on total participation and the fixed ε). If the cost distribution has unbounded support or the designer must select ε without exact knowledge of the distribution, the resulting participation rate can deviate arbitrarily from the socially optimal mass, potentially making the welfare and accuracy ratios unbounded rather than constant. This needs an explicit robustness statement or a worst-case bound over cost distributions.

Authors: We thank the referee for highlighting this subtlety. In the model, the designer is assumed to know the cost distribution when choosing the common ε; this is the natural information structure in a mechanism-design setting where the designer optimizes the shared noise level. Under this assumption the equilibrium participation threshold is exactly the point at which individual cost equals marginal accuracy benefit, and the constant-factor bounds on welfare and accuracy are derived in the large-population limit and hold uniformly across all privacy-cost regimes. These constants are independent of the particular functional form or parameters of the cost distribution (they arise from the concavity/convexity properties of the accuracy function and the worst-case regime-specific privacy-utility trade-offs). For distributions with unbounded support the participation mass still converges to a value whose ratio to the planner’s optimum remains bounded, because accuracy is Lipschitz in total mass. Nevertheless, to make the assumption and the distribution-independence explicit, we will add a short clarifying subsection (and a corresponding remark in the main theorem statement) that states the known-distribution assumption and notes that the approximation ratios are invariant to the specific distribution. We will also include a brief discussion of robustness to small distribution misspecification, showing that the ratios degrade continuously rather than becoming unbounded.

Revision: yes

Circularity Check

0 steps flagged

No circularity; derivation is self-contained from explicit game model


The paper constructs an explicit large-population game with heterogeneous privacy costs, participation decisions, and accuracy externalities defined via differential privacy parameters. Equilibria under full decentralization and the partially decentralized mechanism are derived directly from best-response conditions and welfare comparisons to a centralized benchmark within the same model. No step reduces a claimed result to a fitted parameter, self-definition, or self-citation chain; all outcomes follow from the stated assumptions and equations without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Analysis rests on standard assumptions from game theory and differential privacy applied to data sharing coalitions; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption: Players have heterogeneous privacy preferences and make strategic participation and noise-level decisions that create externalities on coalition utility.
Explicitly stated as motivation for the model and the source of strategic behavior.

pith-pipeline@v0.9.0 · 5744 in / 1185 out tokens · 65761 ms · 2026-05-21T14:26:00.741770+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (J is the unique calibrated reciprocal cost) · tag: unclear
Paper passage: We study coalition formation for data sharing under differential privacy when agents have heterogeneous privacy costs. Each agent ... decides whether to participate ... and how much noise to add ... Privacy choices induce a fundamental trade-off ... These choices generate externalities ...

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
Paper passage: f(|S|) = |S|^α where α ∈ [−1,1] ... three regimes ... local information-theoretic DP, privacy amplification in federated learning, fully adversarial models

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Limits of Personalizing Differential Privacy Budgets
cs.CR · 2026-05 · unverdicted · novelty 6.0

For mean estimation, a simple thresholding operator on privacy budgets matches the performance of fully personalized differential privacy mechanisms up to constant factors.