
arxiv: 2602.09357 · v2 · pith:Z7IPPZEHnew · submitted 2026-02-10 · 💻 cs.GT · cs.CR

Data Sharing with Endogenous Choices over Differential Privacy Levels

Pith reviewed 2026-05-21 14:26 UTC · model grok-4.3

classification 💻 cs.GT cs.CR
keywords differential privacy · data sharing · mechanism design · game theory · endogenous choices · privacy regimes · social welfare · estimator accuracy

The pith

A partially decentralized mechanism for data sharing under differential privacy achieves efficiency within constant factors by having a central designer fix the privacy noise level while players decide on participation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

When individuals with heterogeneous privacy preferences decide both whether to join a data-sharing coalition and how much local noise to add under differential privacy, fully decentralized choices produce equilibria with low participation and excessive noise, yielding poor social welfare and inaccurate aggregate estimates. The paper shows this inefficiency persists across multiple privacy-cost regimes in large populations when compared to a centralized social optimum. It then establishes that a simple partially decentralized mechanism, in which a central designer sets one uniform privacy noise level while players retain the right to opt in or out, narrows the gap in welfare and accuracy down to constant factors. This matters for building viable large-scale data coalitions because it shows how limited coordination on privacy standards can avoid the collapse that full autonomy tends to produce.

Core claim

In large-population games of data sharing with differential privacy, fully decentralized mechanisms in which each player endogenously chooses both participation and local privacy noise level produce equilibria that are highly inefficient in social welfare and estimator accuracy relative to the socially optimal benchmark. A partially decentralized mechanism, where players retain participation agency but a central designer chooses a fixed privacy noise level for everyone, closes this efficiency gap down to constant factors across all privacy-cost regimes.

What carries the argument

The partially decentralized mechanism with central selection of a uniform privacy noise level, which coordinates the privacy-utility trade-off for the coalition while preserving individual participation choices to mitigate strategic externalities.
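To make the privacy-utility trade-off behind a uniform noise level concrete, here is a minimal sketch that is not taken from the paper. It assumes each participant holds a value in [0, 1], perturbs it locally with Laplace noise of scale 1/ε (the standard Laplace mechanism for sensitivity 1), and that the coalition's estimator is the plain average of the reports. The function names and the choice of a mean estimator are illustrative assumptions.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_mean(values, epsilon, rng):
    # Assumption: values lie in [0, 1], so the sensitivity is 1 and the
    # local Laplace scale is 1 / epsilon. The coalition averages the reports.
    scale = 1.0 / epsilon
    reports = [v + laplace_noise(scale, rng) for v in values]
    return sum(reports) / len(reports)

def noise_variance(n, epsilon):
    # Laplace(0, b) has variance 2 * b**2; averaging n independent
    # reports divides that variance by n.
    return 2.0 / (n * epsilon ** 2)

if __name__ == "__main__":
    rng = random.Random(0)
    values = [rng.random() for _ in range(1000)]
    for eps in (0.1, 1.0, 10.0):
        est = noisy_mean(values, eps, rng)
        var = noise_variance(len(values), eps)
        print(f"epsilon={eps}: estimate={est:.3f}, noise variance={var:.5f}")
```

In this toy setting a smaller ε (more privacy) inflates the estimator's noise variance as 1/ε², while more participants shrink it as 1/n, which is the kind of tension a single coalition-wide ε has to balance.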

Load-bearing premise

The model assumes that privacy choices induce a fundamental trade-off between individual privacy costs and data utility or statistical accuracy for the coalition, and that these choices generate strategic externalities across players in a large-population game.

What would settle it

The constant-factor claim would be falsified by a calculation or simulation showing that, in any single privacy-cost regime, the welfare gap between the partially decentralized mechanism and the socially optimal benchmark exceeds every constant factor as population size increases.
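Such a check could be organized roughly as follows. This is a heuristic sketch under stated assumptions, not the paper's procedure: `ratio_fn` stands for any routine that returns the optimum-to-mechanism welfare ratio at population size n, and the two example curves are hypothetical placeholders rather than results from the paper. The idea is to fit the slope of log(ratio) against log(n); a slope near zero is consistent with a constant factor, while a clearly positive slope suggests polynomial growth.

```python
import math

def loglog_slope(ns, ratios):
    # Least-squares slope of log(ratio) against log(n).
    xs = [math.log(n) for n in ns]
    ys = [math.log(r) for r in ratios]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def looks_bounded(ratio_fn, ns, tol=0.05):
    # Heuristic only: numerical evidence cannot prove boundedness.
    return loglog_slope(ns, [ratio_fn(n) for n in ns]) < tol

# Hypothetical placeholder curves, NOT results from the paper.
def bounded_ratio(n):
    return 2.0 + 1.0 / math.sqrt(n)

def growing_ratio(n):
    return math.sqrt(n)

ns = [10 ** k for k in range(2, 7)]
print(looks_bounded(bounded_ratio, ns))
print(looks_bounded(growing_ratio, ns))
```

A slope test of this kind would miss slow (for example logarithmic) divergence, so it can only flag candidates for a counterexample; an actual refutation would still need an analytical argument.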

Original abstract

Motivated by the rapid push to decentralize sharing of data, we study whether large-scale data sharing coalitions can form in a decentralized manner under differential privacy when players have heterogeneous privacy preferences. We first consider a fully decentralized data-sharing mechanism in which each player decides whether to participate and how much privacy noise to add locally to their sensitive data before sharing. Privacy choices induce a fundamental trade-off: higher privacy lowers individual privacy costs but reduces data utility and statistical accuracy for the coalition. These choices generate externalities across players, making both participation and privacy levels strategic. Our goal is to understand which coalitions are stable, how privacy choices shape equilibrium outcomes, and how fully decentralized data-sharing compares to a centralized, socially optimal benchmark when the number of players is large. We provide a comprehensive analysis across multiple privacy-cost regimes corresponding to different attack/observation models in differential privacy, showing that full decentralization is highly inefficient in terms of both social welfare and estimator accuracy. Surprisingly, we find that a simple partially decentralized mechanism (where players still retain participation agency, but a central designer chooses a fixed privacy noise level for everyone) closes this efficiency gap down to constant factors across all privacy-cost regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper models large-population data-sharing games under differential privacy where agents have heterogeneous privacy costs. In the fully decentralized setting, each agent chooses both participation and a local privacy parameter ε_i, generating strategic externalities that lead to inefficient equilibrium participation and noise levels relative to the social planner's benchmark. The central contribution is the analysis of a partially decentralized mechanism in which a designer selects a single noise level ε for the entire coalition while agents retain only the binary participation decision; the authors claim that this mechanism recovers constant-factor approximations to both social welfare and statistical accuracy across all privacy-cost regimes corresponding to different DP attack models.

Significance. If the constant-factor closure result holds under the stated heterogeneity, the work supplies a clean theoretical justification for limited central coordination in privacy-sensitive data markets. It isolates the source of inefficiency to the endogenous choice of noise levels and shows that removing that degree of freedom while preserving participation autonomy suffices for bounded approximation ratios, which is a useful design insight for platforms that must accommodate voluntary data contribution.

major comments (1)
  1. [analysis of partially decentralized mechanism] The skeptic's concern about distribution-independent constant factors appears to be a load-bearing issue for the main claim. In the analysis of the partially decentralized mechanism, the equilibrium participation mass is determined by the threshold at which an agent's privacy cost equals its share of the coalition's accuracy (which depends on total participation and the fixed ε). If the cost distribution has unbounded support or the designer must select ε without exact knowledge of the distribution, the resulting participation rate can deviate arbitrarily from the socially optimal mass, potentially making the welfare and accuracy ratios unbounded rather than constant. This needs an explicit robustness statement or a worst-case bound over cost distributions.
minor comments (1)
  1. Clarify the exact functional form of the statistical accuracy as a function of participation mass and ε; the current description leaves open whether the accuracy term is derived from a specific estimator or is a generic decreasing function of ε.
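The threshold logic in the major comment can be illustrated with a toy fixed-point computation. Everything specific here is an assumption made for illustration and not the paper's model: costs are Uniform[0, c_max], joining at a common ε costs an agent c · ε, and the benefit function `accuracy_benefit` is a hypothetical concave stand-in for the coalition's accuracy.

```python
def accuracy_benefit(mass, epsilon):
    # Hypothetical benefit: increasing and concave in mass * epsilon**2.
    signal = mass * epsilon ** 2
    return signal / (1.0 + signal)

def participation_equilibrium(epsilon, c_max, iters=10_000, tol=1e-12):
    # Assumed cost types are Uniform[0, c_max]; joining costs c * epsilon.
    # An agent joins iff c * epsilon <= benefit, so the participating mass
    # equals the cost CDF evaluated at the threshold benefit / epsilon.
    # Starting from full participation, the monotone map descends to the
    # largest fixed point.
    mass = 1.0
    for _ in range(iters):
        threshold = accuracy_benefit(mass, epsilon) / epsilon
        new_mass = min(1.0, threshold / c_max)
        if abs(new_mass - mass) < tol:
            return new_mass
        mass = new_mass
    return mass

m = participation_equilibrium(epsilon=1.0, c_max=0.6)
print(round(m, 4))
```

Varying `c_max` or swapping in a heavier-tailed cost distribution in a sketch like this is one way to probe how sensitive the equilibrium mass is to the distribution, which is the referee's concern; it does not by itself confirm or refute the paper's bounds.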

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback. Below we respond point-by-point to the major comment on the robustness of the constant-factor guarantees for the partially decentralized mechanism.

Point-by-point responses
  1. Referee: [analysis of partially decentralized mechanism] The skeptic's concern about distribution-independent constant factors appears to be a load-bearing issue for the main claim. In the analysis of the partially decentralized mechanism, the equilibrium participation mass is determined by the threshold at which an agent's privacy cost equals its share of the coalition's accuracy (which depends on total participation and the fixed ε). If the cost distribution has unbounded support or the designer must select ε without exact knowledge of the distribution, the resulting participation rate can deviate arbitrarily from the socially optimal mass, potentially making the welfare and accuracy ratios unbounded rather than constant. This needs an explicit robustness statement or a worst-case bound over cost distributions.

    Authors: We thank the referee for highlighting this subtlety. In the model, the designer is assumed to know the cost distribution when choosing the common ε; this is the natural information structure in a mechanism-design setting where the designer optimizes the shared noise level. Under this assumption the equilibrium participation threshold is exactly the point at which individual cost equals marginal accuracy benefit, and the constant-factor bounds on welfare and accuracy are derived in the large-population limit and hold uniformly across all privacy-cost regimes. These constants are independent of the particular functional form or parameters of the cost distribution (they arise from the concavity/convexity properties of the accuracy function and the worst-case regime-specific privacy-utility trade-offs). For distributions with unbounded support the participation mass still converges to a value whose ratio to the planner’s optimum remains bounded, because accuracy is Lipschitz in total mass. Nevertheless, to make the assumption and the distribution-independence explicit, we will add a short clarifying subsection (and a corresponding remark in the main theorem statement) that states the known-distribution assumption and notes that the approximation ratios are invariant to the specific distribution. We will also include a brief discussion of robustness to small distribution misspecification, showing that the ratios degrade continuously rather than becoming unbounded.

    revision: yes

Circularity Check

0 steps flagged

No circularity; derivation is self-contained from explicit game model

Full rationale

The paper constructs an explicit large-population game with heterogeneous privacy costs, participation decisions, and accuracy externalities defined via differential privacy parameters. Equilibria under full decentralization and the partially decentralized mechanism are derived directly from best-response conditions and welfare comparisons to a centralized benchmark within the same model. No step reduces a claimed result to a fitted parameter, self-definition, or self-citation chain; all outcomes follow from the stated assumptions and equations without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Analysis rests on standard assumptions from game theory and differential privacy applied to data sharing coalitions; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • Domain assumption: Players have heterogeneous privacy preferences and make strategic participation and noise-level decisions that create externalities on coalition utility.
    Explicitly stated as motivation for the model and the source of strategic behavior.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Limits of Personalizing Differential Privacy Budgets

    cs.CR · 2026-05 · unverdicted · novelty 6.0

    For mean estimation, a simple thresholding operator on privacy budgets matches the performance of fully personalized differential privacy mechanisms up to constant factors.