arxiv: 2503.04980 · v1 · submitted 2025-03-06 · 💻 cs.CR · cs.AI

A Consensus Privacy Metrics Framework for Synthetic Data

Pith reviewed 2026-05-23 00:35 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords synthetic data · privacy evaluation · membership disclosure · attribute disclosure · differential privacy · expert consensus · data sharing

The pith

Expert consensus produces a framework that recommends metrics for membership and attribute disclosures in synthetic data while discouraging similarity metrics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework for evaluating privacy in synthetic data using an expert panel and consensus process. Current similarity metrics are found to fail at measuring identity disclosure, and their use is discouraged. For differentially private synthetic data, only privacy budgets close to zero are considered interpretable. The consensus highlights the importance of membership and attribute disclosure, both of which allow personal information about an individual to be inferred without necessarily revealing that individual's identity. The framework gives precise recommendations for metrics addressing these disclosures to help meet the requirements of privacy legislation.

Core claim

The authors create a privacy metrics framework for synthetic data through expert consensus. The framework advises against using similarity metrics for identity disclosure and, for differentially private synthetic data, deems privacy budgets that are not close to zero uninterpretable. It prioritizes metrics for membership and attribute disclosures to assess risks of inferring personal information.
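
One standard way to see why only small budgets are easy to interpret is the ε-differential privacy guarantee itself: adding or removing one individual's record can change the probability of any output by at most a factor of e^ε. The sketch below tabulates that factor for a few budgets. The ε values are illustrative assumptions, not figures from the paper, and the panel's own reasoning may differ.

```python
import math


def max_likelihood_ratio(epsilon: float) -> float:
    """Return e^epsilon, the bound on how much one record can scale any output probability."""
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    return math.exp(epsilon)


# Illustrative budgets only (assumed values, not taken from the paper).
for eps in (0.01, 0.1, 1.0, 5.0, 10.0):
    print(f"epsilon={eps:>5}: ratio bound = {max_likelihood_ratio(eps):,.2f}")
```

At ε = 0.1 the bound is about 1.11, which is close to "no change"; at ε = 10 it exceeds 22,000, which says very little about how well any one individual is protected.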

What carries the argument

The consensus-derived framework of privacy metrics that targets membership and attribute disclosures rather than similarity-based identity measures.

If this is right

  • Similarity metrics are not suitable for assessing identity disclosure in synthetic data.
  • Privacy budgets close to zero are required for interpretability in differentially private synthetic data.
  • Metrics for membership and attribute disclosures provide effective ways to evaluate privacy risks.
  • Adoption of the framework can support compliance with data protection laws.
  • Future research is needed to refine these metrics for broader use.
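
The paper's Figure 4 reports an F1 score as its membership metric. As a rough illustration of what such a score measures, the sketch below assumes a simplified attacker who claims that a person was in the training data whenever that person's record matches some synthetic record exactly on a chosen set of variables. The function name `membership_f1`, the exact-match rule, and the toy records are all hypothetical; the paper's actual procedure may differ.

```python
from typing import Dict, Sequence

Record = Dict[str, object]


def membership_f1(
    attack: Sequence[Record],
    is_member: Sequence[bool],
    synthetic: Sequence[Record],
    match_vars: Sequence[str],
) -> float:
    """F1 of an attacker who claims membership on an exact match over match_vars."""
    synthetic_keys = {tuple(row[v] for v in match_vars) for row in synthetic}
    tp = fp = fn = 0
    for row, member in zip(attack, is_member):
        claimed = tuple(row[v] for v in match_vars) in synthetic_keys
        if claimed and member:
            tp += 1  # correctly claimed member
        elif claimed:
            fp += 1  # claimed, but not in the training data
        elif member:
            fn += 1  # member the attacker missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed for illustration only).
synthetic = [{"age": 30, "sex": "F"}, {"age": 41, "sex": "M"}]
attack = [
    {"age": 30, "sex": "F"},
    {"age": 41, "sex": "M"},
    {"age": 30, "sex": "F"},
    {"age": 55, "sex": "M"},
]
is_member = [True, True, False, True]

for variables in (["sex"], ["age", "sex"]):
    print(variables, round(membership_f1(attack, is_member, synthetic, variables), 3))
```

On this toy data the score is about 0.86 when matching on sex alone and about 0.67 when matching on both variables, which shows how the choice of matching variables can move the metric.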

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The framework's metrics could be validated through application to diverse synthetic data generation techniques.
  • Regulatory bodies might incorporate these recommendations into guidelines for synthetic data sharing.
  • Automated tools could implement these metrics to evaluate synthetic datasets in practice.
  • Links between this framework and existing privacy standards in other fields like statistics could be investigated.

Load-bearing premise

The opinions gathered from the expert panel through the consensus process correctly identify the best metrics for privacy protection in synthetic data regardless of generation method or context.

What would settle it

Empirical evidence that similarity metrics can reliably detect identity disclosure risks in multiple synthetic datasets would falsify the discouragement of their use.

Figures

Figures reproduced from arXiv: 2503.04980 by Bradley Malin, Chao Yan, Fabian Prasser, Fida K. Dankar, Jean Louis Raisaro, Jorg Drechsler, Josep Domingo-Ferrer, Khaled El Emam, Krishnamurty Muralidhar, Linglong Kong, Lisa Pilgram, Mark Elliot, Murat Kantarcioglu, Paul Francis, Puja Myles.

Figure 1
Study Process. Four literature reviews on the evaluation of synthetic data served as a starting point to identify commonly used privacy metrics [36,40,41,52]. Their primary literature was reviewed, and various additional simulation experiments conducted to better understand their behaviour. Panelists and regulators were invited to comment on the report (round 0). Statements were then created from the repor…
Figure 2
Consensus Measurement. Consensus was measured after stability of all statements was achieved. The measurement included two steps: (1) whether consensus and (2) whether agreement is achieved. Four possible outcomes could be expected. IQR: Interquartile range.
3. Results. Stability of responses was achieved after the third Delphi round. Consensus and agreement were then analyzed. In most statements (10/11), t…
Figure 3
Consensus Measurement. The 11 recommendations of the third round were assessed for consensus and agreement. The recommendations are shown in …
Figure 4
Membership Disclosure when Varying Number of Attributes for Matching. Membership disclosure was simulated by drawing an attack dataset from the same population as the training dataset. For each population, the F1 score is reported as the membership metric, calculated when varying the number of variables used in the attack. Maximum F1 score is the maximum value from all combinations for the respective numbe…
Figure 5
Different Adversary’s Attack Datasets. The original data consisted of 1,000 people with HIV. It was randomly drawn from the population of young people with HIV in Ottawa (2,172) highlighted in grey. The attack dataset of 1,000 people was drawn from the same population as the original data, so from young people with HIV in Ottawa (A), or from all young people in Ottawa (B). The population of young people in…
Figure 6
Practical Guidance to Calculate Membership Disclosure Vulnerability.
Figure 7
Practical Guidance to Calculate Attribute Disclosure Vulnerability.
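
The Figure 2 caption describes a two-step measurement (consensus, then agreement) with four possible outcomes and mentions the interquartile range (IQR). A minimal sketch of one way such a classification could be computed is given below. The 1 to 5 rating scale, the IQR threshold of 1, and the median threshold of 4 are assumptions chosen for illustration; this page does not state the criteria the study actually used.

```python
from statistics import median, quantiles
from typing import Sequence, Tuple


def classify_statement(
    ratings: Sequence[int],
    max_iqr: float = 1.0,       # assumed consensus threshold
    agree_median: float = 4.0,  # assumed agreement threshold on a 1-5 scale
) -> Tuple[bool, bool]:
    """Return (consensus, agreement) for one statement's panel ratings."""
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    consensus = (q3 - q1) <= max_iqr               # step 1: are ratings tightly clustered?
    agreement = median(ratings) >= agree_median    # step 2: is the typical rating high?
    return consensus, agreement


# Hypothetical panel ratings for three statements.
print(classify_statement([4, 4, 4, 5, 5]))  # clustered and high
print(classify_statement([1, 1, 1, 2, 2]))  # clustered but low
print(classify_statement([1, 2, 3, 4, 5]))  # spread out
```

The two booleans give the 2 x 2 grid of outcomes: a panel can converge on endorsing a statement, converge on rejecting it, or fail to converge at all.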
Original abstract

Synthetic data generation is one approach for sharing individual-level data. However, to meet legislative requirements, it is necessary to demonstrate that the individuals' privacy is adequately protected. There is no consolidated standard for measuring privacy in synthetic data. Through an expert panel and consensus process, we developed a framework for evaluating privacy in synthetic data. Our findings indicate that current similarity metrics fail to measure identity disclosure, and their use is discouraged. For differentially private synthetic data, a privacy budget other than close to zero was not considered interpretable. There was consensus on the importance of membership and attribute disclosure, both of which involve inferring personal information about an individual without necessarily revealing their identity. The resultant framework provides precise recommendations for metrics that address these types of disclosures effectively. Our findings further present specific opportunities for future research that can help with widespread adoption of synthetic data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper reports the outcomes of an expert panel and consensus process to develop a framework for privacy metrics in synthetic data. Key findings include discouraging similarity metrics for identity disclosure, deeming differential privacy budgets that are not close to zero uninterpretable, and endorsing a focus on membership and attribute disclosure metrics, with the framework offering precise recommendations and identifying future research needs.

Significance. If the consensus process is representative and the metric recommendations prove robust, the framework could help establish a needed standard for demonstrating privacy protection in synthetic data releases, supporting legislative compliance. The structured use of expert consensus is a positive element for incorporating domain knowledge, though the lack of any empirical validation or cross-context testing of the recommendations limits the immediate impact.

major comments (2)
  1. [Abstract] The central claim that the framework supplies precise metric recommendations that effectively address membership and attribute disclosures rests solely on reported expert consensus without empirical validation, formal analysis, or comparison showing these metrics bound disclosure risk better than alternatives (Abstract, findings paragraph).
  2. [Methods/Consensus process description] The manuscript provides no details on the expert panel composition, selection criteria, number of participants, or exact consensus procedures (e.g., voting thresholds or disagreement resolution), which is load-bearing for assessing whether the reported recommendations are generalizable across synthetic data methods, datasets, and regulatory contexts.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment below with proposed revisions to the manuscript.

  1. Referee: [Abstract] The central claim that the framework supplies precise metric recommendations that effectively address membership and attribute disclosures rests solely on reported expert consensus without empirical validation, formal analysis, or comparison showing these metrics bound disclosure risk better than alternatives (Abstract, findings paragraph).

    Authors: The paper reports outcomes from an expert consensus process to develop a framework, which is a recognized approach for establishing standards where empirical benchmarks do not yet exist. The abstract's claim refers to the framework's recommendations as derived from this consensus. We agree the abstract should be revised to explicitly note that the recommendations are consensus-based rather than empirically validated, and to reference the future research needs section that calls for such validation and comparisons. revision: yes

  2. Referee: [Methods/Consensus process description] The manuscript provides no details on the expert panel composition, selection criteria, number of participants, or exact consensus procedures (e.g., voting thresholds or disagreement resolution), which is load-bearing for assessing whether the reported recommendations are generalizable across synthetic data methods, datasets, and regulatory contexts.

    Authors: We acknowledge the methods section lacks these details. The revised manuscript will expand the description of the consensus process to include the expert panel composition, selection criteria, number of participants, and exact procedures such as voting thresholds and disagreement resolution. This addition will strengthen the assessment of generalizability. revision: yes

Circularity Check

0 steps flagged

No circularity: framework rests on external expert consensus, not self-referential derivation


The paper constructs its privacy metrics framework exclusively through an external expert panel and consensus process, with no equations, fitted parameters, predictions, or derivations present. Central claims (discouraging similarity metrics, limiting DP budgets to near-zero, endorsing membership/attribute disclosure focus) are reported outcomes of that panel rather than results derived from the paper's own inputs or prior self-citations. No load-bearing step reduces by construction to the paper's own definitions or fitted values; the consensus is treated as independent evidence. This matches the default non-circular case for consensus or survey papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work rests on expert consensus rather than mathematical axioms or data-driven parameters; no free parameters, axioms, or invented entities are identifiable from the abstract.


