
arxiv: 2604.18352 · v1 · submitted 2026-04-20 · 💻 cs.CR · cs.AI · cs.LG

Tight Auditing of Differential Privacy in MST and AIM

Pith reviewed 2026-05-10 04:30 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.LG
keywords differential privacy · privacy auditing · synthetic data · MST · AIM · Gaussian differential privacy · empirical privacy · theory-practice gap

The pith

A GDP-based auditing framework delivers the first tight privacy measurements for MST and AIM in the strong-privacy regime.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Gaussian Differential Privacy (GDP) auditing method that evaluates the full false-positive/false-negative tradeoff curve for two widely used synthetic data generators, MST and AIM. This moves beyond single-point bounds and yields more precise empirical privacy estimates under worst-case conditions. At a strong privacy level, ε = 1 and δ = 0.01, the empirical μ reaches approximately 0.43, close to the theoretically implied μ of 0.45. The small gap indicates that the stated guarantees are nearly tight: a worst-case adversary extracts almost as much as the theory allows. The paper presents these as the first tight audits in this regime.

Core claim

The authors establish that applying a GDP-based auditing framework, which measures privacy through the complete tradeoff curve, to MST and AIM under worst-case settings produces tight empirical privacy parameters that closely track the theoretical bounds, as shown by μ_emp ≈ 0.43 versus an implied μ of 0.45 for (ε, δ) = (1, 10^{-2}).

What carries the argument

Gaussian Differential Privacy (GDP) auditing framework that computes the full false-positive/false-negative privacy tradeoff curve.
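
As a rough illustration of what a GDP tradeoff curve looks like, the sketch below evaluates the standard Gaussian tradeoff function from the GDP literature, f_μ(α) = Φ(Φ^{-1}(1 − α) − μ), and inverts it to read off the μ implied by a single (FPR, FNR) operating point. The function names are hypothetical and are not taken from the paper or its code release; only the Python standard library is used.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def gdp_tradeoff(fpr: float, mu: float) -> float:
    """Smallest FNR any attack can reach at the given FPR under mu-GDP.

    Requires 0 < fpr < 1, because inv_cdf is undefined at 0 and 1.
    """
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1.0 - fpr) - mu)


def mu_from_operating_point(fpr: float, fnr: float) -> float:
    """The mu whose GDP curve passes through (fpr, fnr); needs 0 < fpr, fnr < 1."""
    return STD_NORMAL.inv_cdf(1.0 - fpr) - STD_NORMAL.inv_cdf(fnr)


# Compare the curves for the two mu values quoted in the summary above.
for fpr in (0.01, 0.05, 0.10, 0.50):
    print(fpr, round(gdp_tradeoff(fpr, 0.43), 4), round(gdp_tradeoff(fpr, 0.45), 4))
```

A larger μ pushes the curve down (lower achievable FNR at the same FPR), so a curve for μ = 0.43 sitting just above the one for μ = 0.45 is what a small theory-practice gap looks like in this representation.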

Load-bearing premise

The GDP framework applied to worst-case settings of MST and AIM accurately measures their actual privacy guarantees.

What would settle it

Repeated empirical audits producing a μ value substantially larger or smaller than the implied theoretical μ, such as 0.60 instead of 0.45 for the (1, 10^{-2}) parameters, would show the audits are not tight.

Figures

Figures reproduced from arXiv: 2604.18352 by Bogdan Kulynych, Georgi Ganev, Meenatchi Sundaram Muthu Selva Annamalai.

Figure 1: Empirical privacy tradeoff of MIA for MST/AIM compared to theoretical bounds under multiple DP …
Figure 2: Threshold selection on validation data. FPR, …
Figure 4: Empirical privacy µemp for MST/AIM trained with varying orders of marginals, comparing Black-box and “Default” (hybrid Black/White-box) threat models. Red line is the implied µ derived via ρ-zCDP.
Appendix A, Auditing MST/AIM Trained with Higher-Order Marginals: In this section, we relax the independent-marginal setting by training MST/AIM with higher-order marginals (2-way and 3-way), while keeping the dependency gr…
Original abstract

State-of-the-art Differentially Private (DP) synthetic data generators such as MST and AIM are widely used, yet tightly auditing their privacy guarantees remains challenging. We introduce a Gaussian Differential Privacy (GDP)-based auditing framework that measures privacy via the full false-positive/false-negative tradeoff. Applied to MST and AIM under worst-case settings, our method provides the first tight audits in the strong-privacy regime. For $(\epsilon,\delta)=(1,10^{-2})$, we obtain $\mu_{emp}\approx0.43$ vs. implied $\mu=0.45$, showing a small theory-practice gap. Our code is publicly available: https://github.com/sassoftware/dpmm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces a Gaussian Differential Privacy (GDP)-based auditing framework for measuring the privacy guarantees of MST and AIM, two state-of-the-art differentially private synthetic data generators. The framework evaluates privacy via the full false-positive/false-negative tradeoff curve and is applied under worst-case settings to deliver tight audits in the strong-privacy regime. For the parameter pair (ε,δ)=(1,10^{-2}), it reports an empirical μ_emp≈0.43 against an implied theoretical μ=0.45, indicating a small theory-practice gap. Publicly available code is provided at https://github.com/sassoftware/dpmm.

Significance. If the auditing framework and empirical results hold, the work is significant for the DP community because it supplies the first tight, GDP-based audits of widely deployed synthetic data mechanisms in the strong-privacy regime, where prior methods were loose or inapplicable. The public release of the implementation code is a clear strength that supports reproducibility and enables independent verification or extension.

minor comments (3)
  1. The abstract states the central empirical result (μ_emp≈0.43 vs. implied μ=0.45) but does not indicate the number of trials, the exact worst-case data distribution used, or the precise GDP conversion formulas; adding these details to §3 or §4 would improve immediate clarity without altering the claims.
  2. Figure captions and axis labels should explicitly state whether the plotted curves are for the empirical audit or the theoretical GDP bound, and whether they are averaged over multiple runs, to avoid reader ambiguity.
  3. The manuscript would benefit from a short paragraph in the introduction or §2 contrasting the new GDP auditing approach with prior black-box or membership-inference audits of MST/AIM, including quantitative comparisons of tightness where available.
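
To make minor comment 1 concrete, the sketch below shows why the number of trials matters for a reported μ_emp. It turns observed error rates into a conservative lower bound on μ using one-sided Hoeffding bounds. This is an assumption chosen for simplicity, not the paper's statistical procedure, which is not described on this page; the function names and the default confidence level are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def hoeffding_upper(p_hat: float, n: int, gamma: float) -> float:
    """One-sided upper confidence bound on a proportion, valid with prob. >= 1 - gamma."""
    return min(1.0, p_hat + math.sqrt(math.log(1.0 / gamma) / (2.0 * n)))


def mu_lower_bound(fpr_hat: float, fnr_hat: float, n: int, gamma: float = 0.05) -> float:
    """Conservative lower bound on mu from n trials per world (assumed equal).

    gamma is split across the two error rates via a union bound.
    """
    fpr_up = hoeffding_upper(fpr_hat, n, gamma / 2.0)
    fnr_up = hoeffding_upper(fnr_hat, n, gamma / 2.0)
    if fpr_up + fnr_up >= 1.0:
        return 0.0  # bounds too loose to certify any leakage
    return STD_NORMAL.inv_cdf(1.0 - fpr_up) - STD_NORMAL.inv_cdf(fnr_up)


# An operating point lying exactly on the mu = 0.45 curve at FPR = 0.3.
fpr = 0.3
fnr = STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1.0 - fpr) - 0.45)
for n in (1_000, 10_000, 100_000):
    print(n, round(mu_lower_bound(fpr, fnr, n), 3))
```

With the same observed rates, the certified lower bound rises toward the underlying μ as the trial count grows, so a reported μ_emp is hard to interpret without it.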

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive evaluation of our work and for recommending minor revision. We appreciate the recognition that our GDP-based auditing framework provides the first tight audits of MST and AIM in the strong-privacy regime, along with the value placed on the public code release.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces a GDP-based auditing framework that computes empirical privacy (μ_emp) via full ROC tradeoff on MST and AIM under worst-case settings, then compares it to the theoretically implied μ value. This comparison is external: the empirical measurement is obtained by applying the framework to the mechanisms' outputs, not by fitting parameters to the target μ or by re-deriving the same quantity from itself. No self-definitional loop, fitted-input-as-prediction, or load-bearing self-citation chain appears in the abstract or described method. The reported small gap (0.43 vs 0.45) constitutes an independent check rather than a tautology, and the derivation remains self-contained against external benchmarks.
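
The kind of external comparison described here can be mimicked on a toy case. The sketch below audits a plain Gaussian mechanism, used as a stand-in rather than MST or AIM, by sampling attack scores with and without a target record, sweeping thresholds, and comparing the measured μ against the known one. All names, the threshold grid, and the use of the median are illustrative assumptions, not the paper's pipeline.

```python
import random
from statistics import NormalDist, median

STD_NORMAL = NormalDist()


def pointwise_mu(fpr: float, fnr: float) -> float:
    """mu of the GDP curve passing through one (FPR, FNR) point."""
    return STD_NORMAL.inv_cdf(1.0 - fpr) - STD_NORMAL.inv_cdf(fnr)


def audit_gaussian(mu_true: float, n: int = 20_000, seed: int = 0) -> float:
    """Estimate mu from simulated attack scores on a toy Gaussian mechanism."""
    rng = random.Random(seed)
    out_scores = [rng.gauss(0.0, 1.0) for _ in range(n)]     # target absent
    in_scores = [rng.gauss(mu_true, 1.0) for _ in range(n)]  # target present
    estimates = []
    for k in range(-10, 21):  # thresholds from -1.0 to 2.0 in steps of 0.1
        t = k / 10.0
        fpr = sum(s > t for s in out_scores) / n
        fnr = sum(s <= t for s in in_scores) / n
        # Skip extreme rates, where the inverse CDF amplifies sampling noise.
        if 0.05 <= fpr <= 0.95 and 0.05 <= fnr <= 0.95:
            estimates.append(pointwise_mu(fpr, fnr))
    return median(estimates)


print(round(audit_gaussian(0.45), 3))
```

The estimate is computed only from sampled outputs, never from `mu_true` itself, which is the sense in which such a measurement is independent of the theoretical value it is compared against.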

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work relies on the standard GDP assumption as a domain assumption for the new auditing approach.

axioms (1)
  • domain assumption: The Gaussian Differential Privacy model accurately reflects the privacy-utility tradeoff for auditing purposes in MST and AIM.
    This underpins the entire auditing framework introduced.


