pith. machine review for the scientific record

arxiv: 2605.14591 · v1 · submitted 2026-05-14 · 💻 cs.CR

Recognition: 2 theorem links · Lean Theorem

Privacy Auditing with Zero (0) Training Run

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 01:24 UTC · model grok-4.3

classification 💻 cs.CR
keywords privacy auditing · differential privacy · membership inference · distribution shift · post-hoc analysis · causal correction · large language models

The pith

Zero-Run privacy auditing yields valid differential privacy bounds from fixed member and non-member datasets without any model retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to audit the privacy of trained models using only two fixed datasets: one of known training examples and one of known non-training examples. Because these datasets may come from different distributions, simple membership inference can be misled by that shift rather than by how the model was trained. The authors borrow tools from causal inference to separate the two effects and give two ways to correct the audit scores, one conservative and global and one sharper and local. This matters for large deployed models where retraining or randomizing data inclusion is impractical. If the corrections work, practitioners gain a practical way to check privacy guarantees after the fact.

Core claim

Zero-Run privacy auditing is a post-hoc framework that produces valid empirical lower bounds on differential privacy parameters by using two fixed datasets of members and non-members. It formalizes the confounding due to distribution shift and offers an adaptive-composition correction for global bounds and a pointwise-conditioning correction for instance-level bounds, both shown to be valid under the observational regime.
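
For orientation, here is a minimal sketch of the uncorrected observational audit the framework starts from: score the two fixed datasets with a membership-inference statistic (per-example loss here, a stand-in for whatever score is actually used), threshold it, and convert the resulting true/false positive rates into an empirical epsilon lower bound in the style of standard DP auditing. The function names, the loss-based score, and the synthetic numbers are illustrative assumptions, not the paper's method; a real audit would also put confidence intervals (e.g., Clopper-Pearson) around the rates.

```python
import numpy as np

def epsilon_lower_bound(tpr: float, fpr: float) -> float:
    """Empirical epsilon lower bound from a membership test's error rates,
    via the standard pure-DP (delta = 0) auditing inequality
    eps >= max(log(TPR/FPR), log((1-FPR)/(1-TPR)))."""
    guard = 1e-12  # avoid division by zero at the extremes
    return max(np.log(max(tpr, guard) / max(fpr, guard)),
               np.log(max(1 - fpr, guard) / max(1 - tpr, guard)))

def zero_run_audit_uncorrected(member_losses, nonmember_losses, threshold):
    """Uncorrected observational audit: guess 'member' when the model's loss
    falls below the threshold. With fixed (non-randomized) datasets this
    conflates distribution shift with algorithmic leakage, which is exactly
    the confounding the paper's corrections are meant to remove."""
    tpr = np.mean(np.asarray(member_losses) < threshold)
    fpr = np.mean(np.asarray(nonmember_losses) < threshold)
    return epsilon_lower_bound(tpr, fpr)

# Hypothetical usage with synthetic losses from a deployed model.
rng = np.random.default_rng(0)
members = rng.normal(0.8, 0.3, 5000)     # training data tends to get lower loss
nonmembers = rng.normal(1.2, 0.3, 5000)
print(zero_run_audit_uncorrected(members, nonmembers, threshold=1.0))
```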

What carries the argument

Adaptive composition of distribution shift and algorithmic leakage, together with pointwise conditioning on observed data, to isolate privacy leakage from confounding.
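
Stated as formulas, this is our hedged reconstruction from the abstract (the paper's exact definitions may differ): write $\varepsilon_{\text{obs}}$ for the leakage an observational audit measures, $\varepsilon_{\text{shift}}$ for the contribution of the member/non-member distribution shift, and $\varepsilon_{\text{alg}}$ for the algorithmic leakage being audited.

```latex
% Global correction (our reading): model the observed audit as an adaptive
% composition of shift and training, so that, with additive pure-DP
% composition, the observed leakage can only over-count:
\[
  \varepsilon_{\text{obs}} \;\le\; \varepsilon_{\text{shift}} + \varepsilon_{\text{alg}} .
\]
% Pointwise correction (our reading): condition on the observed example x
% and discount each membership score s(x) by the propensity log-odds of
% membership, e(x) = P(M = 1 | X = x):
\[
  s_{\text{corr}}(x) \;=\; s(x) \;-\; \log\frac{e(x)}{1 - e(x)} .
\]
```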

If this is right

  • Privacy evaluation becomes feasible for large foundation models where multiple training runs are too expensive.
  • Global privacy bounds can be obtained conservatively without knowing instance-level details.
  • Instance-dependent bounds allow sharper assessments for specific data points.
  • Existing membership inference methods can be adapted into valid audits with these corrections.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar corrections might apply to auditing other properties such as fairness or robustness when distribution shifts are present.
  • The approach could be tested on models where the true privacy parameters are known from controlled training to validate the bounds empirically.
  • Extensions to cases with only approximate knowledge of membership status might be possible by treating uncertain points separately.

Load-bearing premise

The corrections fully remove bias from the distribution shift so that remaining signal reflects only algorithmic leakage.

What would settle it

Observing that the corrected audit still reports strong privacy on a model known to have leaked training data through overfitting would falsify the validity of the bounds.
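
A concrete version of this falsification test, sketched under stated assumptions: deliberately overfit a small classifier on random labels (pure memorization), draw members and non-members from the same distribution so no shift correction is even needed, and check that the audit reports substantial leakage. The model choice, threshold sweep, and loss-based score are our illustrative stand-ins; the paper's corrected audit would replace the naive bound computed here, and should likewise report a large epsilon.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X, y = rng.normal(size=(200, 20)), rng.integers(0, 2, 200)          # members, random labels
X_out, y_out = rng.normal(size=(200, 20)), rng.integers(0, 2, 200)  # non-members, same distribution

# Overfit on purpose: wide network, no regularization, many iterations.
model = MLPClassifier(hidden_layer_sizes=(512,), alpha=0.0, max_iter=5000).fit(X, y)

def losses(model, X, y):
    """Per-example cross-entropy loss on the true label."""
    p = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(p, 1e-12, None))

mem, non = losses(model, X, y), losses(model, X_out, y_out)

# Naive audit: sweep thresholds, keep the best epsilon lower bound, and
# never claim an FPR below the resolution of the non-member sample.
best = 0.0
for t in np.quantile(np.concatenate([mem, non]), np.linspace(0.01, 0.99, 50)):
    tpr = max(np.mean(mem < t), 1e-12)
    fpr = max(np.mean(non < t), 1.0 / len(non))
    best = max(best, np.log(tpr / fpr))
print(f"audited epsilon on a memorizing model: {best:.2f}")  # should be large
```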

Figures

Figures reproduced from arXiv: 2605.14591 by Aurélien Bellet, Linus Bleistein, Mathieu Even, Tudor Cebere.

Figure 1. Overview of the Zero-Run Privacy Auditing framework. Observational data (…)

Figure 2. Evaluation of zero-run auditing under both controlled synthetic conditions and real-world (…)
Original abstract

Privacy auditing provides empirical lower bounds on the differential privacy parameters of learning algorithms. Existing methods, however, require interventional access to the training pipeline, either to retrain multiple times or to randomize data inclusion. This is often infeasible for large deployed systems such as foundation models. We introduce Zero-Run privacy auditing, a post-hoc framework for auditing models using two fixed datasets: examples known to be training-set members and examples known to be non-members. In this observational regime, membership is no longer randomized; instead, member and non-member data often differ in distribution, so membership inference scores may reflect a distribution shift rather than algorithmic leakage. Drawing on ideas from causal inference, we formalize this confounding effect and propose two complementary corrections that yield valid privacy audits. Our first approach models the combined effect of distribution shift and algorithmic leakage as an adaptive composition, producing conservative global corrections. Our second approach conditions on observed data and adjusts pointwise membership guesses, yielding sharper instance-dependent bounds. Experiments on synthetic data and large-scale models show that Zero-Run auditing enables practical privacy evaluation when retraining or controlled data insertion is infeasible.
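
One way the abstract's "conditions on observed data and adjusts pointwise membership guesses" could look in practice, sketched with a propensity model in the spirit of Rosenbaum and Rubin [33]: fit a classifier that predicts membership from example features alone, with no access to the audited model, then discount each audit score by the log-odds that the features already explain. This is our illustration of the idea rather than the paper's algorithm; the feature inputs and the log-odds discount are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pointwise_corrected_scores(member_feats, nonmember_feats,
                               member_scores, nonmember_scores):
    """Discount each membership-inference score by the propensity log-odds
    log(e(x) / (1 - e(x))), where e(x) = P(member | x) is estimated from
    features alone. Whatever a model-blind classifier can predict is
    attributed to distribution shift rather than to algorithmic leakage."""
    X = np.vstack([member_feats, nonmember_feats])
    m = np.concatenate([np.ones(len(member_feats)), np.zeros(len(nonmember_feats))])
    propensity = LogisticRegression(max_iter=1000).fit(X, m)
    e = np.clip(propensity.predict_proba(X)[:, 1], 1e-6, 1 - 1e-6)
    scores = np.concatenate([member_scores, nonmember_scores])
    return scores - np.log(e / (1 - e))  # residual signal after conditioning on x
```

A built-in sanity check falls out of this sketch: when members and non-members are drawn from the same distribution, e(x) ≈ 1/2, the log-odds term vanishes, and the correction is a no-op.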

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce Zero-Run privacy auditing, a post-hoc framework for auditing models using two fixed datasets of known training-set members and non-members. It formalizes the confounding effect due to distribution shift in this observational regime and proposes two complementary corrections—an adaptive composition for global bounds and pointwise conditioning for instance-dependent bounds—that are claimed to yield valid privacy audits without requiring retraining or randomized data insertion.

Significance. If the corrections are shown to produce valid lower bounds on DP parameters by properly isolating algorithmic leakage from distribution shift, this would be a significant contribution for privacy evaluation of large deployed systems such as foundation models, where multiple training runs are infeasible.

major comments (2)
  1. [Formalization of confounding effect] The central claim requires that the adaptive composition and pointwise conditioning corrections produce valid lower bounds despite non-randomized membership. This holds only if the distribution shift is modeled as an independent mechanism that composes conservatively with the training algorithm's leakage. The manuscript must provide a detailed proof or derivation demonstrating the absence of residual confounding from how the shift interacts with model parameters or the loss landscape.
  2. [Abstract] The abstract describes the formalization and two corrections but supplies no equations, proofs, or error-bar details; without the full derivation the central claim cannot be verified.
minor comments (1)
  1. [Experiments] Details on the synthetic data setup and how the corrections are applied to large-scale models should be expanded for reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments. We address each major comment point by point below, providing clarifications and indicating revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Formalization of confounding effect] The central claim requires that the adaptive composition and pointwise conditioning corrections produce valid lower bounds despite non-randomized membership. This holds only if the distribution shift is modeled as an independent mechanism that composes conservatively with the training algorithm's leakage. The manuscript must provide a detailed proof or derivation demonstrating the absence of residual confounding from how the shift interacts with model parameters or the loss landscape.

    Authors: We appreciate the referee's emphasis on rigorous justification. In Section 3, we model distribution shift as an independent causal mechanism using potential outcomes and derive that adaptive composition yields conservative global bounds by treating shift-induced divergence as an additive term that cannot decrease the observed privacy leakage. We will expand the appendix with an explicit derivation showing that, under the observational regime where membership is fixed and shift is independent of the loss landscape (as justified by the fixed datasets), no residual confounding arises from parameter interactions. This addresses the concern directly. revision: yes

  2. Referee: [Abstract] The abstract describes the formalization and two corrections but supplies no equations, proofs, or error-bar details; without the full derivation the central claim cannot be verified.

    Authors: We agree the abstract is high-level by design. The full equations for adaptive composition and pointwise conditioning, along with derivations and error analysis, appear in Sections 3 and 4 and the appendix. We will revise the abstract to include one key bounding equation and a reference to the formal sections for improved verifiability while preserving brevity. revision: partial
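
The first response's claim that shift-induced divergence enters as "an additive term that cannot decrease the observed privacy leakage" reduces, in our reconstruction, to a one-line monotonicity argument: given the composition inequality $\varepsilon_{\text{obs}} \le \varepsilon_{\text{shift}} + \varepsilon_{\text{alg}}$ and any conservative over-estimate $\widehat{\varepsilon}_{\text{shift}} \ge \varepsilon_{\text{shift}}$,

```latex
\[
  \varepsilon_{\text{obs}} - \widehat{\varepsilon}_{\text{shift}}
  \;\le\; \bigl(\varepsilon_{\text{shift}} + \varepsilon_{\text{alg}}\bigr)
          - \widehat{\varepsilon}_{\text{shift}}
  \;\le\; \varepsilon_{\text{alg}} ,
\]
% so subtracting the estimated shift keeps the audit a valid, if looser,
% lower bound on the algorithmic leakage; validity hinges entirely on the
% over-estimate, which is where residual confounding would re-enter.
```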

Circularity Check

0 steps flagged

No circularity: derivations draw on external causal-inference formalisms without reducing to self-defined inputs or fitted predictions

full rationale

The paper's core derivation formalizes membership inference confounding via distribution shift using ideas from causal inference, then introduces adaptive composition for global bounds and pointwise conditioning for instance-level adjustments. These steps are presented as independent modeling choices that produce conservative or sharper bounds on the DP parameter, without any quoted equations showing that the corrected scores are defined in terms of the target leakage quantity itself or obtained by fitting to the same membership signals being audited. No self-citation chains, uniqueness theorems, or ansatzes imported from prior author work are invoked as load-bearing; experiments on synthetic and large-scale models serve as external validation rather than tautological confirmation. The approach therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the ability to separate distribution shift from algorithmic leakage using causal-inference modeling; no free parameters or new entities are mentioned in the abstract.

axioms (1)
  • domain assumption: Distribution shift and algorithmic leakage can be jointly modeled as an adaptive composition or conditioned pointwise without introducing new bias. Invoked to justify the two correction strategies.

pith-pipeline@v0.9.0 · 5497 in / 1174 out tokens · 46324 ms · 2026-05-15T01:24:02.414081+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 6 internal anchors

  1. Meenatchi Sundaram Muthu Selva Annamalai, Borja Balle, Jamie Hayes, Georgios Kaissis, and Emiliano De Cristofaro. The hitchhiker's guide to efficient, end-to-end, and tight DP auditing. arXiv preprint arXiv:2506.16666, 2025.
  2. Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, and Dawn Song. The secret sharer: Evaluating and testing unintended memorization in neural networks. In 28th USENIX Security Symposium (USENIX Security 19), pages 267–284, 2019.
  3. Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Úlfar Erlingsson, et al. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pages 2633–2650, 2021.
  4. Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramèr, and Chiyuan Zhang. Quantifying memorization across neural language models, 2023. URL https://arxiv.org/abs/2202.07646.
  5. Tudor Cebere, Aurélien Bellet, and Nicolas Papernot. Tighter privacy auditing of DP-SGD in the hidden state threat model. In ICLR, 2025.
  6. Tudor Cebere, David Erb, Damien Desfontaines, Aurélien Bellet, and Jack Fitzsimons. Privacy in theory, bugs in practice: Grey-box auditing of differential privacy libraries. arXiv preprint arXiv:2602.17454, 2026.
  7. Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave, and Neil Zeghidour. Moshi: a speech-text foundation model for real-time dialogue. arXiv preprint arXiv:2410.00037, 2024.
  8. Jiayuan Ding, Jianhui Lin, Shiyu Jiang, Yixin Wang, Ziyang Miao, Zhaoyu Fang, Jiliang Tang, Min Li, and Xiaojie Qiu. Tabula: A tabular self-supervised foundation model for single-cell transcriptomics. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
  9. Zeyu Ding, Yuxin Wang, Guanhong Wang, Danfeng Zhang, and Daniel Kifer. Detecting violations of differential privacy. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 475–489, 2018.
  10. Jinshuo Dong, Aaron Roth, and Weijie J. Su. Gaussian differential privacy. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 84(1):3–37, 2022.
  11. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale, 2021. URL https://arxiv.org/abs/2010.11929.
  12. Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.
  13. Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Conference on Theory of Cryptography, TCC ’06, pages 265–284, Berlin, Heidelberg, 2006. Springer.
  14. Bradley Efron. Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397):171–185, 1987.
  15. European Data Protection Board. Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models, 2024. URL https://www.edpb.europa.eu/our-work-tools/our-documents/opinion-board-art-64/opinion-282024-certain-data-protection-aspects_en.
  16. Mathieu Even, Clément Berenfeld, Linus Bleistein, Tudor Cebere, Julie Josse, and Aurélien Bellet. Membership inference attacks from causal principles. arXiv preprint arXiv:2602.02819, 2026.
  17. Juan Felipe Gomez, Bogdan Kulynych, Georgios Kaissis, Jamie Hayes, Borja Balle, and Antti Honkela. Gaussian DP for reporting differential privacy guarantees in machine learning, 2025. URL https://arxiv.org/abs/2503.10945.
  18. Noah Hollmann, Samuel Müller, Katharina Eggensperger, and Frank Hutter. TabPFN: A transformer that solves small tabular classification problems in a second, 2023. URL https://arxiv.org/abs/2207.01848.
  19. Matthew Jagielski, Jonathan Ullman, and Alina Oprea. Auditing differentially private machine learning: How private is private SGD? Advances in Neural Information Processing Systems, 33:22205–22216, 2020.
  20. Bargav Jayaraman and David Evans. Evaluating differentially private machine learning in practice. In 28th USENIX Security Symposium (USENIX Security 19), pages 1895–1912, 2019.
  21. Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7B, 2023. URL https://arxi...
  22. Iden Kalemaj, Luca Melis, Maxime Boucher, Ilya Mironov, and Saeed Mahloujifar. Observational auditing of label privacy, 2025.
  23. Mishaal Kazmi, Hadrien Lautraite, Alireza Akbari, Qiaoyue Tang, Mauricio Soroco, Tao Wang, Sébastien Gambs, and Mathias Lécuyer. Panoramia: Privacy auditing of machine learning models without retraining. Advances in Neural Information Processing Systems, 37:57262–57300, 2024.
  24. Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. Analyzing leakage of personally identifiable information in language models. In 2023 IEEE Symposium on Security and Privacy (SP), pages 346–363. IEEE, 2023.
  25. Saeed Mahloujifar, Luca Melis, and Kamalika Chaudhuri. Auditing f-differential privacy in one run. arXiv preprint arXiv:2410.22235, 2024.
  26. Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Sara Beery, Jure Leskovec, Anshul Kundaje, et al. Wilds: A benchmark of in-the-wild distribution shifts. arXiv preprint arXiv:2012.07421, 2020.
  27. Matthieu Meeus, Shubham Jain, Marek Rei, and Yves-Alexandre de Montjoye. Did the neurons read your book? Document-level membership inference for large language models. In 33rd USENIX Security Symposium (USENIX Security 24), pages 2369–2385, 2024.
  28. Matthieu Meeus, Igor Shilov, Shubham Jain, Manuel Faysse, Marek Rei, and Yves-Alexandre de Montjoye. SoK: Membership inference attacks on LLMs are rushing nowhere (and how to fix it). arXiv preprint arXiv:2406.17975, 2024.
  29. Milad Nasr, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, and Nicholas Carlini. Adversary instantiation: Lower bounds for differentially private machine learning. In 2021 IEEE Symposium on Security and Privacy (SP), pages 866–882. IEEE, 2021.
  30. Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, and Andreas Terzis. Tight auditing of differentially private machine learning. In 32nd USENIX Security Symposium (USENIX Security 23), pages 1631–1648, 2023.
  31. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  32. Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. In International Conference on Machine Learning, pages 8821–8831. PMLR, 2021.
  33. Paul R. Rosenbaum and Donald B. Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.
  34. Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer. Detecting pretraining data from large language models. arXiv preprint arXiv:2310.16789, 2023.
  35. Amer Sinha, Thomas Mesnard, Ryan McKenna, Daogao Liu, Christopher A. Choquette-Choo, Yangsibo Huang, Da Yu, George Kaissis, Zachary Charles, Ruibo Liu, et al. VaultGemma: A differentially private Gemma model. arXiv preprint arXiv:2510.15001, 2025.
  36. David Sommer, Sebastian Meiser, and Esfandiar Mohammadi. Privacy loss classes: The central limit theorem in differential privacy. Cryptology ePrint Archive, 2018.
  37. Thomas Steinke, Milad Nasr, and Matthew Jagielski. Privacy auditing with one (1) training run. Advances in Neural Information Processing Systems, 36:49268–49280, 2023.
  38. Sana Tonekaboni, Lena Stempfle, Adibvafa Fallahpour, Walter Gerych, and Marzyeh Ghassemi. An investigation of memorization risk in healthcare foundation models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=NMvMYtRjkg.
  39. Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
  40. Florian Tramèr, Andreas Terzis, Thomas Steinke, Shuang Song, Matthew Jagielski, and Nicholas Carlini. Debugging differential privacy: A case study for privacy auditing. arXiv preprint arXiv:2202.12219, 2022.
  41. Aad W. van der Vaart. Asymptotic statistics, volume 3. Cambridge University Press, 2000.
  42. Stefan Wager. Causal inference: A statistical learning approach, 2024.

A Additional general content and proofs

A.1 Privacy Loss Random Variable

Definition 4. Define the Privacy Loss Random Variable (PLRV for short) as [36]:

\[
  \mathrm{PLRV}_{\mathcal{A}}^{D,D'}(\theta) \;=\; \log \frac{\mathbb{P}(\mathcal{A}(D) = \theta)}{\mathbb{P}(\mathcal{A}(D') = \theta)}, \tag{3}
\]

for two adjacent datasets $D$ and $D'$ and any mechanism $\mathcal{A}$.

A.2 Proof of Theorem 4.1. Proof …
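
As a worked instance of Definition 4 (our example; the excerpt stops before any instantiation), take the Gaussian mechanism $\mathcal{A}(D) = f(D) + \mathcal{N}(0, \sigma^2)$ and write $\mu = f(D)$, $\mu' = f(D')$, $\Delta = \mu - \mu'$:

```latex
% Both output densities are Gaussian with common variance, so the log-ratio
% in Equation (3) is linear in theta:
\[
  \mathrm{PLRV}_{\mathcal{A}}^{D,D'}(\theta)
  = \frac{(\theta-\mu')^2 - (\theta-\mu)^2}{2\sigma^2}
  = \frac{\Delta\,(2\theta - \mu - \mu')}{2\sigma^2} .
\]
% Evaluated on theta ~ A(D) = N(mu, sigma^2), the PLRV is itself Gaussian,
\[
  \mathrm{PLRV} \;\sim\; \mathcal{N}\!\Bigl(\frac{\Delta^2}{2\sigma^2},\; \frac{\Delta^2}{\sigma^2}\Bigr),
\]
% the privacy loss distribution underlying Gaussian DP [10, 36].
```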