Recognition: 2 theorem links
Privacy Auditing with Zero (0) Training Runs
Pith reviewed 2026-05-15 01:24 UTC · model grok-4.3
The pith
Zero-Run privacy auditing yields valid differential privacy bounds from fixed member and non-member datasets without any model retraining.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Zero-Run privacy auditing is a post-hoc framework that produces valid empirical lower bounds on differential privacy parameters using two fixed datasets of members and non-members. It formalizes the confounding induced by distribution shift and offers two corrections: an adaptive-composition correction for global bounds and a pointwise-conditioning correction for instance-level bounds, both shown to be valid in the observational regime.
What carries the argument
Adaptive composition of distribution shift and algorithmic leakage, together with pointwise conditioning on observed data, to isolate privacy leakage from confounding.
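To make the global correction concrete, here is a minimal Python sketch of how such an audit could be assembled, assuming the shift behaves as an additively composing mechanism with a known budget eps_shift; the function names, the threshold-based attack, and the additive form of the correction are illustrative assumptions, not the paper's actual estimator.

```python
import numpy as np
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Two-sided exact (Clopper-Pearson) interval for k successes in n trials."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

def zero_run_global_eps(member_scores, nonmember_scores, threshold,
                        delta=0.0, eps_shift=0.0, alpha=0.05):
    """Hypothetical global Zero-Run audit (a sketch, not the paper's method).

    Step 1: standard DP audit bound. Any membership test against an
    (eps, delta)-DP trainer satisfies TPR <= exp(eps) * FPR + delta, so
    eps >= log((TPR - delta) / FPR); conservative interval endpoints turn
    this into a high-confidence lower bound.
    Step 2 (assumed): subtract a shift budget eps_shift, treating distribution
    shift as a mechanism that composes additively with algorithmic leakage.
    """
    tp = int(np.sum(np.asarray(member_scores) >= threshold))
    fp = int(np.sum(np.asarray(nonmember_scores) >= threshold))
    tpr_lo, _ = clopper_pearson(tp, len(member_scores), alpha)    # underestimate TPR
    _, fpr_hi = clopper_pearson(fp, len(nonmember_scores), alpha)  # overestimate FPR
    if tpr_lo - delta <= 0 or fpr_hi <= 0:
        return 0.0
    eps_tot = np.log((tpr_lo - delta) / fpr_hi)
    return float(max(eps_tot - eps_shift, 0.0))
```

Using the conservative Clopper-Pearson endpoints (lower TPR, upper FPR) keeps the reported epsilon a valid high-confidence lower bound even before the shift deduction is applied.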
If this is right
- Privacy evaluation becomes feasible for large foundation models where multiple training runs are too expensive.
- Global privacy bounds can be obtained conservatively without knowing instance-level details.
- Instance-dependent bounds allow sharper assessments for specific data points.
- Existing membership inference methods can be adapted into valid audits with these corrections.
Where Pith is reading between the lines
- Similar corrections might apply to auditing other properties such as fairness or robustness when distribution shifts are present.
- The approach could be tested on models where the true privacy parameters are known from controlled training to validate the bounds empirically.
- Extensions to cases with only approximate knowledge of membership status might be possible by treating uncertain points separately.
Load-bearing premise
The corrections fully remove bias from the distribution shift so that remaining signal reflects only algorithmic leakage.
What would settle it
Observing that the corrected audit still reports strong privacy on a model known to have leaked training data through overfitting would falsify the validity of the bounds.
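A minimal Python sketch of that falsification experiment, reusing the hypothetical zero_run_global_eps from the sketch above; the model, score definition, and threshold are illustrative choices, and members and non-members are drawn from the same distribution so a zero shift budget is honest here.

```python
# Hypothetical falsification check: overfit deliberately, then audit.
# Assumes zero_run_global_eps from the sketch above is in scope.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = rng.integers(0, 2, size=2000)        # random labels: fitting them is pure memorization
members, nonmembers = X[:1000], X[1000:]
model = DecisionTreeClassifier().fit(members, y[:1000])  # unregularized tree overfits

# Score each point by the confidence the model assigns to its true label.
m_scores = model.predict_proba(members)[np.arange(1000), y[:1000]]
n_scores = model.predict_proba(nonmembers)[np.arange(1000), y[1000:]]

# Same-distribution splits, so eps_shift = 0 is the correct budget here.
eps_hat = zero_run_global_eps(m_scores, n_scores, threshold=0.5, eps_shift=0.0)
print(eps_hat)  # a sound audit should report a clearly positive epsilon
```

If the corrected audit returned zero here, the correction machinery would be suppressing genuine leakage, which is exactly the failure mode this test targets.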
Original abstract
Privacy auditing provides empirical lower bounds on the differential privacy parameters of learning algorithms. Existing methods, however, require interventional access to the training pipeline, either to retrain multiple times or to randomize data inclusion. This is often infeasible for large deployed systems such as foundation models. We introduce Zero-Run privacy auditing, a post-hoc framework for auditing models using two fixed datasets: examples known to be training-set members and examples known to be non-members. In this observational regime, membership is no longer randomized; instead, member and non-member data often differ in distribution, so membership inference scores may reflect a distribution shift rather than algorithmic leakage. Drawing on ideas from causal inference, we formalize this confounding effect and propose two complementary corrections that yield valid privacy audits. Our first approach models the combined effect of distribution shift and algorithmic leakage as an adaptive composition, producing conservative global corrections. Our second approach conditions on observed data and adjusts pointwise membership guesses, yielding sharper instance-dependent bounds. Experiments on synthetic data and large-scale models show that Zero-Run auditing enables practical privacy evaluation when retraining or controlled data insertion is infeasible.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce Zero-Run privacy auditing, a post-hoc framework for auditing models using two fixed datasets of known training-set members and non-members. It formalizes the confounding effect due to distribution shift in this observational regime and proposes two complementary corrections—an adaptive composition for global bounds and pointwise conditioning for instance-dependent bounds—that are claimed to yield valid privacy audits without requiring retraining or randomized data insertion.
Significance. If the corrections are shown to produce valid lower bounds on DP parameters by properly isolating algorithmic leakage from distribution shift, this would be a significant contribution for privacy evaluation of large deployed systems such as foundation models, where multiple training runs are infeasible.
major comments (2)
- [Formalization of confounding effect] The central claim requires that the adaptive composition and pointwise conditioning corrections produce valid lower bounds despite non-randomized membership. This holds only if the distribution shift is modeled as an independent mechanism that composes conservatively with the training algorithm's leakage. The manuscript must provide a detailed proof or derivation demonstrating the absence of residual confounding from how the shift interacts with model parameters or the loss landscape.
- [Abstract] The abstract describes the formalization and two corrections but supplies no equations, proofs, or error-bar details; without the full derivation the central claim cannot be verified.
minor comments (1)
- [Experiments] Details on the synthetic data setup and how the corrections are applied to large-scale models should be expanded for reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful comments. We address each major comment point by point below, providing clarifications and indicating revisions to strengthen the manuscript.
Point-by-point responses
Referee: [Formalization of confounding effect] The central claim requires that the adaptive composition and pointwise conditioning corrections produce valid lower bounds despite non-randomized membership. This holds only if the distribution shift is modeled as an independent mechanism that composes conservatively with the training algorithm's leakage. The manuscript must provide a detailed proof or derivation demonstrating the absence of residual confounding from how the shift interacts with model parameters or the loss landscape.
Authors: We appreciate the referee's emphasis on rigorous justification. In Section 3, we model distribution shift as an independent causal mechanism using potential outcomes and derive that adaptive composition yields conservative global bounds by treating shift-induced divergence as an additive term that cannot decrease the observed privacy leakage. We will expand the appendix with an explicit derivation showing that, under the observational regime where membership is fixed and shift is independent of the loss landscape (as justified by the fixed datasets), no residual confounding arises from parameter interactions. This addresses the concern directly. revision: yes
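A rough rendering of that additive argument, assuming basic (epsilon-additive) composition; the paper's Theorem 4.1 appears to be stated in f-DP terms instead, so this is a sketch, not the authors' exact statement:

```latex
% If distribution shift acts as an independent mechanism contributing at most
% \varepsilon_{\mathrm{DS}} of distinguishability, and training is
% \varepsilon-DP, basic composition bounds what any membership test can see:
\varepsilon_{\mathrm{tot}} \;\le\; \varepsilon + \varepsilon_{\mathrm{DS}}
\quad\Longrightarrow\quad
\varepsilon \;\ge\; \hat{\varepsilon}_{\mathrm{tot}} - \varepsilon_{\mathrm{DS}},
% since any empirical audit value \hat{\varepsilon}_{\mathrm{tot}} lower-bounds
% \varepsilon_{\mathrm{tot}}. Overstating \varepsilon_{\mathrm{DS}} only
% loosens the bound, which is what makes the global correction conservative.
```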
Referee: [Abstract] The abstract describes the formalization and two corrections but supplies no equations, proofs, or error-bar details; without the full derivation the central claim cannot be verified.
Authors: We agree the abstract is high-level by design. The full equations for adaptive composition and pointwise conditioning, along with derivations and error analysis, appear in Sections 3 and 4 and the appendix. We will revise the abstract to include one key bounding equation and a reference to the formal sections for improved verifiability while preserving brevity. revision: partial
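One plausible shape for such a bounding equation, reconstructed from the Bayesian semantics of pure epsilon-DP and the propensity score π(x) quoted in the Lean-theorem section below; a sketch under those assumptions, not the paper's verbatim equation:

```latex
% For an \varepsilon-DP trainer, observing the model \theta can move the
% posterior log-odds of membership for a point x at most \varepsilon away
% from the prior log-odds fixed by the propensity score
% \pi(x) = P(S = 1 \mid X = x):
\left|\, \log\frac{P(S=1 \mid \theta,\, X=x)}{P(S=0 \mid \theta,\, X=x)}
      \;-\; \log\frac{\pi(x)}{1-\pi(x)} \,\right| \;\le\; \varepsilon .
% Conditioning on x absorbs the distribution shift into the prior, so any
% residual log-odds update witnesses an instance-level lower bound on
% \varepsilon.
```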
Circularity Check
No circularity: derivations draw on external causal-inference formalisms without reducing to self-defined inputs or fitted predictions
Full rationale
The paper's core derivation formalizes membership inference confounding via distribution shift using ideas from causal inference, then introduces adaptive composition for global bounds and pointwise conditioning for instance-level adjustments. These steps are presented as independent modeling choices that produce conservative or sharper bounds on the DP parameter, without any quoted equations showing that the corrected scores are defined in terms of the target leakage quantity itself or obtained by fitting to the same membership signals being audited. No self-citation chains, uniqueness theorems, or ansatzes imported from prior author work are invoked as load-bearing; experiments on synthetic and large-scale models serve as external validation rather than tautological confirmation. The approach therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Distribution shift and algorithmic leakage can be jointly modeled as an adaptive composition or conditioned pointwise without introducing new bias.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  unclear: relation between the paper passage and the cited Recognition theorem.
  Quoted passage: "We formalize the post-hoc auditing regime... propensity score π(x) = P(S_i = 1 | X_i = x)... adaptive composition... pointwise conditioning"
- IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking (unclear)
  unclear: relation between the paper passage and the cited Recognition theorem.
  Quoted passage: "Theorem 4.1... f_tot = f ⊗ f_DS... Assumption 2 (Approximate overlap)"
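For context on the quoted decomposition, a reconstruction in the f-DP notation of Dong, Roth, and Su [10]; whether Theorem 4.1 uses exactly this tensor product cannot be confirmed from the excerpt:

```latex
% In f-DP, T(P,Q)(\alpha) = \inf\{\, 1 - \mathbb{E}_Q[\phi] \;:\;
% \mathbb{E}_P[\phi] \le \alpha \,\} is the optimal type-II error of any test
% \phi distinguishing P from Q at type-I error \alpha. For trade-off
% functions f = T(P,Q) and g = T(P',Q'), tensor composition runs the two
% tests jointly on independent observations:
f \otimes g \;=\; T\!\left(P \times P',\; Q \times Q'\right).
% Read this way, f_{\mathrm{tot}} = f \otimes f_{\mathrm{DS}} says the
% observed member/non-member trade-off factors into algorithmic leakage f
% and a shift component f_{\mathrm{DS}}, presumably under an independence
% assumption covered by the quoted Assumption 2.
```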
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
[1] Meenatchi Sundaram Muthu Selva Annamalai, Borja Balle, Jamie Hayes, Georgios Kaissis, and Emiliano De Cristofaro. The hitchhiker's guide to efficient, end-to-end, and tight DP auditing. arXiv preprint arXiv:2506.16666, 2025.
[2] Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, and Dawn Song. The secret sharer: Evaluating and testing unintended memorization in neural networks. In 28th USENIX Security Symposium (USENIX Security 19), pages 267–284, 2019.
[3] Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, et al. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pages 2633–2650, 2021.
[4] Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramer, and Chiyuan Zhang. Quantifying memorization across neural language models, 2023. URL https://arxiv.org/abs/2202.07646
[5] Tudor Cebere, Aurélien Bellet, and Nicolas Papernot. Tighter privacy auditing of DP-SGD in the hidden state threat model. In ICLR, 2025.
[6] Tudor Cebere, David Erb, Damien Desfontaines, Aurélien Bellet, and Jack Fitzsimons. Privacy in theory, bugs in practice: Grey-box auditing of differential privacy libraries. arXiv preprint arXiv:2602.17454, 2026.
[7] Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave, and Neil Zeghidour. Moshi: a speech-text foundation model for real-time dialogue. arXiv preprint arXiv:2410.00037, 2024.
[8] Jiayuan Ding, Jianhui Lin, Shiyu Jiang, Yixin Wang, Ziyang Miao, Zhaoyu Fang, Jiliang Tang, Min Li, and Xiaojie Qiu. Tabula: A tabular self-supervised foundation model for single-cell transcriptomics. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
[9] Zeyu Ding, Yuxin Wang, Guanhong Wang, Danfeng Zhang, and Daniel Kifer. Detecting violations of differential privacy. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 475–489, 2018.
[10] Jinshuo Dong, Aaron Roth, and Weijie J Su. Gaussian differential privacy. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 84(1):3–37, 2022.
[11] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale, 2021. URL https://arxiv.org/abs/2010.11929
[12] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.
[13] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Conference on Theory of Cryptography, TCC '06, pages 265–284, Berlin, Heidelberg, 2006. Springer.
[14] Bradley Efron. Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397):171–185, 1987.
[15] European Data Protection Board. Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models, 2024. URL https://www.edpb.europa.eu/our-work-tools/our-documents/opinion-board-art-64/opinion-282024-certain-data-protection-aspects_en
[16] Mathieu Even, Clément Berenfeld, Linus Bleistein, Tudor Cebere, Julie Josse, and Aurélien Bellet. Membership inference attacks from causal principles. arXiv preprint arXiv:2602.02819, 2026.
[17] Juan Felipe Gomez, Bogdan Kulynych, Georgios Kaissis, Jamie Hayes, Borja Balle, and Antti Honkela. Gaussian DP for reporting differential privacy guarantees in machine learning, 2025. URL https://arxiv.org/abs/2503.10945
[18] Noah Hollmann, Samuel Müller, Katharina Eggensperger, and Frank Hutter. TabPFN: A transformer that solves small tabular classification problems in a second, 2023. URL https://arxiv.org/abs/2207.01848
[19] Matthew Jagielski, Jonathan Ullman, and Alina Oprea. Auditing differentially private machine learning: How private is private SGD? Advances in Neural Information Processing Systems, 33:22205–22216, 2020.
[20] Bargav Jayaraman and David Evans. Evaluating differentially private machine learning in practice. In 28th USENIX Security Symposium (USENIX Security 19), pages 1895–1912, 2019.
[21] Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7B, 2023. URL https://arxi...
[22] Iden Kalemaj, Luca Melis, Maxime Boucher, Ilya Mironov, and Saeed Mahloujifar. Observational auditing of label privacy, 2025.
[23] Mishaal Kazmi, Hadrien Lautraite, Alireza Akbari, Qiaoyue Tang, Mauricio Soroco, Tao Wang, Sébastien Gambs, and Mathias Lécuyer. PANORAMIA: Privacy auditing of machine learning models without retraining. Advances in Neural Information Processing Systems, 37:57262–57300, 2024.
[24] Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. Analyzing leakage of personally identifiable information in language models. In 2023 IEEE Symposium on Security and Privacy (SP), pages 346–363. IEEE, 2023.
[25] Saeed Mahloujifar, Luca Melis, and Kamalika Chaudhuri. Auditing f-differential privacy in one run. arXiv preprint arXiv:2410.22235, 2024.
[26] Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Sara Beery, Jure Leskovec, Anshul Kundaje, et al. WILDS: A benchmark of in-the-wild distribution shifts. arXiv preprint arXiv:2012.07421, 2020.
[27] Matthieu Meeus, Shubham Jain, Marek Rei, and Yves-Alexandre de Montjoye. Did the neurons read your book? Document-level membership inference for large language models. In 33rd USENIX Security Symposium (USENIX Security 24), pages 2369–2385, 2024.
[28] Matthieu Meeus, Igor Shilov, Shubham Jain, Manuel Faysse, Marek Rei, and Yves-Alexandre de Montjoye. SoK: Membership inference attacks on LLMs are rushing nowhere (and how to fix it). arXiv preprint arXiv:2406.17975, 2024.
[29] Milad Nasr, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, and Nicholas Carlini. Adversary instantiation: Lower bounds for differentially private machine learning. In 2021 IEEE Symposium on Security and Privacy (SP), pages 866–882. IEEE, 2021.
[30] Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, and Andreas Terzis. Tight auditing of differentially private machine learning. In 32nd USENIX Security Symposium (USENIX Security 23), pages 1631–1648, 2023.
[31] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[32] Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. In International Conference on Machine Learning, pages 8821–8831. PMLR, 2021.
[33] Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.
[34] Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer. Detecting pretraining data from large language models. arXiv preprint arXiv:2310.16789, 2023.
[35] Amer Sinha, Thomas Mesnard, Ryan McKenna, Daogao Liu, Christopher A Choquette-Choo, Yangsibo Huang, Da Yu, George Kaissis, Zachary Charles, Ruibo Liu, et al. VaultGemma: A differentially private Gemma model. arXiv preprint arXiv:2510.15001, 2025.
[36] David Sommer, Sebastian Meiser, and Esfandiar Mohammadi. Privacy loss classes: The central limit theorem in differential privacy. Cryptology ePrint Archive, 2018.
[37] Thomas Steinke, Milad Nasr, and Matthew Jagielski. Privacy auditing with one (1) training run. Advances in Neural Information Processing Systems, 36:49268–49280, 2023.
[38] Sana Tonekaboni, Lena Stempfle, Adibvafa Fallahpour, Walter Gerych, and Marzyeh Ghassemi. An investigation of memorization risk in healthcare foundation models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=NMvMYtRjkg
[39] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
[40] Florian Tramer, Andreas Terzis, Thomas Steinke, Shuang Song, Matthew Jagielski, and Nicholas Carlini. Debugging differential privacy: A case study for privacy auditing. arXiv preprint arXiv:2202.12219, 2022.
[41] Aad W Van der Vaart. Asymptotic statistics, volume 3. Cambridge University Press, 2000.
[42] Stefan Wager. Causal inference: A statistical learning approach, 2024.
Appendix A.1 (Privacy Loss Random Variable), recovered fragment: Definition 4 [36] sets PLRV_{D,D'}^A(θ) = log[ P(A(D) = θ) / P(A(D') = θ) ] for two adjacent datasets D, D' and any mechanism A.
discussion (0)