arxiv: 2511.13502 · v3 · submitted 2025-11-17 · 💻 cs.CR

SnapAudit: Active Auditing of Differentially Private In-Context Learning via Snapshot-Based Simulation

Yuyang Xia , Ruixuan Liu , Li Xiong This is my paper

Pith reviewed 2026-05-17 22:17 UTC · model grok-4.3

classification 💻 cs.CR

keywords differential privacyin-context learningprivacy auditingbootstrap simulationmembership inferencenoise calibrationLLM privacyactive auditing

0 comments

The pith

SnapAudit verifies privacy guarantees in differentially private in-context learning by snapshotting clean inferences and bootstrapping noise simulations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops SnapAudit to make auditing of DP-ICL pipelines practical. It separates the deterministic LLM inference from the random noise injection used for privacy. A limited number of clean runs at zero temperature build a snapshot, from which many simulated noisy runs are generated using bootstrap methods and uncertainty bounds. This leads to efficient, stable privacy estimates and the detection of two specific issues in prior DP-ICL implementations.

Core claim

SnapAudit achieves 80--200× speedup over prior passive auditing while producing tighter and more stable empirical privacy estimates that closely match theoretical guarantees. It uncovers two concrete flaws: classical Gaussian noise calibrations underestimate leakage at large privacy budgets, and the sensitivity analysis of an embedding-aggregation mechanism is incorrect when the number of partitions equals one.

What carries the argument

Decomposition into clean-inference snapshot and bootstrap simulation of the DP noise stage, with multi-sweep search for separable signals in embedding cases.

If this is right

Empirical privacy estimates match theoretical guarantees more closely than before.
Existing Gaussian calibrations fail to bound leakage when privacy budgets are large.
Incorrect sensitivity leads to privacy violations in single-partition embedding aggregation.
The method reduces the number of expensive LLM calls needed for auditing by two orders of magnitude.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This simulation approach could apply to verifying privacy in other LLM-based systems with mixed deterministic and stochastic steps.
Future designs of DP mechanisms for ICL should incorporate empirical checks using snapshot methods to avoid the identified calibration errors.
Active auditing frameworks like this may help standardize privacy verification beyond theoretical analysis alone.

Load-bearing premise

Clean LLM outputs at temperature zero are near-deterministic enough for a small number of calls to produce a snapshot that represents the full distribution when noise is added in simulation.

What would settle it

A direct comparison where the privacy leakage estimated from many full end-to-end membership inference attacks differs significantly from the SnapAudit bootstrap estimate.

Figures

Figures reproduced from arXiv: 2511.13502 by Li Xiong, Ruixuan Liu, Yuyang Xia.

**Figure 1.** Figure 1: An example of ICL privacy leakage. ing on a few demonstrations included in the input prompt. This phenomenon, popularized by GPT-3 [1], has driven widespread applications across classification, reasoning, and generation tasks [2]–[5]. However, in many real deployments, these demonstrations may contain private or proprietary data that is not intended to be revealed to downstream users, yet the model’s out… view at source ↗

**Figure 2.** Figure 2: General Auditing Framework for DP-ICL. budget? We formulate the auditing task as a Membership Inference Attack (MIA) against the DP-ICL mechanism. To carry out the attack, the auditor injects a canary exemplar into the demonstration set to create a neighboring dataset pair (with or without canary), and randomly samples one to feed into the DP-ICL system. Based on the output, the auditor attempts to disting… view at source ↗

**Figure 3.** Figure 3: A classification auditing example. C1, there are nine votes for ’no’ and one vote for ’yes’. Therefore, the clean vote vector Vwith = [1, 9]; in C0, Vwithout = [0, 10]. Then the DP-ICL system gets DP histograms by adding Gaussian noise to the clean vote vectors. The DP histogram of C1 is V˜ with = [5.5, 4] and of C2 is V˜ without = [4, 11]. The DP outputs of C1 and C0 are ’yes’ and ’no’, respectively. 3.3… view at source ↗

**Figure 4.** Figure 4: A generation auditing example. are far apart. The inferred membership is zˆ = ( 1, if y = y1, 0, if y = y0. For white-box access, it is easier because it naturally bypasses the 0-shot responses. The attacker classifies the intercepted DP embedding mean vector ˜ embavg based on its proximity to the two embedding vectors of y1 and y0. Formally, zˆ = ( 1, if || ˜ embavg − emb(y1)||2 − || ˜ embavg − emb(y0)||… view at source ↗

**Figure 5.** Figure 5: Overview of the text classification auditing result. [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Effect of the number of partitions in the text classification task. [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: Relationship between the number of shots and the [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗

**Figure 10.** Figure 10: Relationship between the LLM temperature and [PITH_FULL_IMAGE:figures/full_fig_p010_10.png] view at source ↗

**Figure 13.** Figure 13: Overview of the text generation auditing result. [PITH_FULL_IMAGE:figures/full_fig_p011_13.png] view at source ↗

**Figure 12.** Figure 12: Relationship between the number of partitions [PITH_FULL_IMAGE:figures/full_fig_p011_12.png] view at source ↗

**Figure 14.** Figure 14: Effect of the number of partitions in the text generation task. [PITH_FULL_IMAGE:figures/full_fig_p012_14.png] view at source ↗

**Figure 15.** Figure 15: Relationship between the number of shots and [PITH_FULL_IMAGE:figures/full_fig_p012_15.png] view at source ↗

**Figure 18.** Figure 18: Relationship between the signal embedding dis [PITH_FULL_IMAGE:figures/full_fig_p012_18.png] view at source ↗

**Figure 21.** Figure 21: Relationship between the number of partitions [PITH_FULL_IMAGE:figures/full_fig_p013_21.png] view at source ↗

**Figure 20.** Figure 20: Relationship between different LLMs and empir [PITH_FULL_IMAGE:figures/full_fig_p013_20.png] view at source ↗

read the original abstract

In-context learning (ICL) allows LLMs to adapt to new tasks via a few demonstrations, but those demonstrations may contain sensitive data. Differentially private (DP) ICL mechanisms mitigate this risk by injecting noise into the aggregation step, but verifying that an implementation actually meets its claimed privacy bound currently requires repeated end-to-end membership-inference attacks (MIAs) against the pipeline as a black box, incurring prohibitive LLM cost and yielding unstable empirical privacy estimates. We propose SnapAudit, an active auditing framework that decomposes a DP-ICL pipeline into a deterministic clean-inference stage and a stochastic DP-noise stage, and audits the full pipeline by combining a small snapshot of the former with bootstrap simulation of the latter. Because clean LLM outputs are near-deterministic at temperature zero, a few thousand clean LLM calls suffice to approximate the snapshot distribution; SnapAudit then bootstraps $10^5$ noisy trials from this snapshot at negligible additional cost, with finite-sample uncertainty controlled via an empirical Bernstein correction. For embedding-based mechanisms, we further introduce a multi-sweep search procedure that constructs maximally separable audit signals. SnapAudit achieves $80$--$200\times$ speedup over prior passive auditing while producing tighter and more stable empirical privacy estimates that closely match theoretical guarantees. Beyond efficiency, SnapAudit uncovers two concrete flaws in existing DP-ICL designs: (i) classical Gaussian noise calibrations underestimate leakage at large privacy budgets, allowing empirical leakage to exceed the theoretical bound; (ii) the sensitivity analysis of an embedding-aggregation mechanism is incorrect when the number of partitions equals one, leading to undersized noise and an outright privacy violation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SnapAudit gives a practical speedup for auditing DP-ICL and flags two real design issues, though the snapshot assumption needs closer checks.

read the letter

The main point is that this paper splits DP-ICL auditing into a cheap deterministic snapshot from a few thousand temperature-zero LLM calls and then bootstraps the noise additions on top. That separation lets them run 10^5 trials at low cost and produces more stable privacy estimates than repeated full end-to-end attacks. They also add a multi-sweep search to build strong audit signals for embedding mechanisms. If the numbers hold, the 80-200x speedup is the practical win here, and the two flaw detections (Gaussian calibration underestimating leakage at large budgets, and wrong sensitivity when partitions equal one) are the kind of concrete findings that matter for people shipping these systems. The empirical Bernstein correction for finite-sample uncertainty is a sensible addition that keeps the method grounded. The soft spot is exactly the one the stress-test note raises: temperature zero is only approximately deterministic, so the snapshot can miss low-probability outputs that still affect the tail of the privacy estimate. The abstract says the estimates match theory closely, but without seeing the full experimental sections, snapshot-size ablations, or code, it is hard to judge how much truncation error leaks into the reported violations. The method itself does not appear circular; it is an empirical procedure rather than a self-referential derivation. This is aimed at researchers and engineers who need to verify privacy claims in LLM pipelines that use sensitive demonstrations. A reader who cares about deployable DP-ICL would find the auditing recipe useful even if they end up adjusting the snapshot size in their own setting. The work shows clear thinking on the cost bottleneck and honest engagement with the limitations of passive auditing. It deserves a serious referee because the core idea is testable and the claims are specific enough to check.

Referee Report

2 major / 2 minor

Summary. The paper introduces SnapAudit, an active auditing framework for differentially private in-context learning (DP-ICL). It decomposes the pipeline into a deterministic clean-inference stage (approximated via a snapshot of a few thousand temperature-zero LLM calls) and a stochastic DP-noise stage (bootstrapped with 10^5 simulated trials plus empirical Bernstein correction). For embedding-based mechanisms, a multi-sweep search constructs audit signals. The method claims 80--200× speedup over passive auditing, tighter and more stable empirical privacy estimates that closely match theory, and detection of two flaws: classical Gaussian noise underestimating leakage at large budgets, and incorrect sensitivity analysis for embedding aggregation when the number of partitions equals one.

Significance. If the snapshot faithfully represents the clean marginal, SnapAudit would provide a practical efficiency gain for verifying DP-ICL implementations, which is valuable given the LLM query costs of black-box MIAs. The deterministic-stochastic decomposition and bootstrap approach are a clear methodological strength, as is the explicit identification of concrete design flaws that could affect real deployments. Reproducible simulation procedures and finite-sample uncertainty control further support the contribution to privacy auditing tools.

major comments (2)

[§3.1] §3.1 (Snapshot Construction): The central efficiency and flaw-detection claims rest on the snapshot from a few thousand temperature-zero calls being a sufficient proxy for the clean-inference distribution. The manuscript does not provide a bound on missing mass for low-probability outputs or empirical verification that truncation does not bias the subsequent 10^5-trial bootstrap estimates of epsilon, particularly at large privacy budgets where the reported Gaussian calibration flaw appears.
[§5.2] §5.2 (Flaw (ii) on embedding-aggregation sensitivity): The claim of an outright privacy violation when the number of partitions equals one depends on the correctness of the reported sensitivity analysis error. Without the explicit derivation of the incorrect sensitivity value versus the correct one, or the resulting empirical epsilon excess, it is difficult to assess whether this is a load-bearing implementation bug or a minor edge-case discrepancy.

minor comments (2)

[§4.3] The multi-sweep search procedure for embedding mechanisms would benefit from a small concrete numerical example in the main text to illustrate how the maximally separable audit signal is constructed.
[Table 2] Table 2 reports speedup ranges; adding the precise number of LLM calls, model, and prompt lengths used for each baseline comparison would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable comments on our work. We provide point-by-point responses to the major comments and describe the changes we will incorporate in the revised manuscript.

read point-by-point responses

Referee: [§3.1] §3.1 (Snapshot Construction): The central efficiency and flaw-detection claims rest on the snapshot from a few thousand temperature-zero calls being a sufficient proxy for the clean-inference distribution. The manuscript does not provide a bound on missing mass for low-probability outputs or empirical verification that truncation does not bias the subsequent 10^5-trial bootstrap estimates of epsilon, particularly at large privacy budgets where the reported Gaussian calibration flaw appears.

Authors: We agree that additional support for the snapshot approximation would strengthen the paper. At temperature zero, LLM outputs are deterministic for the vast majority of inputs in our experiments, making the missing mass for low-probability tokens extremely small. However, to address the concern directly, we will add an empirical study in the revised version comparing the distribution from a few thousand samples to one with 10 times more samples, demonstrating that the bootstrap epsilon estimates are unaffected within the reported uncertainty. We will also include a discussion on why this holds even at large privacy budgets for the Gaussian flaw detection. revision: yes
Referee: [§5.2] §5.2 (Flaw (ii) on embedding-aggregation sensitivity): The claim of an outright privacy violation when the number of partitions equals one depends on the correctness of the reported sensitivity analysis error. Without the explicit derivation of the incorrect sensitivity value versus the correct one, or the resulting empirical epsilon excess, it is difficult to assess whether this is a load-bearing implementation bug or a minor edge-case discrepancy.

Authors: We appreciate this feedback. In the revised manuscript, we will add an explicit derivation of the sensitivity calculation for the case when the number of partitions equals one, contrasting the incorrect value used in the original mechanism with the correct one. We will also report the resulting empirical epsilon excess to demonstrate the severity of the violation. revision: yes

Circularity Check

0 steps flagged

No circularity: SnapAudit is an empirical simulation procedure independent of its measured outputs

full rationale

The paper describes a decomposition of the DP-ICL pipeline into a deterministic clean-inference stage (approximated via a finite snapshot of temperature-zero LLM calls) and a stochastic DP-noise stage (bootstrapped via simulation). Privacy estimates and speedup claims are produced by executing this new auditing procedure on the target mechanism and comparing the resulting empirical bounds against theoretical guarantees and prior passive methods. No equation or result is shown to be defined in terms of itself, no parameter is fitted to a subset and then relabeled as a prediction of a closely related quantity, and no load-bearing premise reduces to a self-citation whose content is unverified. The method remains self-contained against external benchmarks (full end-to-end MIAs) and does not rename known results or smuggle ansatzes via citation. The finite-snapshot assumption is an empirical modeling choice whose validity can be checked independently; it does not create a definitional loop inside the reported estimates.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on the modeling assumption that temperature-zero LLM inference is near-deterministic and that bootstrap resampling from the snapshot faithfully represents the distribution of noisy outputs; no free parameters or new invented entities are introduced in the abstract.

axioms (2)

domain assumption Clean LLM outputs at temperature zero are near-deterministic, so a few thousand calls suffice to approximate the snapshot distribution.
Stated in the abstract as the basis for the snapshot stage.
domain assumption Bootstrap simulation of the DP-noise stage combined with an empirical Bernstein correction controls finite-sample uncertainty.
Invoked to justify the large number of simulated trials at low cost.

pith-pipeline@v0.9.0 · 5602 in / 1412 out tokens · 33103 ms · 2026-05-17T22:17:07.949492+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 6 internal anchors

[1]

Language models are few-shot learners,

T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askellet al., “Language models are few-shot learners,”Advances in neural information pro- cessing systems, vol. 33, pp. 1877–1901, 2020

work page 1901
[2]

A Survey on In-context Learning

Q. Dong, L. Li, D. Dai, C. Zheng, J. Ma, R. Li, H. Xia, J. Xu, Z. Wu, T. Liuet al., “A survey on in-context learning,”arXiv preprint arXiv:2301.00234, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[3]

De- mystifying prompts in language models via perplexity estimation,

H. Gonen, S. Iyer, T. Blevins, N. A. Smith, and L. Zettlemoyer, “De- mystifying prompts in language models via perplexity estimation,” arXiv preprint arXiv:2212.04037, 2022

work page arXiv 2022
[4]

Which examples to annotate for in- context learning? towards effective and efficient selection,

C. Mavromatis, B. Srinivasan, Z. Shen, J. Zhang, H. Rangwala, C. Faloutsos, and G. Karypis, “Which examples to annotate for in- context learning? towards effective and efficient selection,”arXiv preprint arXiv:2310.20046, 2023

work page arXiv 2023
[5]

Let’s learn step by step: Enhancing in-context learning ability with curriculum learning.arXiv preprint arXiv:2402.10738,

Y . Liu, J. Liu, X. Shi, Q. Cheng, Y . Huang, and W. Lu, “Let’s learn step by step: Enhancing in-context learning ability with curriculum learning,”arXiv preprint arXiv:2402.10738, 2024

work page arXiv 2024
[6]

Privacy-preserving in-context learning for large language models,

T. Wu, A. Panda, J. T. Wang, and P. Mittal, “Privacy-preserving in-context learning for large language models,”arXiv preprint arXiv:2305.01639, 2023

work page arXiv 2023
[7]

Privacy-preserving in- context learning with differentially private few-shot generation,

X. Tang, R. Shin, H. A. Inan, A. Manoel, F. Mireshghallah, Z. Lin, S. Gopi, J. Kulkarni, and R. Sim, “Privacy-preserving in- context learning with differentially private few-shot generation,”arXiv preprint arXiv:2309.11765, 2023

work page arXiv 2023
[8]

A general framework for auditing differ- entially private machine learning,

F. Lu, J. Munoz, M. Fuchs, T. LeBlond, E. Zaresky-Williams, E. Raff, F. Ferraro, and B. Testa, “A general framework for auditing differ- entially private machine learning,”Advances in Neural Information Processing Systems, vol. 35, pp. 4165–4176, 2022

work page 2022
[9]

Auditing differentially private machine learning: How private is private sgd?

M. Jagielski, J. Ullman, and A. Oprea, “Auditing differentially private machine learning: How private is private sgd?”Advances in Neural Information Processing Systems, vol. 33, pp. 22 205–22 216, 2020

work page 2020
[10]

Adver- sary instantiation: Lower bounds for differentially private machine learning,

M. Nasr, S. Songi, A. Thakurta, N. Papernot, and N. Carlin, “Adver- sary instantiation: Lower bounds for differentially private machine learning,” in2021 IEEE Symposium on security and privacy (SP). IEEE, 2021, pp. 866–882

work page 2021
[11]

Tight auditing of differentially private machine learning,

M. Nasr, J. Hayes, T. Steinke, B. Balle, F. Tram `er, M. Jagielski, N. Carlini, and A. Terzis, “Tight auditing of differentially private machine learning,” in32nd USENIX Security Symposium (USENIX Security 23), 2023, pp. 1631–1648

work page 2023
[12]

Membership inference attacks against in-context learning,

R. Wen, Z. Li, M. Backes, and Y . Zhang, “Membership inference attacks against in-context learning,” inProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Secu- rity, 2024, pp. 3481–3495

work page 2024
[13]

Membership inference attack against long-context large language models,

Z. Wang, G. Liu, Y . Yang, and C. Wang, “Membership inference attack against long-context large language models,”arXiv preprint arXiv:2411.11424, 2024

work page arXiv 2024
[14]

Contextleak: Auditing leakage in private in-context learning methods,

J. Choi, S. Cao, X. Dong, and S. P. Karimireddy, “Contextleak: Auditing leakage in private in-context learning methods,” inThe Impact of Memorization on Trustworthy Foundation Models: ICML 2025 Workshop

work page 2025
[15]

Gaussian differential privacy,

J. Dong, A. Roth, and W. J. Su, “Gaussian differential privacy,”Jour- nal of the Royal Statistical Society Series B: Statistical Methodology, vol. 84, no. 1, pp. 3–37, 2022

work page 2022
[16]

Efron,The jackknife, the bootstrap and other resampling plans

B. Efron,The jackknife, the bootstrap and other resampling plans. SIAM, 1982

work page 1982
[17]

(2024, Jul.) Llama-3.1-8b-instruct

Meta. (2024, Jul.) Llama-3.1-8b-instruct. Hugging Face, Accessed: 2025-09-18. [Online]. Available: https://huggingface.co/meta- llama/Llama-3.1-8B-Instruct

work page 2024
[18]

gpt-oss-120b & gpt-oss-20b Model Card

OpenAI, “gpt-oss-120b & gpt-oss-20b model card,” 2025. [Online]. Available: https://arxiv.org/abs/2508.10925

work page internal anchor Pith review Pith/arXiv arXiv 2025
[19]

Character-level convolutional networks for text classification,

X. Zhang, J. Zhao, and Y . LeCun, “Character-level convolutional networks for text classification,”Advances in neural information processing systems, vol. 28, 2015

work page 2015
[20]

Building a question answering test collection,

E. M. V oorhees and D. M. Tice, “Building a question answering test collection,” inProceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, 2000, pp. 200–207

work page 2000
[21]

SAMSum corpus: A human-annotated dialogue dataset for abstractive summarization,

B. Gliwa, I. Mochol, M. Biesek, and A. Wawer, “SAMSum corpus: A human-annotated dialogue dataset for abstractive summarization,” inProceedings of the 2nd Workshop on New Frontiers in Summarization. Hong Kong, China: Association for Computational Linguistics, Nov. 2019, pp. 70–79. [Online]. Available: https://www.aclweb.org/anthology/D19-5409

work page 2019
[22]

Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

S. Narayan, S. B. Cohen, and M. Lapata, “Don’t give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization,”ArXiv, vol. abs/1808.08745, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[23]

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

S. Min, X. Lyu, A. Holtzman, M. Artetxe, M. Lewis, H. Hajishirzi, and L. Zettlemoyer, “Rethinking the role of demonstrations: What makes in-context learning work?”arXiv preprint arXiv:2202.12837, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[24]

An Explanation of In-context Learning as Implicit Bayesian Inference

S. M. Xie, A. Raghunathan, P. Liang, and T. Ma, “An explanation of in-context learning as implicit bayesian inference,”arXiv preprint arXiv:2111.02080, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[25]

What can trans- formers learn in-context? a case study of simple function classes,

S. Garg, D. Tsipras, P. S. Liang, and G. Valiant, “What can trans- formers learn in-context? a case study of simple function classes,” Advances in neural information processing systems, vol. 35, pp. 30 583–30 598, 2022

work page 2022
[26]

Retrieval- augmented generation for knowledge-intensive nlp tasks,

P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. K ¨uttler, M. Lewis, W.-t. Yih, T. Rockt ¨aschelet al., “Retrieval- augmented generation for knowledge-intensive nlp tasks,”Advances in neural information processing systems, vol. 33, pp. 9459–9474, 2020

work page 2020
[27]

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

K. Greshake, S. Abdelnabi, G. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real- world llm-integrated applications with indirect prompt injection,” arXiv:2302.12173, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[28]

Prompt injection attacks and defenses in llm applications,

H. Liu, Y . Zhang, J. Zouet al., “Prompt injection attacks and defenses in llm applications,”arXiv:2309.01327, 2023

work page arXiv 2023
[29]

Semi-supervised knowledge transfer for deep learning from private training data,

N. Papernot, M. Abadi, ´U. Erlingsson, I. Goodfellow, and K. Talwar, “Semi-supervised knowledge transfer for deep learning from private training data,” inInternational Conference on Learning Representa- tions, 2017

work page 2017
[30]

Scalable private learning with pate,

N. Papernot, S. Song, I. Mironov, A. Raghunathan, K. Talwar, and U. Erlingsson, “Scalable private learning with pate,” inInternational Conference on Learning Representations, 2018

work page 2018
[31]

The algorithmic foundations of differential privacy,

C. Dwork, A. Rothet al., “The algorithmic foundations of differential privacy,”Foundations and trends® in theoretical computer science, vol. 9, no. 3–4, pp. 211–407, 2014

work page 2014
[32]

Mechanism design via differential pri- vacy,

F. McSherry and K. Talwar, “Mechanism design via differential pri- vacy,” in48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07). IEEE, 2007, pp. 94–103

work page 2007
[33]

Calibrating noise to sensitivity in private data analysis,

C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” inTheory of cryptography conference. Springer, 2006, pp. 265–284

work page 2006
[34]

Improving the gaussian mechanism for differential privacy: Analytical calibration and optimal denoising,

B. Balle and Y .-X. Wang, “Improving the gaussian mechanism for differential privacy: Analytical calibration and optimal denoising,” in International conference on machine learning. PMLR, 2018, pp. 394–403

work page 2018
[35]

Privacy amplification by sub- sampling: Tight analyses via couplings and divergences,

B. Balle, G. Barthe, and M. Gaboardi, “Privacy amplification by sub- sampling: Tight analyses via couplings and divergences,”Advances in neural information processing systems, vol. 31, 2018

work page 2018
[36]

Subsampled r´enyi differential privacy and analytical moments accountant,

Y .-X. Wang, B. Balle, and S. P. Kasiviswanathan, “Subsampled r´enyi differential privacy and analytical moments accountant,” inThe 22nd international conference on artificial intelligence and statistics. PMLR, 2019, pp. 1226–1235

work page 2019
[37]

Deep learning with differential privacy,

M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” inProceedings of the 2016 ACM SIGSAC conference on computer and communications security, 2016, pp. 308–318

work page 2016
[38]

A survey on membership inference attacks and defenses in machine learning,

J. Niu, P. Liu, X. Zhu, K. Shen, Y . Wang, H. Chi, Y . Shen, X. Jiang, J. Ma, and Y . Zhang, “A survey on membership inference attacks and defenses in machine learning,”Journal of Information and Intel- ligence, vol. 2, no. 5, pp. 404–454, 2024

work page 2024
[39]

Mills’ ratio: Monotonicity patterns and functional inequal- ities,

´A. Baricz, “Mills’ ratio: Monotonicity patterns and functional inequal- ities,”Journal of Mathematical Analysis and Applications, vol. 340, no. 2, pp. 1362–1370, 2008

work page 2008
[40]

We do not collect any new human- subject data

Ethics Considerations This work investigates privacy risks in differentially private in-context learning (DP-ICL) using only standard, publicly available NLP benchmarks (e.g., Agnews, TREC, Samsum, and Xsum). We do not collect any new human- subject data. All canary exemplars used in our membership inference attacks are sampled from these benchmarks and t...

work page
[41]

Lemma A.1.The L2-sensitivity of the released noisy mean embedding from the ESA algorithm is 2 T , whereTis the number of disjoint exemplar subsets

Tighter Sensitivity Analysis In this section, we present our analysis of the text- generation mechanism based on the ESA algorithm. Lemma A.1.The L2-sensitivity of the released noisy mean embedding from the ESA algorithm is 2 T , whereTis the number of disjoint exemplar subsets. Proof.Consider contextsCandC ′ that differ only by the presence/absence (or r...

work page
[42]

{canary}

Query Prompt Used in Paper 1You are a highly accurate text verification system. Your sole function is to verify if the text in the <query> tag is present in the <context> tag. 2Your response must be a single word: either ’Yes’ or ’No’. 3 4<context> 5{formatted_context} 6</context> 7 8<query> 9{query_article} 10</query> 11 12<response> Prompt 1: Auditing q...

work page
[43]

{canary}

Check if the specific phrase "{canary}" is present in the context

work page
[44]

{Y1_TARGET}

If the context is empty, randomly output either "{Y1_TARGET}" or "{Y2_CONTROL}"

work page
[45]

{canary }

If the context is NOT empty and "{canary }" is found, output "{Y1_TARGET}"

work page
[46]

{canary }

If the context is NOT empty and "{canary }" is NOT found, output "{Y2_CONTROL}". 7 8CONTEXTUAL EXEMPLAR: 9{exemplar_context} 10 11RESPONSE: Prompt 3: Black-box auditing query for text generation. 1You are a highly accurate text classifier. Classify the following text into one of the provided categories based on the examples. 2The possible labels are: Worl...

work page
[47]

Pairs of Signal Texts y1 y2 L2 distance ”The old clock chimed a forgotten, dusty tune.” ”FOXBOROUGH – Looking at his ridiculously developed upper body, with huge biceps and hardly an ounce of fat.” 0.7476 ”[”Incorrect statement,”]” ”[]” 0.645 ”The red car sped down a long, winding road.” ”The blue boat sailed on a vast, open sea.” 0.5562 ”Real Madrid won ...

work page
[48]

The caseb= 0corresponds to no margin change, andb= 1corresponds to flipping one vote in favor of the canary and one against it

Proof for Lemma 3.1 Proof.For a fixed number of partitionsT, consider the canary-present contextV with = [k, T−k]and a general canary-absent contextV without = [k−b, T−k+b], where b∈[0,1]parameterizes the deterministic vote margin in- duced by the canary. The caseb= 0corresponds to no margin change, andb= 1corresponds to flipping one vote in favor of the ...

work page
[49]

It first partitions exemplars into disjoint partitions and initializes the clean vote vector (lines 1-2)

Detailed Algorithms for DP-ICL Mechanisms Private V oting Algorithm (Algorithm 2) handles tasks with a discrete label set. It first partitions exemplars into disjoint partitions and initializes the clean vote vector (lines 1-2). Each subset is concatenated with the query to form a prompt to obtain a class prediction from the LLM (lines 3- 5). The predicti...

work page