arxiv: 2606.01849 · v2 · pith:DHZFYP5Pnew · submitted 2026-06-01 · 💻 cs.LG · cs.CL · cs.CR

ContinuousBench: Can Differentially Private Synthetic Text Improve Capabilities?

Pith reviewed 2026-06-28 15:35 UTC · model grok-4.3

classification 💻 cs.LG · cs.CL · cs.CR
keywords differential privacy · synthetic data · text synthesis · benchmark · machine learning · privacy preserving ML · capability evaluation

The pith

Differentially private synthetic text fails to transfer new capabilities from private corpora even at ε=100, while non-private synthesis succeeds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ContinuousBench, a benchmark that pairs fresh training corpora with QA sets designed to be unsolvable without the corpus and supported by hundreds of independent records. This setup tests whether DP synthetic data can substitute for original data access by measuring actual capability gains on new knowledge. Results show non-private synthesis transfers substantial knowledge, enabling strong QA performance, but state-of-the-art DP methods do not, even with loose privacy budgets. The benchmark regenerates quarterly with new data to avoid saturation issues in existing evaluations.

Core claim

On ContinuousBench, non-private synthesis transfers substantial knowledge from the original corpus, while state-of-the-art DP synthesis methods generally fail to do so, even at ε=100.
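
For context on why ε=100 counts as a very loose budget, the standard (ε, δ)-differential privacy guarantee is recalled below. This is the textbook definition, not a statement of the paper's exact setup: the neighboring-dataset relation and the value of δ are not given in this summary, and the symbols M, D, D′, and S are generic placeholders.

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
\quad \text{for all neighboring datasets } D, D' \text{ and all measurable sets } S,
\qquad e^{100} \approx 2.7 \times 10^{43}.
```

At ε=100 the multiplicative bound is astronomically large, so the formal guarantee constrains the mechanism very little; a failure to transfer knowledge at that budget is therefore hard to attribute to a strict privacy requirement.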

What carries the argument

ContinuousBench, a continuously regenerated benchmark that pairs never-before-seen corpora with derived QA sets constructed to be unsolvable without the corpus and learnable under DP noise.

If this is right

  • DP synthetic data cannot currently substitute for original corpus access when the goal is to acquire new, corpus-specific capabilities.
  • Improvements to DP text synthesis are needed before it can reliably transmit learnable signals from sensitive data.
  • Standard benchmarks may overestimate the utility of DP synthesis because they do not isolate corpus-dependent knowledge.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The finding implies that for applications requiring novel knowledge extraction, privacy-preserving synthesis may still necessitate additional non-private data sources.
  • Extending the benchmark to other data types such as code or structured records could reveal whether the failure is specific to text.
  • If future DP methods close the gap, ContinuousBench would provide a standardized way to quantify progress on capability transfer.

Load-bearing premise

The QA sets are constructed to be unsolvable without the corpus and the tested knowledge is supported by hundreds of independent records allowing learnability under DP noise.
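
The two construction properties named in this premise can be sketched as a simple filter over candidate QA items. This is an illustrative sketch only: the `QAItem` fields, the `keep_item` helper, and the `min_support=100` threshold are hypothetical names and values chosen here, not the paper's actual pipeline, which this summary does not describe.

```python
from dataclasses import dataclass


@dataclass
class QAItem:
    question: str
    answer: str
    # Assumption: True if a base model answered correctly WITHOUT corpus access.
    closed_book_correct: bool
    # Assumption: number of independent corpus records supporting the answer.
    support_count: int


def keep_item(item: QAItem, min_support: int = 100) -> bool:
    """Keep an item only if it is (1) not solvable closed-book and
    (2) supported by at least `min_support` records (hypothetical threshold)."""
    return (not item.closed_book_correct) and item.support_count >= min_support


def build_qa_set(items, min_support: int = 100):
    """Return the candidate items that pass both filters."""
    return [it for it in items if keep_item(it, min_support)]


candidates = [
    QAItem("q1", "a1", closed_book_correct=False, support_count=250),
    QAItem("q2", "a2", closed_book_correct=True, support_count=400),
    QAItem("q3", "a3", closed_book_correct=False, support_count=3),
]
kept = build_qa_set(candidates)
```

In this toy run only `q1` survives: `q2` is answerable without the corpus, and `q3` has too little redundancy to be plausibly learnable under DP noise.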

What would settle it

A DP synthesis method producing data that lets models score substantially above the no-corpus baseline on the QA sets would show the method succeeds where current ones fail.
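
One way to make "substantially above the no-corpus baseline" concrete is a paired comparison of per-question correctness before and after training on the synthetic data. The sketch below is an assumption about how such a check could look, using a normal-approximation confidence interval; the function name `capability_gain` and the 1.96 multiplier (roughly a 95% interval) are choices made here, not the benchmark's harness.

```python
import math


def capability_gain(baseline_flags, trained_flags, z=1.96):
    """Paired accuracy gain (trained minus baseline) on the same QA items,
    with a normal-approximation confidence interval on the mean difference."""
    if len(baseline_flags) != len(trained_flags) or len(baseline_flags) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(baseline_flags)
    # Per-item difference is -1, 0, or +1.
    diffs = [int(t) - int(b) for b, t in zip(baseline_flags, trained_flags)]
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    half_width = z * math.sqrt(var / n)
    return mean, (mean - half_width, mean + half_width)


# Toy data: the baseline model gets nothing right, the trained model gets half.
baseline = [False] * 10
trained = [True] * 5 + [False] * 5
gain, (low, high) = capability_gain(baseline, trained)
```

A method would count as succeeding under this reading if the lower end of the interval sits clearly above zero on the held-out QA set.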

Figures

Figures reproduced from arXiv: 2606.01849 by Alex Bie, Gautam Kamath, Lillian Tsai, Lucas Rosenblatt, Natalia Ponomareva, Peihan Liu, Rachel Cummings, Roxana Geambasu, Weiwei Kong, Yu Gan.

Figure 1
Figure 1: A saturated benchmark leaves little room to distinguish methods. We compare downstream accuracy of GEMMA 3 4B after 3 increasingly informative training regimes: no training, training on DP synthetic data (ε = 10), and training on the real corpus. Standard benchmarks are near-saturated, but CONTINUOUSBENCH reveals that DP synthesis has significant headroom.
Figure 2
Figure 2: Workflow of CONTINUOUSBENCH. Each release (A) pairs a freshly generated training corpus with a derived QA set. The participant runs their DP synthesis method (B) on the corpus to obtain synthetic data; our standardized harness (shaded) then fine-tunes (C) the same checkpoint used for synthesis on this data and scores it on the held-out QA test set (D). Releases are regenerated periodically (bottom loop), k…
Figure 3
Figure 3: GEMINON example records. A procedurally generated index entry (left), one of the articles (middle), and example QA pairs (right). support contains the article ids that support this answer.
Figure 4
Figure 4: NEWS example records. A CC-News source article (left) and one of the derived QA pairs (right). Each QA includes a closed-book baseline (Gemini answers without context) and open-book answers from retrieved candidate articles; an article whose corresponding closed-book answer is correct is counted towards support.
Figure 5
Figure 5: Capability gap between non-private and DP synthesis, measured by downstream QA accuracy…
Figure 6
Figure 6: MAUVE scores for synthetic corpora, from either a 1B or 4B generator, against original…
Figure 7
Figure 7: Evaluator training on DP-synthetic data vs. direct DP fine-tuning on the original corpus.
Figure 8
Figure 8: Accuracy (%) on NEWS as a function of QA support count. QAs are bucketed by support count; we plot the average support count and accuracy inside each bucket. Error bars show standard deviation.
Figure 9
Figure 9: LLM-match accuracy on the NEWS test QA set during evaluator (1B-Full) training on PE-generated corpora. Results are stratified by support count threshold k. For both generator models, accuracy generally decreases as training proceeds, indicating that PE-generated data does not provide stable learnable signal for the target news facts.
Figure 10
Figure 10: Evaluator training on DP-synthetic data vs. direct DP fine-tuning on the original corpus, both with 4B models.
Figure 11
Figure 11: GEMINON QA accuracy over evaluator training steps. The left panel shows the high-repetition QA split, while the right panel shows the singleton split. Repeated facts are learned rapidly from the real corpus and non-private synthetic data, whereas DP-synthetic training remains much lower. Singleton facts remain low across regimes, consistent with the intended privacy sanity check.
Figure 12
Figure 12: GEMINON QA accuracy by attribute type over evaluator training. We show 4B evaluators trained on the real corpus, non-private synthetic data, and DP-synthetic data at ε = 100; synthetic corpora are generated by the 4B generator. Many attributes improve together on the high-repetition split, suggesting broad acquisition of the domain schema and attribute-specific mappings. Accuracy remains much lower on the…
Figure 13
Figure 13: Per-entity GEMINON QA accuracy over evaluator training. Rows correspond to training checkpoints and columns correspond to Geminon entities. Entities are ordered by the first checkpoint at which their high-repetition QA accuracy exceeds a threshold. Different entities become answerable at different stages, indicating gradual entity-specific factual acquisition. Singleton facts remain mostly unrecovered.
Figure 14
Figure 14: NEWS QA accuracy over evaluator training steps, stratified by support-count bucket. Columns correspond to training data sources: DP-synthetic data at ε = 10, DP-synthetic data at ε = 100, non-private synthetic data, and the real corpus. Rows correspond to evaluator size. Higher-support buckets are learned earlier and reach higher accuracy, especially under real-corpus and non-private synthetic training. D…
Figure 15
Figure 15: Standard DP text synthesis benchmarks are saturated. We compare accuracy across four training regimes (no training, DP-synth, non-private synth, and train on real) on three saturated benchmarks (Yelp Polarity, IMDB, OpenReview) and on CONTINUOUSBENCH GEMINON. On the standard benchmarks, the four regimes span only 3.0, 2.8, and 10.6 percentage points respectively, leaving DP-synth and non-private synth o…
Figure 16
Figure 16: Example GEMINON index entries.
Figure 17
Figure 17: Comparison of Pokémon and Geminon stat distributions.
Figure 18
Figure 18: Example articles about Geminon Boreling.
Figure 19
Figure 19: Example articles about Asia Cup 2025: India vs Pakistan.
Figure 20
Figure 20: Example QAs about Asia Cup 2025: India vs Pakistan.
Figure 21
Figure 21: Support count distribution of the news QAs. The blue curve shows the per-article distribution…
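
The support-count stratification described in the Figure 8 caption (bucket QAs by support count, then report the average support count and accuracy per bucket) can be sketched as follows. The bucket edges, the `bucket_accuracy` name, and the `(support_count, is_correct)` input shape are assumptions made for illustration; the paper's actual bucketing is not given in this summary.

```python
from bisect import bisect_right


def bucket_accuracy(items, edges):
    """Group (support_count, is_correct) pairs into buckets.

    `edges` is a sorted list of bucket lower bounds (hypothetical values);
    an item falls in the last bucket whose lower bound is <= its support count.
    Returns a list of (lower_bound, mean_support, accuracy), one per non-empty bucket.
    """
    buckets = {}
    for support, correct in items:
        idx = bisect_right(edges, support) - 1
        if idx < 0:
            continue  # support count below the smallest edge: not bucketed
        buckets.setdefault(idx, []).append((support, correct))

    summary = []
    for idx in sorted(buckets):
        rows = buckets[idx]
        mean_support = sum(s for s, _ in rows) / len(rows)
        accuracy = sum(1 for _, c in rows if c) / len(rows)
        summary.append((edges[idx], mean_support, accuracy))
    return summary


# Toy data: five QA items with their support counts and correctness.
toy_items = [(1, False), (3, True), (12, True), (20, False), (150, True)]
toy_summary = bucket_accuracy(toy_items, edges=[1, 10, 100])
```

Plotting `mean_support` against `accuracy` for each training regime would give a curve of the kind the caption describes; error bars would additionally need repeated runs, which this sketch does not model.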
Original abstract

Differentially private (DP) text synthesis promises to unlock sensitive corpora for model training, but it remains unclear whether DP synthetic data transmits genuinely new knowledge and capabilities present only in those corpora. This is because existing evaluations rely on tasks that are nearly solvable without training, so strong benchmark performance does not establish that DP synthesis can substitute original data access. Thus, we introduce ContinuousBench, a continuously and automatically-regenerated benchmark that measures capability gain from DP synthetic text. Each quarter, a new release pairs a never-before-seen training corpus with a derived QA set, constructed to be: (1) unsolvable sans-corpus; and (2) learnable under DP, as the tested knowledge is supported by hundreds of independent records. Researchers produce DP synthetic data from the training corpus and run our standardized training and evaluation harness on their synthetic data to measure gains. We instantiate two tracks: Geminon, a procedurally-generated dataset about fictional creatures; and News, a stream of newly crawled public news articles. Although standard benchmarks are nearly saturated, on ContinuousBench we find that non-private synthesis transfers substantial knowledge from the original corpus, while state-of-the-art DP synthesis methods generally fail to do so, even at $\varepsilon=100$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces ContinuousBench, a benchmark that pairs never-before-seen training corpora (Geminon procedural creatures and News articles) with derived QA sets constructed to be unsolvable without the corpus and learnable under DP. Experiments show non-private synthesis transfers substantial knowledge while state-of-the-art DP synthesis methods fail to do so even at ε=100, using a standardized training/evaluation harness.

Significance. If the QA construction reliably ensures unsolvability without the corpus and sufficient redundancy for DP learnability, the benchmark would fill a key gap in DP evaluation by moving beyond saturated tasks; the headline empirical contrast between non-private and DP synthesis would then carry substantial weight for privacy-preserving training.

major comments (2)
  1. [Abstract] Abstract (and benchmark construction section): the central claim that 'DP synthesis methods generally fail to do so' rests on QA items being unsolvable sans the specific corpus. The manuscript asserts this 'by construction' for both tracks but provides no concrete mechanism (base-model zero-shot filtering protocol, human validation procedure, or redundancy count) to establish the property; without such evidence the measured non-private gains could reflect general pretraining rather than corpus transfer, undermining the DP-failure conclusion.
  2. [Methods] Methods (QA derivation and statistical tests): the claim that 'the tested knowledge is supported by hundreds of independent records' (allowing learnability under DP noise) is load-bearing for interpreting DP failure at ε=100 as a genuine limitation rather than an artifact of insufficient signal. No explicit counting procedure, exclusion rules, or power analysis is referenced to support this redundancy threshold.
minor comments (2)
  1. [Abstract] Clarify the exact base models and prompting used for any pre-filtering of QA items to make the 'unsolvable sans-corpus' claim reproducible.
  2. A figure or table reporting per-track base-model accuracy on the QA sets (with and without the corpus) would strengthen the construction claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on ContinuousBench. We address the major concerns regarding the benchmark construction below and will revise the manuscript accordingly to provide the requested details.

Point-by-point responses
  1. Referee: [Abstract] Abstract (and benchmark construction section): the central claim that 'DP synthesis methods generally fail to do so' rests on QA items being unsolvable sans the specific corpus. The manuscript asserts this 'by construction' for both tracks but provides no concrete mechanism (base-model zero-shot filtering protocol, human validation procedure, or redundancy count) to establish the property; without such evidence the measured non-private gains could reflect general pretraining rather than corpus transfer, undermining the DP-failure conclusion.

    Authors: We agree that the current manuscript does not provide sufficient detail on the mechanisms used to ensure unsolvability by construction. In the revised version, we will expand the benchmark construction section to explicitly describe the base-model zero-shot filtering protocol, the human validation procedure employed, and the redundancy counting approach. This will substantiate that the QA items test corpus-specific knowledge rather than general pretraining capabilities. revision: yes

  2. Referee: [Methods] Methods (QA derivation and statistical tests): the claim that 'the tested knowledge is supported by hundreds of independent records' (allowing learnability under DP noise) is load-bearing for interpreting DP failure at ε=100 as a genuine limitation rather than an artifact of insufficient signal. No explicit counting procedure, exclusion rules, or power analysis is referenced to support this redundancy threshold.

    Authors: We acknowledge the need for explicit documentation of the redundancy verification. The revision will include a detailed description of the counting procedure for independent records, the exclusion rules applied to avoid overcounting, and a power analysis confirming that the signal is sufficient for learnability under the DP noise levels tested. This will support the interpretation of the DP synthesis results. revision: yes

Circularity Check

0 steps flagged

No significant circularity; results are empirical evaluations on new benchmark data.

full rationale

The paper introduces ContinuousBench with QA sets asserted to be unsolvable without the corpus and learnable under DP by construction, then reports empirical measurements showing non-private synthesis transfers knowledge while DP methods do not. These outcomes are obtained via standardized training/evaluation harness on generated data rather than any derivation that reduces to fitted parameters, self-definitions, or self-citation chains. No equations or claims equate a 'prediction' to its inputs by construction, and the central comparison rests on external experimental outcomes against the benchmark rather than internal redefinitions. The load-bearing properties of the QA sets are presented as design choices whose validity is tested by the reported performance gaps, not presupposed in a way that forces the headline result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the contribution is the benchmark construction and reported comparison.

pith-pipeline@v0.9.1-grok · 5781 in / 1027 out tokens · 27023 ms · 2026-06-28T15:35:06.063410+00:00 · methodology

