pith. machine review for the scientific record.

arxiv: 2310.16789 · v3 · pith:AVYX74M7new · submitted 2023-10-25 · 💻 cs.CL · cs.CR · cs.LG

Detecting Pretraining Data from Large Language Models

Pith reviewed 2026-05-17 18:01 UTC · model grok-4.3

classification 💻 cs.CL cs.CR cs.LG
keywords pretraining data detection, membership inference, large language models, Min-K% Prob, WIKIMIA benchmark, copyright detection, machine unlearning, data contamination

The pith

Min-K% Prob detects whether a text was in an LLM's pretraining data by averaging the log probabilities of the text's lowest-probability tokens.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tackles the problem of identifying whether a given text appeared in an LLM's training data when only black-box access to the model is available. It creates the WIKIMIA benchmark, which splits data into pre- and post-training periods to provide reliable ground truth. The proposed Min-K% Prob method rests on the observation that unseen text tends to contain a small number of tokens to which the model assigns unusually low probability. This approach requires no reference model or knowledge of the training corpus, and the authors report a 7.4% improvement over previous methods on the benchmark. The method is then applied to practical settings such as spotting copyrighted books, checking whether downstream task examples were contaminated, and auditing machine unlearning.

Core claim

Min-K% Prob works by selecting the K percent of tokens in an input that receive the smallest log probabilities under the target LLM and averaging those values; higher (less negative) scores indicate the text is more likely to have been seen during pretraining, because seen text is expected to lack very low-probability outlier tokens. The paper reports that this simple statistic outperforms prior methods that depend on a reference model on the WIKIMIA benchmark and remains effective when applied to copyrighted-book detection, downstream contamination checks, and verification of machine-unlearning success.

What carries the argument

Min-K% Prob, a membership score computed as the average log probability of the K% lowest-probability tokens in the input sequence under the black-box LLM.
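The score described above can be sketched in a few lines. This is a minimal illustration, assuming per-token log-probabilities have already been obtained from the target model; the function name, the 20% default, the keep-at-least-one-token rule, and the toy values are assumptions made for the example, not details taken from the paper.

```python
import math

def min_k_prob(token_logprobs, k_percent=20):
    """Mean log-probability of the k_percent lowest-probability tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    if not 0 < k_percent <= 100:
        raise ValueError("k_percent must be in (0, 100]")
    # Assumed convention: round up and always keep at least one token.
    n = max(1, math.ceil(len(token_logprobs) * k_percent / 100))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n

# Hypothetical log-probabilities for two ten-token texts.
familiar = [-0.1, -0.3, -0.2, -0.5, -0.4, -0.2, -0.1, -0.6, -0.3, -0.2]
surprising = [-0.1, -0.3, -0.2, -7.5, -0.4, -0.2, -0.1, -9.0, -0.3, -0.2]

familiar_score = min_k_prob(familiar)
surprising_score = min_k_prob(surprising)
print(familiar_score, surprising_score)
```

The text with two outlier tokens receives a much lower score, which under the paper's hypothesis would point toward "unseen". In practice the score is compared against a threshold chosen on held-out data.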

If this is right

  • Copyright holders can scan published books against released LLMs to find unauthorized use.
  • Developers can check whether benchmark examples leaked into pretraining and inflated reported results.
  • Auditors can test whether machine-unlearning procedures actually removed specific private examples.
  • Detection works without any auxiliary training or access to the original pretraining corpus.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the method scales to frontier models, it could support regulatory requirements that companies disclose or remove specific data sources.
  • The same low-probability outlier idea might extend to detecting training data in image or audio models.
  • Repeated application across many queries could allow approximate reconstruction of which domains dominated an LLM's training mix.

Load-bearing premise

Unseen text is likely to contain a few outlier words that the model assigns very low probability, while text seen in training is less likely to have such outliers.

What would settle it

Train an LLM on a fully known corpus, hold out a test set of unseen text, and measure whether Min-K% Prob assigns reliably lower scores to the held-out texts than to the training texts.
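One way to quantify the outcome of such an experiment is the area under the ROC curve (AUC), computed directly from the two groups of scores. The sketch below assumes scores have already been computed for training and held-out texts; the helper name and the toy scores are hypothetical.

```python
def auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member.

    Ties count as half a win, which matches the usual rank-based AUC.
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))

# Hypothetical Min-K% Prob scores (higher = more likely seen).
train_scores = [-1.2, -0.9, -2.0, -1.5]
heldout_scores = [-3.1, -1.8, -2.6, -1.0]

result = auc(train_scores, heldout_scores)
print(result)
```

An AUC near 0.5 would mean the score does not separate the groups; a value well above 0.5 would support the premise for that model and corpus.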

Original abstract

Although large language models (LLMs) are widely deployed, the data used to train them is rarely disclosed. Given the incredible scale of this data, up to trillions of tokens, it is all but certain that it includes potentially problematic text such as copyrighted materials, personally identifiable information, and test data for widely reported reference benchmarks. However, we currently have no way to know which data of these types is included or in what proportions. In this paper, we study the pretraining data detection problem: given a piece of text and black-box access to an LLM without knowing the pretraining data, can we determine if the model was trained on the provided text? To facilitate this study, we introduce a dynamic benchmark WIKIMIA that uses data created before and after model training to support gold truth detection. We also introduce a new detection method Min-K% Prob based on a simple hypothesis: an unseen example is likely to contain a few outlier words with low probabilities under the LLM, while a seen example is less likely to have words with such low probabilities. Min-K% Prob can be applied without any knowledge about the pretraining corpus or any additional training, departing from previous detection methods that require training a reference model on data that is similar to the pretraining data. Moreover, our experiments demonstrate that Min-K% Prob achieves a 7.4% improvement on WIKIMIA over these previous methods. We apply Min-K% Prob to three real-world scenarios, copyrighted book detection, contaminated downstream example detection and privacy auditing of machine unlearning, and find it a consistently effective solution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Min-K% Prob, a black-box method to detect whether a given text sequence was included in an LLM's pretraining data. The approach rests on the hypothesis that unseen texts are more likely to contain a small number of outlier tokens with unusually low model probabilities. To evaluate this, the authors propose the WIKIMIA benchmark, which labels Wikipedia articles published before a model's training cutoff as positive (seen) examples and post-cutoff articles as negative (unseen) examples. Experiments report a 7.4% improvement over prior reference-model baselines on WIKIMIA, and the method is applied to three practical tasks: copyrighted-book detection, downstream-data contamination checks, and auditing machine-unlearning privacy guarantees.

Significance. If the core hypothesis holds and the benchmark isolates membership rather than temporal shift, the work would supply a simple, training-free auditing tool useful for copyright, privacy, and contamination analyses. The dynamic nature of WIKIMIA is a practical contribution that can be updated as new models are released. The absence of any need for a reference model trained on similar data is a clear methodological advantage over earlier approaches.

major comments (3)
  1. [§3] §3 (WIKIMIA construction): The benchmark labels post-cutoff Wikipedia articles as negative examples solely on the basis of publication date. This risks conflating pretraining membership with temporal distribution shift in topics, vocabulary, or writing style; the reported 7.4% gain could therefore reflect the model's general difficulty with newer text rather than absence from the training corpus. Additional controls (e.g., topic-matched pre/post pairs or perplexity-matched baselines) are needed to establish that the signal is membership-specific.
  2. [Abstract and §4] Abstract and §4 (core hypothesis): The claim that 'an unseen example is likely to contain a few outlier words with low probabilities' is presented without direct ablation or statistical test isolating the contribution of the lowest-K% tokens versus overall perplexity or length. Because this hypothesis is load-bearing for both the method and the benchmark results, an explicit verification (e.g., comparing Min-K% against full-sequence perplexity or random-K% baselines) should be added.
  3. [Results] Results section (performance numbers): The 7.4% improvement on WIKIMIA is stated without reported standard deviations, number of runs, or statistical significance tests against the baselines. Given that the central empirical claim rests on this margin, confidence intervals or p-values must be supplied to support the superiority statement.
minor comments (2)
  1. [Methods] Provide an explicit mathematical definition or pseudocode for Min-K% Prob (including how ties and tokenization edge cases are handled) in the methods section.
  2. [Figures] Figure legends and axis labels in the experimental plots should be expanded to make the comparison between Min-K% and prior methods immediately readable without reference to the caption.
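The comparison requested in major comment 2 can be sketched as three scoring functions that share the same input. This is an illustration only, assuming per-token log-probabilities are available; the function names, the 20% default, the rounding rule, and the toy values are assumptions, not the paper's implementation.

```python
import math
import random

def _count(num_tokens, k_percent):
    # Assumed convention: round up and keep at least one token.
    return max(1, math.ceil(num_tokens * k_percent / 100))

def min_k_score(logprobs, k_percent=20):
    """Mean of the k_percent lowest log-probabilities."""
    n = _count(len(logprobs), k_percent)
    return sum(sorted(logprobs)[:n]) / n

def full_sequence_score(logprobs):
    """Mean log-probability, i.e. the negative log of perplexity."""
    return sum(logprobs) / len(logprobs)

def random_k_score(logprobs, k_percent=20, seed=0):
    """Mean of a random k_percent subset of tokens (control baseline)."""
    n = _count(len(logprobs), k_percent)
    rng = random.Random(seed)
    return sum(rng.sample(logprobs, n)) / n

# Hypothetical log-probabilities for one ten-token text.
toy = [-0.1, -0.3, -0.2, -7.5, -0.4, -0.2, -0.1, -9.0, -0.3, -0.2]
print(min_k_score(toy), full_sequence_score(toy), random_k_score(toy))
```

Running all three on the same member and non-member sets, then comparing their detection performance, would show how much of the signal comes from the lowest-probability tokens specifically.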

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate the suggested improvements where feasible.

Point-by-point responses
  1. Referee: [§3] §3 (WIKIMIA construction): The benchmark labels post-cutoff Wikipedia articles as negative examples solely on the basis of publication date. This risks conflating pretraining membership with temporal distribution shift in topics, vocabulary, or writing style; the reported 7.4% gain could therefore reflect the model's general difficulty with newer text rather than absence from the training corpus. Additional controls (e.g., topic-matched pre/post pairs or perplexity-matched baselines) are needed to establish that the signal is membership-specific.

    Authors: We acknowledge that temporal distribution shift is a valid concern and could partially explain performance differences. To strengthen the benchmark, we will add controls using topic-matched pre- and post-cutoff Wikipedia article pairs in a revised §3. We will also report results against perplexity-matched baselines to help isolate membership effects from general difficulty with newer text. These additions will clarify the extent to which the signal is membership-specific while preserving the dynamic and practical nature of WIKIMIA.
    Revision: partial

  2. Referee: [Abstract and §4] Abstract and §4 (core hypothesis): The claim that 'an unseen example is likely to contain a few outlier words with low probabilities' is presented without direct ablation or statistical test isolating the contribution of the lowest-K% tokens versus overall perplexity or length. Because this hypothesis is load-bearing for both the method and the benchmark results, an explicit verification (e.g., comparing Min-K% against full-sequence perplexity or random-K% baselines) should be added.

    Authors: We agree that an explicit ablation would provide stronger support for the core hypothesis. In the revised manuscript we will add a dedicated ablation subsection in §4 that directly compares Min-K% Prob against full-sequence perplexity and random-K% token selection baselines, including statistical tests to quantify the contribution of focusing on the lowest-probability tokens.
    Revision: yes

  3. Referee: [Results] Results section (performance numbers): The 7.4% improvement on WIKIMIA is stated without reported standard deviations, number of runs, or statistical significance tests against the baselines. Given that the central empirical claim rests on this margin, confidence intervals or p-values must be supplied to support the superiority statement.

    Authors: We thank the referee for highlighting this omission. We will revise the results section to report standard deviations across multiple runs, explicitly state the number of runs performed, and include statistical significance tests (p-values) comparing Min-K% Prob to the baselines to substantiate the reported improvement.
    Revision: yes
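One possible way to attach uncertainty to a reported AUC is a percentile bootstrap over examples. The sketch below is an assumption about how such an interval could be computed, not a procedure described in the paper; the function names, resample count, and toy scores are hypothetical.

```python
import random

def auc(pos, neg):
    """Rank-based AUC: fraction of (pos, neg) pairs ordered correctly."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(pos, neg, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping group sizes fixed.
        p = rng.choices(pos, k=len(pos))
        q = rng.choices(neg, k=len(neg))
        stats.append(auc(p, q))
    stats.sort()
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return stats[lo_idx], stats[hi_idx]

# Hypothetical scores for seen (members) and unseen (non-members) texts.
members = [-1.2, -0.9, -2.0, -1.5]
nonmembers = [-3.1, -1.8, -2.6, -1.0]
low, high = bootstrap_auc_ci(members, nonmembers)
print(low, high)
```

With only a handful of examples the interval is very wide; the point of the sketch is the procedure, which would be applied to the full benchmark for each method being compared.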

Circularity Check

0 steps flagged

Min-K% Prob is a direct heuristic on target LLM probabilities, validated on external temporal benchmark

full rationale

The paper defines Min-K% Prob explicitly as the mean of the bottom k% token log-probabilities produced by the queried LLM itself, under the hypothesis that unseen text contains more low-probability outliers. This quantity is computed once per example and compared against a threshold; no parameter is fitted to the target detection labels, and no prior result from the same authors is invoked to force uniqueness or to smuggle an ansatz. The WIKIMIA benchmark supplies gold labels via an independent temporal cutoff that does not reference the Min-K% statistic, so the reported improvement is an empirical measurement rather than a definitional identity. No step in the derivation reduces the claimed detector to its own inputs by construction.
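The definition summarized above can be written compactly. The notation here is chosen for this note and is an assumption: $x = x_1, \dots, x_N$ is the tokenized input, $p_\theta$ is the queried model, $E$ is the set of tokens with the lowest log-probabilities, and $\epsilon$ is a decision threshold. The rounding rule for the size of $E$ is also assumed.

```latex
\[
  \text{Min-K\% Prob}(x)
    = \frac{1}{|E|} \sum_{x_i \in E} \log p_\theta\!\left(x_i \mid x_1, \dots, x_{i-1}\right),
  \qquad
  |E| = \left\lceil \tfrac{K}{100}\, N \right\rceil
\]
\[
  \text{predict ``seen''} \iff \text{Min-K\% Prob}(x) \ge \epsilon
\]
```

With $K = 100$ the score reduces to the mean token log-probability, so full-sequence perplexity is a special case of the same statistic.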

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on one domain assumption about probability outliers and introduces a tunable percentage parameter whose exact value is not fixed by theory.

free parameters (1)
  • K in Min-K%
    The percentage threshold for selecting the lowest-probability tokens is a tunable hyperparameter whose specific value affects detection performance but is not derived from first principles.
axioms (1)
  • domain assumption: An unseen example is likely to contain a few outlier words with low probabilities under the LLM, while a seen example is less likely to have words with such low probabilities.
    This hypothesis is explicitly stated as the basis for the Min-K% Prob method in the abstract.
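The effect of the free parameter K can be seen by sweeping it on a single input. The sketch below uses hypothetical log-probabilities and an assumed rounding rule; it shows only how the score moves between the single worst token and the full-sequence mean, not which K works best in practice.

```python
import math

def min_k_prob(token_logprobs, k_percent):
    """Mean log-probability of the k_percent lowest-probability tokens."""
    # Assumed convention: round up and keep at least one token.
    n = max(1, math.ceil(len(token_logprobs) * k_percent / 100))
    return sum(sorted(token_logprobs)[:n]) / n

# Hypothetical log-probabilities for one ten-token text.
logprobs = [-0.1, -0.3, -0.2, -7.5, -0.4, -0.2, -0.1, -9.0, -0.3, -0.2]

# Small K focuses on the worst tokens; K = 100 is the plain mean.
sweep = {k: min_k_prob(logprobs, k) for k in (10, 20, 50, 100)}
for k, score in sweep.items():
    print(k, score)
```

Because the score rises monotonically with K on any fixed input, the choice of K changes how strongly outlier tokens dominate, and any decision threshold has to be tuned together with it.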



Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Privacy Auditing with Zero (0) Training Run

    cs.CR 2026-05 unverdicted novelty 8.0

    Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.

  2. Pretraining Exposure Explains Popularity Judgments in Large Language Models

    cs.CL 2026-05 unverdicted novelty 8.0

    LLM popularity judgments align more closely with pretraining data exposure counts than with Wikipedia popularity, with stronger effects in pairwise comparisons and larger models.

  3. Learning the Signature of Memorization in Autoregressive Language Models

    cs.CL 2026-04 accept novelty 8.0

    A classifier trained only on transformer fine-tuning data detects an invariant memorization signature that transfers to Mamba, RWKV-4, and RecurrentGemma with AUCs of 0.963, 0.972, and 0.936.

  4. Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning

    cs.LG 2024-04 conditional novelty 8.0

    NPO enables stable unlearning of 50%+ training data in LLMs on TOFU by making collapse exponentially slower than gradient ascent, preserving sensible outputs where prior methods fail.

  5. DistractMIA: Black-Box Membership Inference on Vision-Language Models via Semantic Distraction

    cs.CV 2026-05 unverdicted novelty 7.0

    DistractMIA performs output-only black-box membership inference on vision-language models by inserting semantic distractors and measuring shifts in generated text responses.

  6. Dataset Watermarking for Closed LLMs with Provable Detection

    cs.LG 2026-05 unverdicted novelty 7.0

    A new watermarking method for closed LLMs boosts random word-pair co-occurrences via rephrasing and detects the signal statistically in outputs, working reliably even when the watermarked data is only 1% of fine-tunin...

  7. A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

    cs.CR 2026-04 unverdicted novelty 7.0

    A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

  8. How Independent are Large Language Models? A Statistical Framework for Auditing Behavioral Entanglement and Reweighting Verifier Ensembles

    cs.AI 2026-04 unverdicted novelty 7.0

    A new auditing framework reveals widespread behavioral entanglement among LLMs and shows that reweighting ensembles based on measured independence improves verification accuracy by up to 4.5%.

  9. Unlearning What Matters: Token-Level Attribution for Precise Language Model Unlearning

    cs.CL 2026-05 unverdicted novelty 6.0

    TokenUnlearn identifies critical tokens via masking and entropy signals then applies hard selection or soft weighting to unlearn only those tokens, yielding better forgetting and retained utility than sequence-level b...

  10. Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks

    cs.CR 2026-04 unverdicted novelty 6.0

    A context-aware Sentinel-Strategist system for RAG selectively applies defenses to block membership inference and data poisoning while recovering most retrieval utility compared to always-on defense stacks.

  11. SPENCE: A Syntactic Probe for Detecting Contamination in NL2SQL Benchmarks

    cs.CL 2026-04 unverdicted novelty 6.0

    SPENCE shows older NL2SQL benchmarks like Spider have high performance sensitivity to syntactic changes, indicating likely training contamination, while newer ones like BIRD show little sensitivity and appear largely clean.

  12. Representation-Guided Parameter-Efficient LLM Unlearning

    cs.CL 2026-04 unverdicted novelty 6.0

    REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization to outperform prior methods on forget-retain trade-off in LLM benchmarks.

  13. Filling the Gaps: Selective Knowledge Augmentation for LLM Recommenders

    cs.IR 2026-04 unverdicted novelty 6.0

    KnowSA_CKP uses comparative knowledge probing to selectively augment LLM prompts for items with knowledge gaps, improving recommendation accuracy and context efficiency.

  14. Auditing Data Membership in Reinforcement Learning With Verifiable Rewards

    cs.CR 2025-11 unverdicted novelty 6.0

    DIBA detects membership of prompts in RLVR training by measuring reward success changes and policy behavioral drift between pre- and post-RLVR model checkpoints.

  15. Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

    cs.CL 2025-01 conditional novelty 6.0

    Constitutional Classifiers trained on synthetic data from natural language constitutions defend LLMs against universal jailbreaks, with no successful bypass found in over 3000 hours of red teaming and only minor deplo...

  16. DataComp-LM: In search of the next generation of training sets for language models

    cs.LG 2024-06 unverdicted novelty 6.0

    DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.

  17. LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

    cs.SE 2024-03 unverdicted novelty 6.0

    LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.

  18. TOFU: A Task of Fictitious Unlearning for LLMs

    cs.LG 2024-01 conditional novelty 6.0

    TOFU is a new benchmark with synthetic profiles and metrics demonstrating that existing unlearning algorithms for LLMs fail to achieve effective forgetting of targeted information.

  19. Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance

    cs.LG 2026-05 unverdicted novelty 5.0

    Stable-GFlowNet improves training stability and attack diversity in LLM red-teaming by eliminating Z estimation via contrastive trajectory balance while preserving GFN optimality.

Reference graph

Works this paper leans on

130 extracted references · 130 canonical work pages · cited by 19 Pith papers · 7 internal anchors

  1. [1]

    Stability of stochastic gradient descent on nonsmooth convex losses

    Raef Bassily, Vitaly Feldman, Crist \'o bal Guzm \'a n, and Kunal Talwar. Stability of stochastic gradient descent on nonsmooth convex losses. Advances in Neural Information Processing Systems, 33: 0 4381--4391, 2020

  2. [2]

    Pythia: A suite for analyzing large language models across training and scaling, 2023

    Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, and Oskar van der Wal. Pythia: A suite for analyzing large language models across training and scaling, 2023

  3. [3]

    GPT-NeoX-20B: An Open-Source Autoregressive Language Model

    Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. GPT-NeoX-20B : An open-source autoregressive language model. In Proceedings of the ACL Workshop on C...

  4. [4]

    Machine unlearning

    Lucas Bourtoule, Varun Chandrasekaran, Christopher A Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy (SP), pp.\ 141--159. IEEE, 2021

  5. [5]

    Language models are few-shot learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gr...

  6. [6]

    Language models are few-shot learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33: 0 1877--1901, 2020 b

  7. [7]

    Extracting training data from large language models

    Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, et al. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pp.\ 2633--2650, 2021

  8. [8]

    Membership inference attacks from first principles

    Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramer. Membership inference attacks from first principles. In 2022 IEEE Symposium on Security and Privacy (SP), pp.\ 1897--1914. IEEE, 2022

  9. [11]

    Boolq: Exploring the surprising difficulty of natural yes/no questions

    Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. Boolq: Exploring the surprising difficulty of natural yes/no questions. In NAACL, 2019

  10. [13]

    Glam: Efficient scaling of language models with mixture-of-experts

    Nan Du, Yanping Huang, Andrew M Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, et al. Glam: Efficient scaling of language models with mixture-of-experts. In International Conference on Machine Learning, pp.\ 5547--5569. PMLR, 2022

  11. [14]

    Who’s harry potter? approximate unlearning in llms.arXiv preprint arXiv:2310.02238,

    Ronen Eldan and Mark Russinovich. Who's Harry Potter ? approximate unlearning in LLMs . arXiv preprint arXiv:2310.02238, 2023

  12. [15]

    Does learning require memorization? a short tale about a long tail

    Vitaly Feldman. Does learning require memorization? a short tale about a long tail. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pp.\ 954--959, 2020

  13. [17]

    SimCSE : Simple contrastive learning of sentence embeddings

    Tianyu Gao, Xingcheng Yao, and Danqi Chen. SimCSE : Simple contrastive learning of sentence embeddings. In Empirical Methods in Natural Language Processing (EMNLP), 2021

  14. [18]

    Making ai forget you: Data deletion in machine learning

    Antonio Ginart, Melody Guan, Gregory Valiant, and James Y Zou. Making ai forget you: Data deletion in machine learning. Advances in neural information processing systems, 32, 2019

  15. [20]

    Recovering private text in federated learning of language models

    Samyak Gupta, Yangsibo Huang, Zexuan Zhong, Tianyu Gao, Kai Li, and Danqi Chen. Recovering private text in federated learning of language models. Advances in Neural Information Processing Systems, 35: 0 8130--8143, 2022

  16. [21]

    Adaptive machine unlearning

    Varun Gupta, Christopher Jung, Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi, and Chris Waites. Adaptive machine unlearning. Advances in Neural Information Processing Systems, 34: 0 16319--16330, 2021

  17. [22]

    Train faster, generalize better: Stability of stochastic gradient descent

    Moritz Hardt, Ben Recht, and Yoram Singer. Train faster, generalize better: Stability of stochastic gradient descent. In International conference on machine learning, pp.\ 1225--1234. PMLR, 2016

  18. [23]

    A dataset auditing method for collaboratively trained machine learning models

    Yangsibo Huang, Chun-Yin Huang, Xiaoxiao Li, and Kai Li. A dataset auditing method for collaboratively trained machine learning models. IEEE Transactions on Medical Imaging, 2022

  19. [24]

    Approximate data deletion from machine learning models

    Zachary Izzo, Mary Anne Smart, Kamalika Chaudhuri, and James Zou. Approximate data deletion from machine learning models. In International Conference on Artificial Intelligence and Statistics, pp.\ 2008--2016. PMLR, 2021

  20. [26]

    Auditing differentially private machine learning: How private is private sgd? Advances in Neural Information Processing Systems, 33: 0 22205--22216, 2020

    Matthew Jagielski, Jonathan Ullman, and Alina Oprea. Auditing differentially private machine learning: How private is private sgd? Advances in Neural Information Processing Systems, 33: 0 22205--22216, 2020

  21. [27]

    Evaluating differentially private machine learning in practice

    Bargav Jayaraman and David Evans. Evaluating differentially private machine learning in practice. In 28th USENIX Security Symposium (USENIX Security 19), pp.\ 1895--1912, 2019

  22. [28]

    Deduplicating training data mitigates privacy risks in language models

    Nikhil Kandpal, Eric Wallace, and Colin Raffel. Deduplicating training data mitigates privacy risks in language models. In International Conference on Machine Learning, pp.\ 10697--10707. PMLR, 2022

  23. [29]

    California consumer privacy act, 2018

    California State Legislature. California consumer privacy act, 2018. URL https://oag.ca.gov/privacy/ccpa

  24. [30]

    Stolen memories: Leveraging model memorization for calibrated \ White-Box \ membership inference

    Klas Leino and Matt Fredrikson. Stolen memories: Leveraging model memorization for calibrated \ White-Box \ membership inference. In 29th USENIX security symposium (USENIX Security 20), pp.\ 1605--1622, 2020

  25. [31]

    Rouge: A package for automatic evaluation of summaries

    Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pp.\ 74--81, 2004

  26. [32]

    Truthfulqa: Measuring how models mimic human falsehoods, 2021

    Stephanie Lin, Jacob Hilton, and Owain Evans. Truthfulqa: Measuring how models mimic human falsehoods, 2021

  27. [35]

    Maas, Raymond E

    Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp.\ 142--150, Portland, Oregon, USA, June 2011. Association for Computational Linguistics. UR...

  28. [36]

    Data contamination: From memorization to exploitation

    Inbal Magar and Roy Schwartz. Data contamination: From memorization to exploitation. ArXiv, abs/2203.08242, 2022. URL https://api.semanticscholar.org/CorpusID:247475929

  29. [38]

    Data portraits: Recording foundation model training data, 2023

    Marc Marone and Benjamin Van Durme . Data portraits: Recording foundation model training data, 2023. URL https://arxiv.org/abs/2303.03919

  30. [43]

    Manning, and Chelsea Finn

    Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, and Chelsea Finn. Detectgpt: Zero-shot machine-generated text detection using probability curvature, 2023. URL https://arxiv.org/abs/2301.11305

  31. [44]

    Maximilian Mozes, Xuanli He, Bennett Kleinberg, and Lewis D. Griffin. Use of llms for illicit purposes: Threats, prevention measures, and vulnerabilities, 2023

  32. [45]

    Gpt-4 and professional benchmarks: the wrong answer to the wrong question, 2023

    Arvind Narayanan. Gpt-4 and professional benchmarks: the wrong answer to the wrong question, 2023. URL https://www.aisnakeoil.com/p/gpt-4-and-professional-benchmarks

  33. [46]

    Adversary instantiation: Lower bounds for differentially private machine learning

    Milad Nasr, Shuang Songi, Abhradeep Thakurta, Nicolas Papernot, and Nicholas Carlin. Adversary instantiation: Lower bounds for differentially private machine learning. In 2021 IEEE Symposium on security and privacy (SP), pp.\ 866--882. IEEE, 2021

  34. [48]

    Gpt-4 technical report, 2023

    OpenAI. Gpt-4 technical report, 2023

  35. [49]

    Did chatgpt cheat on your test?, 2023

    Oscar Sainz, Jon Ander Campos, Iker García-Ferrero, Julen Etxaniz, and Eneko Agirre. Did chatgpt cheat on your test?, 2023. URL https://hitz-zentroa.github.io/lm-contamination/blog/

  36. [50]

    Remember what you want to forget: Algorithms for machine unlearning

    Ayush Sekhari, Jayadev Acharya, Gautam Kamath, and Ananda Theertha Suresh. Remember what you want to forget: Algorithms for machine unlearning. Advances in Neural Information Processing Systems, 34: 0 18075--18086, 2021

  37. [51]

    Membership inference attacks against NLP classification models

    Virat Shejwalkar, Huseyin A Inan, Amir Houmansadr, and Robert Sim. Membership inference attacks against NLP classification models. In NeurIPS 2021 Workshop Privacy in Machine Learning, 2021. URL https://openreview.net/forum?id=74lwg5oxheC

  38. [52]

    Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov

    R. Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pp.\ 3--18, 2016

  39. [53]

    Membership inference attacks against machine learning models

    Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE symposium on security and privacy (SP), pp.\ 3--18. IEEE, 2017

  40. [54]

    Auditing data provenance in text-generation models

    Congzheng Song and Vitaly Shmatikov. Auditing data provenance in text-generation models. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp.\ 196--206, 2019

  41. [57]

    Redpajama: An open source recipe to reproduce llama training dataset, 2023

    TogetherCompute. Redpajama: An open source recipe to reproduce llama training dataset, 2023. URL https://github.com/togethercomputer/RedPajama-Data

  42. [59]

    Llama 2: Open foundation and fine-tuned chat models, 2023 b

    Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Harts...

  43. [60]

    The EU General Data Protection Regulation (GDPR)

    Paul Voigt and Axel Von dem Bussche. The EU General Data Protection Regulation (GDPR). A Practical Guide, 1st Ed., Cham: Springer International Publishing, 10(3152676): 10-5555, 2017

  44. [61]

    On the importance of difficulty calibration in membership inference attacks

    Lauren Watson, Chuan Guo, Graham Cormode, and Alexandre Sablayrolles. On the importance of difficulty calibration in membership inference attacks. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=3eIrli0TwQ

  45. [62]

    Finetuned language models are zero-shot learners

    Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V Le. Finetuned language models are zero-shot learners. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=gEZrGCozdqR

  46. [63]

    "according to ..." prompting language models improves quoting from pre-training data

    Orion Weller, Marc Marone, Nathaniel Weir, Dawn Lawrie, Daniel Khashabi, and Benjamin Van Durme. "according to ..." prompting language models improves quoting from pre-training data, 2023

  47. [64]

    Deltagrad: Rapid retraining of machine learning models

    Yinjun Wu, Edgar Dobriban, and Susan Davidson. Deltagrad: Rapid retraining of machine learning models. In International Conference on Machine Learning, pp. 10355-10366. PMLR, 2020

  48. [65]

    Learning with recoverable forgetting

    Jingwen Ye, Yifang Fu, Jie Song, Xingyi Yang, Songhua Liu, Xin Jin, Mingli Song, and Xinchao Wang. Learning with recoverable forgetting. In European Conference on Computer Vision, pp. 87-103. Springer, 2022

  49. [66]

    Privacy risk in machine learning: Analyzing the connection to overfitting

    Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pp. 268-282, 2018a. doi:10.1109/CSF.2018.00027

  50. [67]

    Privacy risk in machine learning: Analyzing the connection to overfitting

    Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pp. 268-282. IEEE, 2018b

  51. [68]

    Analyzing information leakage of updates to natural language models

    Santiago Zanella-Béguelin, Lukas Wutschitz, Shruti Tople, Victor Rühle, Andrew Paverd, Olga Ohrimenko, Boris Köpf, and Marc Brockschmidt. Analyzing information leakage of updates to natural language models. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pp. 363-375, 2020

  52. [71]

    Train faster, generalize better: Stability of stochastic gradient descent

    Train faster, generalize better: Stability of stochastic gradient descent. In International Conference on Machine Learning, 2016

  53. [72]

    The Pile: An 800GB Dataset of Diverse Text for Language Modeling

    The Pile: An 800GB dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027

  54. [73]

    Stability of stochastic gradient descent on nonsmooth convex losses

    Stability of stochastic gradient descent on nonsmooth convex losses. Advances in Neural Information Processing Systems

  55. [74]

    Language Models are Few-Shot Learners

    Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winte...

  56. [75]

    Membership inference attacks against machine learning models

    Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP)

  57. [76]

    Data Contamination: From Memorization to Exploitation

    Data Contamination: From Memorization to Exploitation. ArXiv

  58. [77]

    Arvind Narayanan

  59. [78]

    Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities

    Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities. 2023

  60. [79]

    Speak, memory: An archaeology of books known to ChatGPT/GPT-4

    Speak, memory: An archaeology of books known to ChatGPT/GPT-4. arXiv preprint arXiv:2305.00118

  61. [80]

    GPT-4 Technical Report

    GPT-4 Technical Report. 2023

  62. [81]

    OPT: Open Pre-trained Transformer Language Models

    OPT: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068

  63. [82]

    Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting

    Yeom, Samuel and Giacomelli, Irene and Fredrikson, Matt and Jha, Somesh. Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting

  64. [83]

    A neural probabilistic language model

    A neural probabilistic language model. Advances in Neural Information Processing Systems

  65. [84]

    Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

    Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling. 2023

  66. [85]

    Black, Sid and Biderman, Stella and Hallahan, Eric and Anthony, Quentin and Gao, Leo and Golding, Laurence and He, Horace and Leahy, Connor and McDonell, Kyle and Phang, Jason and Pieler, Michael and Prashanth, USVSN Sai and Purohit, Shivanshu and Reynolds, Laria and Tow, Jonathan and Wang, Ben and Weinbach, Samuel

  67. [86]

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Llama 2: Open Foundation and Fine-Tuned Chat Models. 2023

  68. [87]

    Understanding Membership Inferences on Well-Generalized Learning Models

    Understanding membership inferences on well-generalized learning models. arXiv preprint arXiv:1802.04889

  69. [88]

    On the Importance of Difficulty Calibration in Membership Inference Attacks

    On the Importance of Difficulty Calibration in Membership Inference Attacks. In International Conference on Learning Representations

  70. [89]

    SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore

    SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore. arXiv preprint arXiv:2308.04430

  71. [90]

    Membership inference attacks from first principles

    Membership inference attacks from first principles. In 2022 IEEE Symposium on Security and Privacy (SP), 2022

  72. [91]

    Los Angeles Times

    Jonan Valdez. Los Angeles Times

  73. [92]

    Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks

    Mireshghallah, Fatemehsadat and Goyal, Kartik and Uniyal, Archit and Berg-Kirkpatrick, Taylor and Shokri, Reza. Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022. doi:10.18653/v1/2022.emnlp-main.570

  74. [93]

    Mitchell, Eric and Lee, Yoonho and Khazatsky, Alexander and Manning, Christopher D. and Finn, Chelsea

  75. [94]

    Membership Inference Attacks against Language Models via Neighbourhood Comparison

    Mattern, Justus and Mireshghallah, Fatemehsadat and Jin, Zhijing and Schoelkopf, Bernhard and Sachan, Mrinmaya and Berg-Kirkpatrick, Taylor. Membership Inference Attacks against Language Models via Neighbourhood Comparison. Findings of the Association for Computational Linguistics: ACL 2023. 2023. doi:10.18653/v1/2023.findings-acl.719

  76. [95]

    LLaMA: Open and Efficient Foundation Language Models

    LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971

  77. [96]

    Extracting training data from large language models

    Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21)

  78. [97]

    Mark Chen and Jerry Tworek and Heewoo Jun and Qiming Yuan and Henrique Ponde de Oliveira Pinto and Jared Kaplan and Harrison Edwards and Yuri Burda and Nicholas Joseph and Greg Brockman and Alex Ray and Raul Puri and Gretchen Krueger and Michael Petrov and Heidy Khlaaf and Girish Sastry and Pamela Mishkin and Brooke Chan and Scott Gray and Nick Ryder and ...

  79. [98]

    Membership inference attacks against machine learning models

    Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), 2017

  80. [99]

    Membership inference on word embedding and beyond

    Membership inference on word embedding and beyond. arXiv preprint arXiv:2106.11384

Showing first 80 references.