Detecting Pretraining Data from Large Language Models
Pith reviewed 2026-05-17 18:01 UTC · model grok-4.3
The pith
Min-K% Prob detects whether a text was in an LLM's pretraining data by averaging the log probabilities of its lowest-probability tokens.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Min-K% Prob selects the K percent of tokens in an input that receive the smallest log probabilities under the target LLM and averages those values; higher (less negative) scores indicate the text is more likely to have been seen during pretraining, because seen text tends to lack very-low-probability outlier tokens. The paper reports that this simple statistic improves on prior reference-model methods by 7.4% on the WIKIMIA benchmark and remains effective when applied to copyrighted-book detection, downstream contamination checks, and verification of machine-unlearning success.
What carries the argument
Min-K% Prob, a membership score computed as the average log probability of the K% lowest-probability tokens in the input sequence under the black-box LLM.
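A minimal sketch of the score, assuming the per-token log probabilities have already been obtained from the target model (how they are queried is outside this sketch). The function name `min_k_prob`, the default `k=0.2`, the keep-at-least-one-token rule, and the sample numbers are illustrative assumptions, not taken from the paper.

```python
import math


def min_k_prob(token_logprobs, k=0.2):
    """Average the fraction k of tokens with the lowest log probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log probability")
    if not 0 < k <= 1:
        raise ValueError("k must be in (0, 1]")
    # Assumption: keep at least one token so very short inputs still get a score.
    n_keep = max(1, math.floor(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n_keep]
    return sum(lowest) / n_keep


# Hypothetical log probabilities for two 10-token texts (made-up values).
seen_like = [-0.1, -0.3, -0.2, -0.5, -0.4, -0.2, -0.1, -0.6, -0.3, -0.2]
unseen_like = [-0.1, -0.3, -0.2, -7.5, -0.4, -0.2, -0.1, -9.0, -0.3, -0.2]

seen_score = min_k_prob(seen_like, k=0.2)      # mean of -0.6 and -0.5
unseen_score = min_k_prob(unseen_like, k=0.2)  # mean of -9.0 and -7.5
```

In this toy input the two outlier tokens pull the second score far below the first, which is the behavior the hypothesis predicts for unseen text.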
If this is right
- Copyright holders can scan published books against released LLMs to find unauthorized use.
- Developers can check whether benchmark examples leaked into pretraining and inflated reported results.
- Auditors can test whether machine-unlearning procedures actually removed specific private examples.
- Detection works without any auxiliary training or access to the original pretraining corpus.
Where Pith is reading between the lines
- If the method scales to frontier models, it could support regulatory requirements that companies disclose or remove specific data sources.
- The same low-probability outlier idea might extend to detecting training data in image or audio models.
- Repeated application across many queries could allow approximate reconstruction of which domains dominated an LLM's training mix.
Load-bearing premise
Unseen text is likely to contain a few outlier words that the model assigns very low probability, while text seen in training is less likely to have such outliers.
What would settle it
Train an LLM on a fully known corpus, hold out a test set of unseen text, and measure whether Min-K% Prob assigns reliably lower scores to the held-out texts than to the training texts.
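The comparison described above could be summarized with an AUC, sketched here under the assumption that a Min-K% Prob score has already been computed for every training and held-out text. The function name `auc` and the score values are hypothetical; ties are counted as half a win.

```python
def auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5  # ties count as half
    return wins / (len(member_scores) * len(nonmember_scores))


# Made-up Min-K% Prob scores for illustration only.
train_scores = [-0.6, -0.9, -1.2, -0.7]
heldout_scores = [-3.1, -0.8, -4.5, -2.2]

result = auc(train_scores, heldout_scores)
```

An AUC near 0.5 would mean the score does not separate the two sets; a value well above 0.5 on a corpus with known membership would support the premise.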
Original abstract
Although large language models (LLMs) are widely deployed, the data used to train them is rarely disclosed. Given the incredible scale of this data, up to trillions of tokens, it is all but certain that it includes potentially problematic text such as copyrighted materials, personally identifiable information, and test data for widely reported reference benchmarks. However, we currently have no way to know which data of these types is included or in what proportions. In this paper, we study the pretraining data detection problem: given a piece of text and black-box access to an LLM without knowing the pretraining data, can we determine if the model was trained on the provided text? To facilitate this study, we introduce a dynamic benchmark WIKIMIA that uses data created before and after model training to support gold truth detection. We also introduce a new detection method Min-K% Prob based on a simple hypothesis: an unseen example is likely to contain a few outlier words with low probabilities under the LLM, while a seen example is less likely to have words with such low probabilities. Min-K% Prob can be applied without any knowledge about the pretraining corpus or any additional training, departing from previous detection methods that require training a reference model on data that is similar to the pretraining data. Moreover, our experiments demonstrate that Min-K% Prob achieves a 7.4% improvement on WIKIMIA over these previous methods. We apply Min-K% Prob to three real-world scenarios, copyrighted book detection, contaminated downstream example detection and privacy auditing of machine unlearning, and find it a consistently effective solution.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Min-K% Prob, a black-box method to detect whether a given text sequence was included in an LLM's pretraining data. The approach rests on the hypothesis that unseen texts are more likely to contain a small number of outlier tokens with unusually low model probabilities. To evaluate this, the authors propose the WIKIMIA benchmark, which labels Wikipedia articles published before a model's training cutoff as positive (seen) examples and post-cutoff articles as negative (unseen) examples. Experiments report a 7.4% improvement over prior reference-model baselines on WIKIMIA, and the method is applied to three practical tasks: copyrighted-book detection, downstream-data contamination checks, and auditing machine-unlearning privacy guarantees.
Significance. If the core hypothesis holds and the benchmark isolates membership rather than temporal shift, the work would supply a simple, training-free auditing tool useful for copyright, privacy, and contamination analyses. The dynamic nature of WIKIMIA is a practical contribution that can be updated as new models are released. The absence of any need for a reference model trained on similar data is a clear methodological advantage over earlier approaches.
major comments (3)
- [§3] §3 (WIKIMIA construction): The benchmark labels post-cutoff Wikipedia articles as negative examples solely on the basis of publication date. This risks conflating pretraining membership with temporal distribution shift in topics, vocabulary, or writing style; the reported 7.4% gain could therefore reflect the model's general difficulty with newer text rather than absence from the training corpus. Additional controls (e.g., topic-matched pre/post pairs or perplexity-matched baselines) are needed to establish that the signal is membership-specific.
- [Abstract and §4] Abstract and §4 (core hypothesis): The claim that 'an unseen example is likely to contain a few outlier words with low probabilities' is presented without direct ablation or statistical test isolating the contribution of the lowest-K% tokens versus overall perplexity or length. Because this hypothesis is load-bearing for both the method and the benchmark results, an explicit verification (e.g., comparing Min-K% against full-sequence perplexity or random-K% baselines) should be added.
- [Results] Results section (performance numbers): The 7.4% improvement on WIKIMIA is stated without reported standard deviations, number of runs, or statistical significance tests against the baselines. Given that the central empirical claim rests on this margin, confidence intervals or p-values must be supplied to support the superiority statement.
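One way the uncertainty estimate requested in the last comment might be produced is a paired bootstrap over examples, sketched below. This is not the paper's procedure: the function names, the percentile interval, the resample count, and the toy scores are all assumptions, and both detectors are assumed to have scored the same examples in the same order.

```python
import random


def auc(pos, neg):
    """Rank-based AUC; ties count as half a win."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_gap(a_pos, a_neg, b_pos, b_neg, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval for AUC(A) - AUC(B) using shared resampled indices."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_boot):
        # Resample seen and unseen examples with replacement, same draw for A and B.
        pi = [rng.randrange(len(a_pos)) for _ in a_pos]
        ni = [rng.randrange(len(a_neg)) for _ in a_neg]
        gap = (auc([a_pos[i] for i in pi], [a_neg[i] for i in ni])
               - auc([b_pos[i] for i in pi], [b_neg[i] for i in ni]))
        gaps.append(gap)
    gaps.sort()
    lower = gaps[int((alpha / 2) * n_boot)]
    upper = gaps[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical scores: detector A separates the sets, detector B does not.
a_pos, a_neg = [5, 6, 7, 8, 9, 10], [0, 1, 2, 3, 4, 4.5]
b_pos, b_neg = [1, 5, 2, 6, 3, 4], [2, 4, 1, 5, 3, 6]

lower, upper = bootstrap_auc_gap(a_pos, a_neg, b_pos, b_neg)
```

If the resulting interval excludes zero, the gap between the two detectors is unlikely to be a sampling artifact of the benchmark examples alone; it says nothing about the temporal-shift confound raised in the first comment.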
minor comments (2)
- [Methods] Provide an explicit mathematical definition or pseudocode for Min-K% Prob (including how ties and tokenization edge cases are handled) in the methods section.
- [Figures] Figure legends and axis labels in the experimental plots should be expanded to make the comparison between Min-K% and prior methods immediately readable without reference to the caption.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate the suggested improvements where feasible.
-
Referee: [§3] §3 (WIKIMIA construction): The benchmark labels post-cutoff Wikipedia articles as negative examples solely on the basis of publication date. This risks conflating pretraining membership with temporal distribution shift in topics, vocabulary, or writing style; the reported 7.4% gain could therefore reflect the model's general difficulty with newer text rather than absence from the training corpus. Additional controls (e.g., topic-matched pre/post pairs or perplexity-matched baselines) are needed to establish that the signal is membership-specific.
Authors: We acknowledge that temporal distribution shift is a valid concern and could partially explain performance differences. To strengthen the benchmark, we will add controls using topic-matched pre- and post-cutoff Wikipedia article pairs in a revised §3. We will also report results against perplexity-matched baselines to help isolate membership effects from general difficulty with newer text. These additions will clarify the extent to which the signal is membership-specific while preserving the dynamic and practical nature of WIKIMIA.
Revision: partial
-
Referee: [Abstract and §4] Abstract and §4 (core hypothesis): The claim that 'an unseen example is likely to contain a few outlier words with low probabilities' is presented without direct ablation or statistical test isolating the contribution of the lowest-K% tokens versus overall perplexity or length. Because this hypothesis is load-bearing for both the method and the benchmark results, an explicit verification (e.g., comparing Min-K% against full-sequence perplexity or random-K% baselines) should be added.
Authors: We agree that an explicit ablation would provide stronger support for the core hypothesis. In the revised manuscript we will add a dedicated ablation subsection in §4 that directly compares Min-K% Prob against full-sequence perplexity and random-K% token selection baselines, including statistical tests to quantify the contribution of focusing on the lowest-probability tokens.
Revision: yes
-
Referee: [Results] Results section (performance numbers): The 7.4% improvement on WIKIMIA is stated without reported standard deviations, number of runs, or statistical significance tests against the baselines. Given that the central empirical claim rests on this margin, confidence intervals or p-values must be supplied to support the superiority statement.
Authors: We thank the referee for highlighting this omission. We will revise the results section to report standard deviations across multiple runs, explicitly state the number of runs performed, and include statistical significance tests (p-values) comparing Min-K% Prob to the baselines to substantiate the reported improvement.
Revision: yes
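The ablation discussed in the second response compares three ways of pooling the same token log probabilities. A minimal sketch of those three scores follows; the function names, the default `k`, the fixed seed, and the sample values are assumptions made for illustration, and the full-sequence baseline is written as mean log probability (a monotone transform of perplexity).

```python
import math
import random


def mean_logprob(token_logprobs):
    """Full-sequence baseline: average over every token."""
    return sum(token_logprobs) / len(token_logprobs)


def min_k(token_logprobs, k=0.2):
    """Average over the fraction k of tokens with the lowest log probability."""
    n = max(1, math.floor(len(token_logprobs) * k))
    return sum(sorted(token_logprobs)[:n]) / n


def random_k(token_logprobs, k=0.2, seed=0):
    """Control: average over a random subset of the same size as min_k uses."""
    n = max(1, math.floor(len(token_logprobs) * k))
    rng = random.Random(seed)
    return sum(rng.sample(token_logprobs, n)) / n


# Made-up token log probabilities.
values = [-1.0, -2.0, -3.0, -4.0, -5.0]

full = mean_logprob(values)           # uses all five tokens
lowest = min_k(values, k=0.4)         # uses the two lowest tokens
control = random_k(values, k=0.4)     # uses two randomly chosen tokens
```

If selecting the lowest tokens matters, `min_k` should separate seen from unseen text better than both `mean_logprob` and `random_k` when each is evaluated on the same labeled examples.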
Circularity Check
Min-K% Prob is a direct heuristic on target LLM probabilities, validated on external temporal benchmark
The paper defines Min-K% Prob explicitly as the mean of the bottom K% token log-probabilities produced by the queried LLM itself, under the hypothesis that unseen text contains more low-probability outliers. This quantity is computed once per example and compared against a threshold; no parameter is fitted to the target detection labels, and no prior result from the same authors is invoked to force uniqueness or to smuggle in an ansatz. The WIKIMIA benchmark supplies gold labels via an independent temporal cutoff that does not reference the Min-K% statistic, so the reported improvement is an empirical measurement rather than a definitional identity. No step in the derivation reduces the claimed detector to its own inputs by construction.
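Written out, the statistic and decision rule described in this rationale take roughly the following form. The notation is a choice made for this sketch: $x = x_1, \dots, x_N$ is the token sequence, $\text{Min-K\%}(x)$ is the set of the K% of tokens with the lowest conditional log probability, $E$ is the size of that set, and $\epsilon$ is a decision threshold whose value is not specified here.

```latex
\[
\text{Min-K\% Prob}(x) \;=\; \frac{1}{E} \sum_{x_i \in \text{Min-K\%}(x)} \log p\bigl(x_i \mid x_1, \dots, x_{i-1}\bigr),
\qquad
\text{predict ``seen'' if } \text{Min-K\% Prob}(x) \ge \epsilon .
\]
```

Only $K$ and $\epsilon$ are free in this form, and neither appears in the construction of the benchmark labels.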
Axiom & Free-Parameter Ledger
free parameters (1)
- K in Min-K%
axioms (1)
- domain assumption: An unseen example is likely to contain a few outlier words with low probabilities under the LLM, while a seen example is less likely to have words with such low probabilities.
Forward citations
Cited by 19 Pith papers
-
Privacy Auditing with Zero (0) Training Run
Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.
-
Pretraining Exposure Explains Popularity Judgments in Large Language Models
LLM popularity judgments align more closely with pretraining data exposure counts than with Wikipedia popularity, with stronger effects in pairwise comparisons and larger models.
-
Learning the Signature of Memorization in Autoregressive Language Models
A classifier trained only on transformer fine-tuning data detects an invariant memorization signature that transfers to Mamba, RWKV-4, and RecurrentGemma with AUCs of 0.963, 0.972, and 0.936.
-
Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
NPO enables stable unlearning of 50%+ training data in LLMs on TOFU by making collapse exponentially slower than gradient ascent, preserving sensible outputs where prior methods fail.
-
DistractMIA: Black-Box Membership Inference on Vision-Language Models via Semantic Distraction
DistractMIA performs output-only black-box membership inference on vision-language models by inserting semantic distractors and measuring shifts in generated text responses.
-
Dataset Watermarking for Closed LLMs with Provable Detection
A new watermarking method for closed LLMs boosts random word-pair co-occurrences via rephrasing and detects the signal statistically in outputs, working reliably even when the watermarked data is only 1% of fine-tunin...
-
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
-
How Independent are Large Language Models? A Statistical Framework for Auditing Behavioral Entanglement and Reweighting Verifier Ensembles
A new auditing framework reveals widespread behavioral entanglement among LLMs and shows that reweighting ensembles based on measured independence improves verification accuracy by up to 4.5%.
-
Unlearning What Matters: Token-Level Attribution for Precise Language Model Unlearning
TokenUnlearn identifies critical tokens via masking and entropy signals then applies hard selection or soft weighting to unlearn only those tokens, yielding better forgetting and retained utility than sequence-level b...
-
Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks
A context-aware Sentinel-Strategist system for RAG selectively applies defenses to block membership inference and data poisoning while recovering most retrieval utility compared to always-on defense stacks.
-
SPENCE: A Syntactic Probe for Detecting Contamination in NL2SQL Benchmarks
SPENCE shows older NL2SQL benchmarks like Spider have high performance sensitivity to syntactic changes, indicating likely training contamination, while newer ones like BIRD show little sensitivity and appear largely clean.
-
Representation-Guided Parameter-Efficient LLM Unlearning
REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization to outperform prior methods on forget-retain trade-off in LLM benchmarks.
-
Filling the Gaps: Selective Knowledge Augmentation for LLM Recommenders
KnowSA_CKP uses comparative knowledge probing to selectively augment LLM prompts for items with knowledge gaps, improving recommendation accuracy and context efficiency.
-
Auditing Data Membership in Reinforcement Learning With Verifiable Rewards
DIBA detects membership of prompts in RLVR training by measuring reward success changes and policy behavioral drift between pre- and post-RLVR model checkpoints.
-
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
Constitutional Classifiers trained on synthetic data from natural language constitutions defend LLMs against universal jailbreaks, with no successful bypass found in over 3000 hours of red teaming and only minor deplo...
-
DataComp-LM: In search of the next generation of training sets for language models
DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.
-
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.
-
TOFU: A Task of Fictitious Unlearning for LLMs
TOFU is a new benchmark with synthetic profiles and metrics demonstrating that existing unlearning algorithms for LLMs fail to achieve effective forgetting of targeted information.
-
Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance
Stable-GFlowNet improves training stability and attack diversity in LLM red-teaming by eliminating Z estimation via contrastive trajectory balance while preserving GFN optimality.
Reference graph
Works this paper leans on
-
[1]
Stability of stochastic gradient descent on nonsmooth convex losses
Raef Bassily, Vitaly Feldman, Crist \'o bal Guzm \'a n, and Kunal Talwar. Stability of stochastic gradient descent on nonsmooth convex losses. Advances in Neural Information Processing Systems, 33: 0 4381--4391, 2020
work page 2020
-
[2]
Pythia: A suite for analyzing large language models across training and scaling, 2023
Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, and Oskar van der Wal. Pythia: A suite for analyzing large language models across training and scaling, 2023
work page 2023
-
[3]
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. GPT-NeoX-20B : An open-source autoregressive language model. In Proceedings of the ACL Workshop on C...
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[4]
Lucas Bourtoule, Varun Chandrasekaran, Christopher A Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy (SP), pp.\ 141--159. IEEE, 2021
work page 2021
-
[5]
Language models are few-shot learners
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gr...
work page 1901
-
[6]
Language models are few-shot learners
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33: 0 1877--1901, 2020 b
work page 1901
-
[7]
Extracting training data from large language models
Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, et al. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pp.\ 2633--2650, 2021
work page 2021
-
[8]
Membership inference attacks from first principles
Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramer. Membership inference attacks from first principles. In 2022 IEEE Symposium on Security and Privacy (SP), pp.\ 1897--1914. IEEE, 2022
work page 2022
-
[11]
Boolq: Exploring the surprising difficulty of natural yes/no questions
Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. Boolq: Exploring the surprising difficulty of natural yes/no questions. In NAACL, 2019
work page 2019
-
[13]
Glam: Efficient scaling of language models with mixture-of-experts
Nan Du, Yanping Huang, Andrew M Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, et al. Glam: Efficient scaling of language models with mixture-of-experts. In International Conference on Machine Learning, pp.\ 5547--5569. PMLR, 2022
work page 2022
-
[14]
Who’s harry potter? approximate unlearning in llms.arXiv preprint arXiv:2310.02238,
Ronen Eldan and Mark Russinovich. Who's Harry Potter ? approximate unlearning in LLMs . arXiv preprint arXiv:2310.02238, 2023
-
[15]
Does learning require memorization? a short tale about a long tail
Vitaly Feldman. Does learning require memorization? a short tale about a long tail. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pp.\ 954--959, 2020
work page 2020
-
[17]
SimCSE : Simple contrastive learning of sentence embeddings
Tianyu Gao, Xingcheng Yao, and Danqi Chen. SimCSE : Simple contrastive learning of sentence embeddings. In Empirical Methods in Natural Language Processing (EMNLP), 2021
work page 2021
-
[18]
Making ai forget you: Data deletion in machine learning
Antonio Ginart, Melody Guan, Gregory Valiant, and James Y Zou. Making ai forget you: Data deletion in machine learning. Advances in neural information processing systems, 32, 2019
work page 2019
-
[20]
Recovering private text in federated learning of language models
Samyak Gupta, Yangsibo Huang, Zexuan Zhong, Tianyu Gao, Kai Li, and Danqi Chen. Recovering private text in federated learning of language models. Advances in Neural Information Processing Systems, 35: 0 8130--8143, 2022
work page 2022
-
[21]
Varun Gupta, Christopher Jung, Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi, and Chris Waites. Adaptive machine unlearning. Advances in Neural Information Processing Systems, 34: 0 16319--16330, 2021
work page 2021
-
[22]
Train faster, generalize better: Stability of stochastic gradient descent
Moritz Hardt, Ben Recht, and Yoram Singer. Train faster, generalize better: Stability of stochastic gradient descent. In International conference on machine learning, pp.\ 1225--1234. PMLR, 2016
work page 2016
-
[23]
A dataset auditing method for collaboratively trained machine learning models
Yangsibo Huang, Chun-Yin Huang, Xiaoxiao Li, and Kai Li. A dataset auditing method for collaboratively trained machine learning models. IEEE Transactions on Medical Imaging, 2022
work page 2022
-
[24]
Approximate data deletion from machine learning models
Zachary Izzo, Mary Anne Smart, Kamalika Chaudhuri, and James Zou. Approximate data deletion from machine learning models. In International Conference on Artificial Intelligence and Statistics, pp.\ 2008--2016. PMLR, 2021
work page 2008
-
[26]
Matthew Jagielski, Jonathan Ullman, and Alina Oprea. Auditing differentially private machine learning: How private is private sgd? Advances in Neural Information Processing Systems, 33: 0 22205--22216, 2020
work page 2020
-
[27]
Evaluating differentially private machine learning in practice
Bargav Jayaraman and David Evans. Evaluating differentially private machine learning in practice. In 28th USENIX Security Symposium (USENIX Security 19), pp.\ 1895--1912, 2019
work page 1912
-
[28]
Deduplicating training data mitigates privacy risks in language models
Nikhil Kandpal, Eric Wallace, and Colin Raffel. Deduplicating training data mitigates privacy risks in language models. In International Conference on Machine Learning, pp.\ 10697--10707. PMLR, 2022
work page 2022
-
[29]
California consumer privacy act, 2018
California State Legislature. California consumer privacy act, 2018. URL https://oag.ca.gov/privacy/ccpa
work page 2018
-
[30]
Stolen memories: Leveraging model memorization for calibrated \ White-Box \ membership inference
Klas Leino and Matt Fredrikson. Stolen memories: Leveraging model memorization for calibrated \ White-Box \ membership inference. In 29th USENIX security symposium (USENIX Security 20), pp.\ 1605--1622, 2020
work page 2020
-
[31]
Rouge: A package for automatic evaluation of summaries
Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pp.\ 74--81, 2004
work page 2004
-
[32]
Truthfulqa: Measuring how models mimic human falsehoods, 2021
Stephanie Lin, Jacob Hilton, and Owain Evans. Truthfulqa: Measuring how models mimic human falsehoods, 2021
work page 2021
-
[35]
Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp.\ 142--150, Portland, Oregon, USA, June 2011. Association for Computational Linguistics. UR...
work page 2011
-
[36]
Data contamination: From memorization to exploitation
Inbal Magar and Roy Schwartz. Data contamination: From memorization to exploitation. ArXiv, abs/2203.08242, 2022. URL https://api.semanticscholar.org/CorpusID:247475929
-
[38]
Data portraits: Recording foundation model training data, 2023
Marc Marone and Benjamin Van Durme . Data portraits: Recording foundation model training data, 2023. URL https://arxiv.org/abs/2303.03919
-
[43]
Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, and Chelsea Finn. Detectgpt: Zero-shot machine-generated text detection using probability curvature, 2023. URL https://arxiv.org/abs/2301.11305
-
[44]
Maximilian Mozes, Xuanli He, Bennett Kleinberg, and Lewis D. Griffin. Use of llms for illicit purposes: Threats, prevention measures, and vulnerabilities, 2023
work page 2023
-
[45]
Gpt-4 and professional benchmarks: the wrong answer to the wrong question, 2023
Arvind Narayanan. Gpt-4 and professional benchmarks: the wrong answer to the wrong question, 2023. URL https://www.aisnakeoil.com/p/gpt-4-and-professional-benchmarks
work page 2023
-
[46]
Adversary instantiation: Lower bounds for differentially private machine learning
Milad Nasr, Shuang Songi, Abhradeep Thakurta, Nicolas Papernot, and Nicholas Carlin. Adversary instantiation: Lower bounds for differentially private machine learning. In 2021 IEEE Symposium on security and privacy (SP), pp.\ 866--882. IEEE, 2021
work page 2021
- [48]
-
[49]
Did chatgpt cheat on your test?, 2023
Oscar Sainz, Jon Ander Campos, Iker García-Ferrero, Julen Etxaniz, and Eneko Agirre. Did chatgpt cheat on your test?, 2023. URL https://hitz-zentroa.github.io/lm-contamination/blog/
work page 2023
-
[50]
Remember what you want to forget: Algorithms for machine unlearning
Ayush Sekhari, Jayadev Acharya, Gautam Kamath, and Ananda Theertha Suresh. Remember what you want to forget: Algorithms for machine unlearning. Advances in Neural Information Processing Systems, 34: 0 18075--18086, 2021
work page 2021
-
[51]
Membership inference attacks against NLP classification models
Virat Shejwalkar, Huseyin A Inan, Amir Houmansadr, and Robert Sim. Membership inference attacks against NLP classification models. In NeurIPS 2021 Workshop Privacy in Machine Learning, 2021. URL https://openreview.net/forum?id=74lwg5oxheC
work page 2021
-
[52]
Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov
R. Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pp.\ 3--18, 2016
work page 2017
-
[53]
Membership inference attacks against machine learning models
Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE symposium on security and privacy (SP), pp.\ 3--18. IEEE, 2017
work page 2017
-
[54]
Auditing data provenance in text-generation models
Congzheng Song and Vitaly Shmatikov. Auditing data provenance in text-generation models. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp.\ 196--206, 2019
work page 2019
-
[57]
Redpajama: An open source recipe to reproduce llama training dataset, 2023
TogetherCompute. Redpajama: An open source recipe to reproduce llama training dataset, 2023. URL https://github.com/togethercomputer/RedPajama-Data
work page 2023
-
[59]
Llama 2: Open foundation and fine-tuned chat models, 2023 b
Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Harts...
work page 2023
-
[60]
The eu general data protection regulation (gdpr)
Paul Voigt and Axel Von dem Bussche. The eu general data protection regulation (gdpr). A Practical Guide, 1st Ed., Cham: Springer International Publishing, 10 0 (3152676): 0 10--5555, 2017
work page 2017
-
[61]
On the importance of difficulty calibration in membership inference attacks
Lauren Watson, Chuan Guo, Graham Cormode, and Alexandre Sablayrolles. On the importance of difficulty calibration in membership inference attacks. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=3eIrli0TwQ
work page 2022
-
[62]
Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V Le. Finetuned language models are zero-shot learners. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=gEZrGCozdqR
work page 2022
-
[63]
Orion Weller, Marc Marone, Nathaniel Weir, Dawn Lawrie, Daniel Khashabi, and Benjamin Van Durme. "according to ..." prompting language models improves quoting from pre-training data, 2023
work page 2023
-
[64]
Deltagrad: Rapid retraining of machine learning models
Yinjun Wu, Edgar Dobriban, and Susan Davidson. Deltagrad: Rapid retraining of machine learning models. In International Conference on Machine Learning, pp.\ 10355--10366. PMLR, 2020
work page 2020
-
[65]
Learning with recoverable forgetting
Jingwen Ye, Yifang Fu, Jie Song, Xingyi Yang, Songhua Liu, Xin Jin, Mingli Song, and Xinchao Wang. Learning with recoverable forgetting. In European Conference on Computer Vision, pp.\ 87--103. Springer, 2022
work page 2022
-
[66]
Privacy risk in machine learning: Analyzing the connection to overfitting
Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pp.\ 268--282, 2018 a . doi:10.1109/CSF.2018.00027
-
[67]
Privacy risk in machine learning: Analyzing the connection to overfitting
Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st computer security foundations symposium (CSF), pp.\ 268--282. IEEE, 2018 b
work page 2018
-
[68]
u hle, Andrew Paverd, Olga Ohrimenko, Boris K \
Santiago Zanella-B \'e guelin, Lukas Wutschitz, Shruti Tople, Victor R \"u hle, Andrew Paverd, Olga Ohrimenko, Boris K \"o pf, and Marc Brockschmidt. Analyzing information leakage of updates to natural language models. In Proceedings of the 2020 ACM SIGSAC conference on computer and communications security, pp.\ 363--375, 2020
work page 2020
-
[71]
International conference on machine learning , pages=
Train faster, generalize better: Stability of stochastic gradient descent , author=. International conference on machine learning , pages=. 2016 , organization=
work page 2016
-
[72]
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
The pile: An 800gb dataset of diverse text for language modeling , author=. arXiv preprint arXiv:2101.00027 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[73]
Advances in Neural Information Processing Systems , volume=
Stability of stochastic gradient descent on nonsmooth convex losses , author=. Advances in Neural Information Processing Systems , volume=
-
[74]
Language Models are Few-Shot Learners , url =
Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winte...
-
[75]
Membership inference attacks against machine learning models
Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), 2017.
-
[76]
Data Contamination: From Memorization to Exploitation. ArXiv.
-
[77]
Arvind Narayanan.
-
[78]
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities. 2023.
-
[79]
Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4
Speak, memory: An archaeology of books known to ChatGPT/GPT-4. arXiv preprint arXiv:2305.00118.
-
[81]
OPT: Open Pre-trained Transformer Language Models
OPT: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068.
-
[82]
Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting
Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting.
-
[83]
A neural probabilistic language model
A neural probabilistic language model. Advances in Neural Information Processing Systems.
-
[84]
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling. 2023.
-
[85]
Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach.
-
[86]
Llama 2: Open Foundation and Fine-Tuned Chat Models. 2023.
-
[87]
Understanding Membership Inferences on Well-Generalized Learning Models
Understanding membership inferences on well-generalized learning models. arXiv preprint arXiv:1802.04889.
-
[88]
On the Importance of Difficulty Calibration in Membership Inference Attacks
On the Importance of Difficulty Calibration in Membership Inference Attacks. International Conference on Learning Representations.
-
[89]
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore. arXiv preprint arXiv:2308.04430.
-
[90]
Membership inference attacks from first principles
Membership inference attacks from first principles. In 2022 IEEE Symposium on Security and Privacy (SP), 2022.
-
[92]
Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks
Mireshghallah, Fatemehsadat and Goyal, Kartik and Uniyal, Archit and Berg-Kirkpatrick, Taylor and Shokri, Reza. Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022. doi:10.18653/v1/2022.emnlp-main.570
-
[93]
Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, and Chelsea Finn.
-
[94]
Membership Inference Attacks against Language Models via Neighbourhood Comparison
Mattern, Justus and Mireshghallah, Fatemehsadat and Jin, Zhijing and Schoelkopf, Bernhard and Sachan, Mrinmaya and Berg-Kirkpatrick, Taylor. Membership Inference Attacks against Language Models via Neighbourhood Comparison. Findings of the Association for Computational Linguistics: ACL 2023. 2023. doi:10.18653/v1/2023.findings-acl.719
-
[95]
LLaMA: Open and Efficient Foundation Language Models
LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
-
[96]
Extracting training data from large language models
Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21).
-
[97]
Mark Chen and Jerry Tworek and Heewoo Jun and Qiming Yuan and Henrique Ponde de Oliveira Pinto and Jared Kaplan and Harrison Edwards and Yuri Burda and Nicholas Joseph and Greg Brockman and Alex Ray and Raul Puri and Gretchen Krueger and Michael Petrov and Heidy Khlaaf and Girish Sastry and Pamela Mishkin and Brooke Chan and Scott Gray and Nick Ryder and ...
-
[98]
Membership inference attacks against machine learning models
Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), 2017.
-
[99]
Membership inference on word embedding and beyond
Membership inference on word embedding and beyond. arXiv preprint arXiv:2106.11384.