arxiv: 2605.26133 · v1 · pith:4RV4Q7GXnew · submitted 2026-05-21 · 💻 cs.CL · cs.AI · cs.LG

Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications

Pith reviewed 2026-06-30 17:18 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG
keywords pretraining data exposure · membership inference · data contamination · large language models · privacy · security · survey

The pith

A survey unifies membership inference and data contamination as pretraining data exposure in large language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents the first unified survey of pretraining data exposure in large language models by combining studies on membership inference and data contamination under one framework. It formalizes exposure across different levels and reviews attack and defense methods from both areas. The work synthesizes empirical findings and identifies open challenges and future directions. A reader would care because the unified view bears on both privacy protection and the integrity of model evaluations when training data is massive and opaque.

Core claim

The central claim is that pretraining data exposure can be formalized across exposure levels to bring together the conceptually related but previously isolated areas of data contamination and membership inference, allowing a review of attack and defense methods, synthesis of findings, and highlighting of challenges in LLMs.

What carries the argument

The Pretraining Data Exposure (PDE) framework, which poses a single question (did specific data appear in an LLM's pretraining corpus?) and uses it to unify data contamination and membership inference.

Load-bearing premise

That membership inference and data contamination are conceptually close enough to be productively unified under a single PDE framework without important distinctions being lost or the survey becoming too broad to be useful.

What would settle it

Empirical evidence that methods and findings from membership inference and data contamination cannot be meaningfully compared or combined because their core mechanisms differ fundamentally.

Original abstract

Large Language Models (LLMs) have become the predominant paradigm in NLP, advancing both research and industry. As model sizes and pretraining data grow, concerns about Pretraining Data Exposure (PDE) increase due to the scale and opacity of training datasets. PDE refers to determining whether specific data appeared in an LLM's pretraining corpus. It is critical for ensuring evaluation integrity and protecting privacy, intersecting two key areas: data contamination and membership inference. Though conceptually related, these areas have often been studied in isolation. This paper offers the first unified survey of both under the PDE framework. We formalize PDE across exposure levels, review attack and defense methods, synthesize empirical findings, and highlight open challenges and future research directions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper claims to deliver the first unified survey of membership inference and data contamination in LLMs under a new Pretraining Data Exposure (PDE) framework. It formalizes PDE across exposure levels, reviews attack and defense methods from both literatures, synthesizes empirical findings, and identifies open challenges and future directions.

Significance. If the unification holds without losing key distinctions, the survey could serve as a useful reference that connects two previously isolated research threads, aiding work on LLM privacy risks and evaluation integrity. The organizational contribution of the PDE framework and the synthesis of methods are the primary potential strengths.

minor comments (2)
  1. The abstract states that the areas 'have often been studied in isolation' but does not cite prior attempts at partial unification; adding 1-2 sentences in the introduction with explicit comparison to any overlapping prior surveys would strengthen the novelty claim.
  2. Section headings and subsection numbering are not provided in the supplied text; ensuring consistent numbering and a clear table of contents would improve navigability for a survey of this length.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The feedback affirms the value of unifying membership inference and data contamination under the PDE framework. The report lists no major comments, so there are no individual points to address at this stage. We will incorporate any additional feedback from the editor or further referee comments in the revision.

Circularity Check

0 steps flagged

No significant circularity: survey with no derivations

full rationale

This paper is a literature survey that proposes an organizational PDE framework to unify two previously separate research areas. It contains no new equations, fitted parameters, predictions, or derivations of any kind. The central claim of providing the 'first unified survey' is a statement about coverage and synthesis rather than a mathematical result that could reduce to its inputs by construction. No self-citation load-bearing steps, ansatzes, or renamings of known results appear in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper; the abstract introduces no new free parameters, mathematical axioms, or invented entities. All content is drawn from cited prior literature.

pith-pipeline@v0.9.1-grok · 5661 in / 1072 out tokens · 25008 ms · 2026-06-30T17:18:21.443174+00:00 · methodology
