
arXiv: 2503.22760 · v1 · submitted 2025-03-27 · cs.CR · cs.LG · cs.PL

Malicious and Unintentional Disclosure Risks in Large Language Models for Code Generation

Pith reviewed 2026-05-22 22:18 UTC · model grok-4.3

classification cs.CR · cs.LG · cs.PL
keywords unintended memorization · disclosure risks · code generation · large language models · training data privacy · malicious disclosure · unintentional disclosure · data processing

The pith

Data processing changes for code-generating LLMs shift unintended memorization risks and can raise one disclosure type while lowering another.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper decomposes the risk that code LLMs will output sensitive training data into two parts: unintentional disclosure, in which the model reveals secrets without the user trying to extract them, and malicious disclosure, in which an attacker armed with partial knowledge can force the model to surface those secrets. It supplies methods for measuring both risks on the same models and datasets so that the effects of data-source and data-processing decisions can be compared directly. When the methods are applied, the results indicate that such decisions produce large changes in overall risk, that a single change can move the two risks in opposite directions, and that disclosure rates also differ according to the category of sensitive information involved.

Core claim

The central claim is that changes in data source and processing are associated with substantial changes in unintended memorization risk, that the same operational changes may increase one risk while mitigating another, and that the risk of disclosing sensitive information varies not only by prompt strategies or test datasets but also by the types of sensitive information.

What carries the argument

Side-by-side assessment methods that measure unintentional disclosure (the model reveals secrets without the user seeking them) and malicious disclosure (an attacker with partial knowledge of the training data extracts secrets) on matched releases of training datasets and models.

If this is right

  • Operational choices made during dataset construction directly determine the balance of disclosure risks in the resulting model.
  • Mitigating one form of disclosure can increase exposure on the other form, so both must be tracked together.
  • Disclosure likelihood is not uniform; certain categories of sensitive information are more readily surfaced than others under the same conditions.
  • Privacy and security testing must be performed at the level of the training-data supply chain rather than only on the final model.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Teams building code LLMs could run the same paired assessments on multiple candidate data pipelines to select the variant with the most acceptable risk profile.
  • The observed variation by information type points toward possible targeted filtering steps that focus on high-risk categories such as credentials or API tokens.
  • The side-by-side measurement approach could be applied to non-code models or to other data modalities to check whether similar trade-offs appear.
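
The targeted-filtering idea in the second bullet can be sketched as a per-category scan followed by redaction. This is a minimal illustration only: the category names, the two regular expressions, and the helpers `count_sensitive` and `redact` are assumptions made for this example, not the paper's detectors or taxonomy.

```python
import re
from collections import Counter

# Hypothetical, deliberately simple patterns. Production secret scanners
# use many more rules plus entropy checks and validation.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def count_sensitive(documents):
    """Count pattern matches per category across an iterable of strings."""
    counts = Counter()
    for text in documents:
        for category, pattern in PATTERNS.items():
            counts[category] += len(pattern.findall(text))
    return counts


def redact(text, placeholder="<REDACTED>"):
    """Replace every match of every pattern with a placeholder."""
    for pattern in PATTERNS.values():
        text = pattern.sub(placeholder, text)
    return text
```

Counting per category before filtering makes it possible to see which categories dominate a corpus, which is the kind of breakdown that would be needed to prioritise one category over another.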

Load-bearing premise

The assessment methods accurately measure true disclosure risks without substantial false positives or negatives, and the tested categories of sensitive information represent real secrets found in code repositories.

What would settle it

A controlled test in which a documented data-processing change produces no measurable difference in either disclosure risk across repeated prompt sets and information categories.

Figures

Figures reproduced from arXiv: 2503.22760 by Nick Judd, Rafiqul Rabin, Sean McGregor.

Figure 1: Count of potentially sensitive information in the Dolma training data. Results show that the count of email addresses and secret keys generally …
Figure 2: Model propensity to disclose sensitive information in training data. Results show that the likelihood of unintentional disclosure decreased over iterated …
Figure 3: Malicious disclosure of sensitive information across various prompting strategies based on pass@10 results. Results show that malicious disclosures …
Figure 4: Unintentional disclosure of sensitive information across various test datasets based on pass@10 results. Results show that unintentional disclosure is …
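
Figures 3 and 4 report pass@10 results. This page does not say how the paper computes the metric, so the sketch below assumes the standard unbiased pass@k estimator from the code-generation evaluation literature, with a "hit" reinterpreted as a generation that discloses the target secret. The function name `pass_at_k` and its argument names are choices made for this example.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k samples is a hit) from n samples with c hits.

    Uses 1 - C(n - c, k) / C(n, k), the fraction of size-k subsets of the
    n samples that contain at least one hit.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k misses exist, which yields 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Under this reading, a pass@10 value is the estimated chance that at least one of ten generations for a prompt contains the secret, so it rises quickly even when the per-sample disclosure rate is low.
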
Original abstract

This paper explores the risk that a large language model (LLM) trained for code generation on data mined from software repositories will generate content that discloses sensitive information included in its training data. We decompose this risk, known in the literature as "unintended memorization," into two components: unintentional disclosure (where an LLM presents secrets to users without the user seeking them out) and malicious disclosure (where an LLM presents secrets to an attacker equipped with partial knowledge of the training data). We observe that while existing work mostly anticipates malicious disclosure, unintentional disclosure is also a concern. We describe methods to assess unintentional and malicious disclosure risks side-by-side across different releases of training datasets and models. We demonstrate these methods through an independent assessment of the Open Language Model (OLMo) family of models and its Dolma training datasets. Our results show, first, that changes in data source and processing are associated with substantial changes in unintended memorization risk; second, that the same set of operational changes may increase one risk while mitigating another; and, third, that the risk of disclosing sensitive information varies not only by prompt strategies or test datasets but also by the types of sensitive information. These contributions rely on data mining to enable greater privacy and security testing required for the LLM training data supply chain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper decomposes unintended memorization risk in code-generation LLMs into unintentional disclosure (secrets revealed without attacker intent) and malicious disclosure (secrets revealed to an attacker with partial knowledge). It presents side-by-side assessment methods for these risks, applies them to successive OLMo model releases and Dolma training datasets, and reports three findings: data-source and processing changes produce substantial shifts in risk; the same changes can increase one risk while decreasing the other; and disclosure risk varies by prompt strategy, test dataset, and category of sensitive information.

Significance. If the assessment procedures are shown to be reliable, the work supplies a concrete, reproducible framework for measuring privacy/security trade-offs in the LLM data supply chain and demonstrates that operational decisions (filtering, deduplication, source selection) are not uniformly risk-reducing. This is directly actionable for practitioners curating training corpora and for regulators evaluating model releases.

major comments (3)
  1. [Methods] Methods section: the three headline results rest on the claim that the described side-by-side procedures correctly quantify true disclosure risk. No ground-truth injection experiments, false-positive/negative rate measurements, or coverage argument for the chosen sensitive-information taxonomy are supplied; without these, observed differences could be measurement artifacts.
  2. [Results] Results / Evaluation: the abstract and results narrative report associations and differential effects but supply no measurement protocols, statistical controls, error bars, or data-exclusion rules. This prevents verification that the reported changes in risk are robust to reasonable variations in evaluation design.
  3. [§4] §4 (Application to OLMo/Dolma): the claim that risk varies by type of sensitive information is load-bearing for the third contribution, yet the paper provides no argument that the selected categories are representative of secrets actually present in public code repositories.
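
The false-positive and false-negative measurements requested in major comment 1 could be computed along the following lines once ground-truth secrets have been injected. This is a sketch under stated assumptions: `detection_rates` and its three inputs (the items a detector flagged, the injected canaries, and the full set of items examined) are hypothetical and do not describe any procedure in the paper.

```python
def detection_rates(flagged, injected, candidates):
    """Score a detector against known injected secrets.

    flagged:    items the detector reported as sensitive
    injected:   ground-truth canaries planted in the data
    candidates: every item the detector examined
    """
    flagged, injected, candidates = set(flagged), set(injected), set(candidates)
    tp = len(flagged & injected)               # planted and found
    fp = len(flagged - injected)               # reported but not planted
    fn = len(injected - flagged)               # planted but missed
    tn = len(candidates - flagged - injected)  # correctly ignored

    def ratio(num, den):
        # An undefined rate is reported as NaN rather than raising.
        return num / den if den else float("nan")

    return {
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }
```

Reporting these rates per information category would show whether an observed difference between categories reflects the model or merely the detector.
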
minor comments (2)
  1. [Methods] Define the exact operational criteria used to label an output as an unintentional versus malicious disclosure; a short decision tree or pseudocode would remove ambiguity.
  2. [Discussion] Add a limitations paragraph discussing the scope of the secret taxonomy and the generalizability of the OLMo/Dolma findings to other code LLMs.
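
The decision rule requested in minor comment 1 might look like the sketch below. It is built only from the two definitions in the abstract and is not the paper's labelling procedure: the `Trial` record, the `prompt_uses_training_context` flag, the label strings, and the use of exact substring matching are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    # True when the prompt was built from partial knowledge of the
    # training data (attacker setting); False for an ordinary user prompt.
    prompt_uses_training_context: bool
    output: str


def label_disclosure(trial, known_secrets):
    """Label one generation as no, unintentional, or malicious disclosure."""
    # Assumption: a disclosure means a known secret appears verbatim.
    leaked = any(secret in trial.output for secret in known_secrets)
    if not leaked:
        return "no_disclosure"
    if trial.prompt_uses_training_context:
        return "malicious"
    return "unintentional"
```

Keeping the prompt setting as an explicit field means the same output-checking logic serves both risks, which is what allows them to be compared side by side.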

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's comments. We appreciate the constructive feedback and will revise the manuscript to address the concerns raised regarding the validation of our assessment methods and the robustness of our results.

Point-by-point responses
  1. Referee: [Methods] Methods section: the three headline results rest on the claim that the described side-by-side procedures correctly quantify true disclosure risk. No ground-truth injection experiments, false-positive/negative rate measurements, or coverage argument for the chosen sensitive-information taxonomy are supplied; without these, observed differences could be measurement artifacts.

    Authors: We agree that the manuscript would be strengthened by a more explicit discussion of the limitations of our assessment procedures. The methods are intended to provide observable measures of disclosure under controlled prompt conditions rather than a complete quantification of 'true' risk. In the revised version, we will expand the Methods section to include a dedicated limitations subsection addressing potential measurement artifacts, the absence of ground-truth injections, and the rationale for the sensitive information taxonomy. We note that performing injection experiments would require a separate study design and is beyond the scope of the current work, which focuses on real-world data from Dolma. revision: partial

  2. Referee: [Results] Results / Evaluation: the abstract and results narrative report associations and differential effects but supply no measurement protocols, statistical controls, error bars, or data-exclusion rules. This prevents verification that the reported changes in risk are robust to reasonable variations in evaluation design.

    Authors: We will revise the Results section to include detailed measurement protocols, including the exact criteria for identifying disclosures, data exclusion rules (e.g., how duplicates or non-sensitive items were handled), and any statistical analyses performed. Where applicable, we will add error bars based on variations across prompt templates or subsets of the test data. This will improve the verifiability of the reported changes. revision: yes

  3. Referee: [§4] §4 (Application to OLMo/Dolma): the claim that risk varies by type of sensitive information is load-bearing for the third contribution, yet the paper provides no argument that the selected categories are representative of secrets actually present in public code repositories.

    Authors: The sensitive information categories were selected based on established taxonomies from secret detection literature (e.g., references to tools like TruffleHog and studies on credential leaks in GitHub). We will add an argument in §4 or a new appendix section citing evidence from public reports on the prevalence of these secret types in code repositories to support their representativeness. If additional categories are suggested, we can discuss expanding the taxonomy in future work. revision: partial
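
The error bars promised in response 2 could be produced with a percentile bootstrap over per-prompt outcomes. The sketch below assumes each outcome is 1 if a prompt led to a disclosure and 0 otherwise; `bootstrap_ci`, its defaults, and the choice of a percentile interval are assumptions made for this example, not the authors' stated analysis.

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Return (rate, lower, upper) for the mean of 0/1 disclosure outcomes."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(outcomes)
    # Resample prompts with replacement and record each resampled rate.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(outcomes) / n, lower, upper
```

Resampling at the level of prompts (or prompt templates) rather than individual generations keeps correlated samples from the same prompt together, which avoids overstating precision.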

Circularity Check

0 steps flagged

Empirical measurement study with no derivation chain or self-referential reductions

Full rationale

The paper is an empirical assessment of disclosure risks in LLMs using side-by-side methods applied to public OLMo/Dolma releases. It reports observed associations between data processing changes and risk levels, plus variation by information type. No equations, fitted parameters renamed as predictions, self-definitional constructs, uniqueness theorems, or ansatzes appear in the described chain. The central claims rest on the application of described assessment procedures to datasets, which are externally falsifiable via replication on the same public artifacts rather than reducing to self-citation or internal definition. The reader's noted assumption about method accuracy is a validity concern, not a circularity issue per the rules.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on empirical observations from assessment methods applied to specific models and datasets; no free parameters, invented entities, or non-standard axioms are introduced in the abstract.

axioms (1)
  • Domain assumption: Sensitive information in code repositories can be identified and categorized into testable types for disclosure measurement.
    Invoked when the abstract states that risk varies by types of sensitive information.

pith-pipeline@v0.9.0 · 5763 in / 1326 out tokens · 89107 ms · 2026-05-22T22:18:45.647398+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Security Concerns in Generative AI Coding Assistants: Insights from Online Discussions on GitHub Copilot

    cs.SE · 2026-04 · unverdicted · novelty 4.0

    Forum discussions highlight four security concerns with GitHub Copilot: data leakage, code licensing problems, adversarial attacks such as prompt injection, and generation of insecure code.
