Malicious and Unintentional Disclosure Risks in Large Language Models for Code Generation
Pith reviewed 2026-05-22 22:18 UTC · model grok-4.3
The pith
Data processing changes for code-generating LLMs shift unintended memorization risks and can raise one disclosure type while lowering another.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that changes in data source and processing are associated with substantial changes in unintended memorization risk, that the same operational changes may increase one risk while mitigating another, and that the risk of disclosing sensitive information varies not only by prompt strategies or test datasets but also by the types of sensitive information.
What carries the argument
Side-by-side assessment methods that measure unintentional disclosure (model reveals secrets unprompted) and malicious disclosure (attacker with partial knowledge extracts secrets) on matched releases of training datasets and models.
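As a rough illustration of scoring one release on both risks at once, the sketch below computes an unintentional and a malicious disclosure rate side by side. It assumes, hypothetically, that the evaluator already holds a set of known secrets from the training data, a list of generations produced from benign prompts, and a list of prefix-based attack probes. The function names, the `AttackResult` record, and the simple substring matching are illustrative choices, not the paper's procedure.

```python
from dataclasses import dataclass


@dataclass
class AttackResult:
    """One malicious-disclosure probe (hypothetical record layout)."""
    prefix: str          # partial knowledge of a training record held by the attacker
    secret_suffix: str   # the secret that followed the prefix in the training data
    completion: str      # what the model generated after being given the prefix


def unintentional_rate(generations: list[str], known_secrets: set[str]) -> float:
    """Fraction of benign-prompt generations that contain any known secret."""
    if not generations:
        return 0.0
    hits = sum(1 for text in generations if any(s in text for s in known_secrets))
    return hits / len(generations)


def malicious_rate(attacks: list[AttackResult]) -> float:
    """Fraction of prefix-based probes whose completion reproduces the secret."""
    if not attacks:
        return 0.0
    hits = sum(1 for a in attacks if a.secret_suffix in a.completion)
    return hits / len(attacks)


def assess_release(generations: list[str], known_secrets: set[str],
                   attacks: list[AttackResult]) -> dict[str, float]:
    """Report both disclosure rates for one model/dataset release."""
    return {
        "unintentional": unintentional_rate(generations, known_secrets),
        "malicious": malicious_rate(attacks),
    }
```

Running `assess_release` on each release in a series yields matched pairs of rates, which is what makes side-by-side comparison across releases possible.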
If this is right
- Operational choices made during dataset construction directly determine the balance of disclosure risks in the resulting model.
- Mitigating one form of disclosure can increase exposure on the other form, so both must be tracked together.
- Disclosure likelihood is not uniform; certain categories of sensitive information are more readily surfaced than others under the same conditions.
- Privacy and security testing must be performed at the level of the training-data supply chain rather than only on the final model.
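The second implication, that one risk can fall while the other rises, can be checked mechanically once each release has both rates. The sketch below assumes each release is summarized as a dictionary with hypothetical keys `"unintentional"` and `"malicious"`, and treats differences within an illustrative `tolerance` as no change.

```python
def classify_change(before: dict[str, float], after: dict[str, float],
                    tolerance: float = 0.0) -> str:
    """Describe how a data-processing change moved the two disclosure rates.

    Differences within +/- tolerance are treated as no change.
    """
    def direction(key: str) -> int:
        delta = after[key] - before[key]
        if delta > tolerance:
            return 1
        if delta < -tolerance:
            return -1
        return 0

    u, m = direction("unintentional"), direction("malicious")
    if u * m < 0:
        return "trade-off"   # one risk went up while the other went down
    if u > 0 or m > 0:
        return "worse"       # at least one risk up, neither down
    if u < 0 or m < 0:
        return "better"      # at least one risk down, neither up
    return "unchanged"
```

A "trade-off" result is exactly the case in which tracking only one of the two risks would give a misleading picture of a pipeline change.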
Where Pith is reading between the lines
- Teams building code LLMs could run the same paired assessments on multiple candidate data pipelines to select the variant with the most acceptable risk profile.
- The observed variation by information type points toward possible targeted filtering steps that focus on high-risk categories such as credentials or API tokens.
- The side-by-side measurement approach could be applied to non-code models or to other data modalities to check whether similar trade-offs appear.
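A category-targeted filter of the kind suggested above could be prototyped as a small rule table that labels candidate secrets by type and tallies them, so high-risk categories can be filtered or audited first. The four patterns below are illustrative assumptions (an AWS-style access key ID, a GitHub-style personal access token, a PEM private-key header, and a generic password assignment). They are not the paper's taxonomy and are far less complete than dedicated scanners such as TruffleHog.

```python
import re
from collections import Counter

# Illustrative detection rules only; real scanners use many more patterns,
# entropy checks, and live verification of candidate credentials.
SECRET_PATTERNS: dict[str, re.Pattern] = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "password_assignment": re.compile(
        r"""(?i)\bpass(?:word|wd)?\s*[:=]\s*["'][^"']{4,}["']"""),
}


def categorize(text: str) -> Counter:
    """Count matches of each secret category in one document or generation."""
    counts: Counter = Counter()
    for category, pattern in SECRET_PATTERNS.items():
        counts[category] += len(pattern.findall(text))
    return +counts  # unary plus drops categories with zero matches


def filter_document(text: str, blocked: set[str]) -> bool:
    """Return True if the document should be kept (no blocked category found)."""
    return not (set(categorize(text)) & blocked)
```

Applying `categorize` to model outputs rather than training documents gives per-category tallies, which is one way to examine whether some secret types surface more readily than others.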
Load-bearing premise
The assessment methods accurately measure true disclosure risks without substantial false positives or negatives, and the tested categories of sensitive information represent real secrets found in code repositories.
What would settle it
A controlled test in which a documented data-processing change produces no measurable difference in either disclosure risk across repeated prompt sets and information categories.
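Deciding whether a change produces "no measurable difference" needs a statistical test on the disclosure rates of two releases. A minimal sketch, assuming independent probes and using a standard two-proportion z-test (chosen here for illustration; this summary does not state which procedure the paper uses):

```python
import math


def two_proportion_z_test(hits_a: int, n_a: int,
                          hits_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two disclosure rates.

    Returns (z statistic, two-sided p-value) under the pooled-proportion null.
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both rates are 0 or both are 1: no detectable difference
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A null result would need to hold across repeated prompt sets and information categories, since a single non-significant test can also reflect too few probes.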
Original abstract
This paper explores the risk that a large language model (LLM) trained for code generation on data mined from software repositories will generate content that discloses sensitive information included in its training data. We decompose this risk, known in the literature as "unintended memorization," into two components: unintentional disclosure (where an LLM presents secrets to users without the user seeking them out) and malicious disclosure (where an LLM presents secrets to an attacker equipped with partial knowledge of the training data). We observe that while existing work mostly anticipates malicious disclosure, unintentional disclosure is also a concern. We describe methods to assess unintentional and malicious disclosure risks side-by-side across different releases of training datasets and models. We demonstrate these methods through an independent assessment of the Open Language Model (OLMo) family of models and its Dolma training datasets. Our results show, first, that changes in data source and processing are associated with substantial changes in unintended memorization risk; second, that the same set of operational changes may increase one risk while mitigating another; and, third, that the risk of disclosing sensitive information varies not only by prompt strategies or test datasets but also by the types of sensitive information. These contributions rely on data mining to enable greater privacy and security testing required for the LLM training data supply chain.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper decomposes unintended memorization risk in code-generation LLMs into unintentional disclosure (secrets revealed without attacker intent) and malicious disclosure (secrets revealed to an attacker with partial knowledge). It presents side-by-side assessment methods for these risks, applies them to successive OLMo model releases and Dolma training datasets, and reports three findings: data-source and processing changes produce substantial shifts in risk; the same changes can increase one risk while decreasing the other; and disclosure risk varies by prompt strategy, test dataset, and category of sensitive information.
Significance. If the assessment procedures are shown to be reliable, the work supplies a concrete, reproducible framework for measuring privacy/security trade-offs in the LLM data supply chain and demonstrates that operational decisions (filtering, deduplication, source selection) are not uniformly risk-reducing. This is directly actionable for practitioners curating training corpora and for regulators evaluating model releases.
major comments (3)
- [Methods] Methods section: the three headline results rest on the claim that the described side-by-side procedures correctly quantify true disclosure risk. No ground-truth injection experiments, false-positive/negative rate measurements, or coverage argument for the chosen sensitive-information taxonomy are supplied; without these, observed differences could be measurement artifacts.
- [Results] Results / Evaluation: the abstract and results narrative report associations and differential effects but supply no measurement protocols, statistical controls, error bars, or data-exclusion rules. This prevents verification that the reported changes in risk are robust to reasonable variations in evaluation design.
- [§4] §4 (Application to OLMo/Dolma): the claim that risk varies by type of sensitive information is load-bearing for the third contribution, yet the paper provides no argument that the selected categories are representative of secrets actually present in public code repositories.
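The false-positive and false-negative measurements requested in the first major comment reduce to a confusion matrix once ground truth exists. The sketch assumes, hypothetically, a labeled sample set in which secrets were deliberately injected into some samples, and a detector that flags each sample as disclosing or not; the function and argument names are illustrative.

```python
def detector_error_rates(truth: list[bool], flagged: list[bool]) -> dict[str, float]:
    """Compare detector flags with planted ground truth.

    truth[i]:   True if sample i had a secret deliberately injected
    flagged[i]: True if the disclosure detector flagged sample i
    """
    if len(truth) != len(flagged):
        raise ValueError("truth and flagged must have the same length")
    pairs = list(zip(truth, flagged))
    tp = sum(t and f for t, f in pairs)           # injected and flagged
    fn = sum(t and not f for t, f in pairs)       # injected but missed
    fp = sum(f and not t for t, f in pairs)       # flagged but not injected
    tn = sum(not t and not f for t, f in pairs)   # correctly left unflagged
    return {
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }
```

Reporting these rates per sensitive-information category would also speak to the third major comment, because a category the detector rarely catches cannot support claims about that category's disclosure risk.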
minor comments (2)
- [Methods] Define the exact operational criteria used to label an output as an unintentional versus malicious disclosure; a short decision tree or pseudocode would remove ambiguity.
- [Discussion] Add a limitations paragraph discussing the scope of the secret taxonomy and the generalizability of the OLMo/Dolma findings to other code LLMs.
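The decision tree requested in the first minor comment might look like the sketch below. The criteria are assumptions made for illustration, following the abstract's definitions: an output discloses a secret only if it reproduces a secret from the training data that the prompt did not already contain; the disclosure is malicious if the prompt carried partial knowledge of that training record (modeled here, narrowly, as the exact text that preceded the secret), and unintentional otherwise. The paper's actual criteria may differ.

```python
from enum import Enum


class Disclosure(Enum):
    NONE = "none"
    UNINTENTIONAL = "unintentional"
    MALICIOUS = "malicious"


def label_output(prompt: str, output: str,
                 secret: str, training_prefix: str) -> Disclosure:
    """Label one output with respect to one known training-data secret.

    training_prefix is the text that preceded the secret in the training
    data, standing in for the partial knowledge an attacker might hold.
    """
    # 1. Was the secret reproduced at all?
    if secret not in output:
        return Disclosure.NONE
    # 2. If the prompt already contained the secret, the model merely echoed it.
    if secret in prompt:
        return Disclosure.NONE
    # 3. Did the prompt supply partial knowledge of the training record?
    if training_prefix and training_prefix in prompt:
        return Disclosure.MALICIOUS
    # 4. Otherwise the secret surfaced without the user seeking it out.
    return Disclosure.UNINTENTIONAL
```

Exact substring matching keeps the sketch simple; a fuller procedure would likely need fuzzy matching for near-verbatim reproductions and a broader notion of partial knowledge than a verbatim prefix.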
Simulated Author's Rebuttal
Thank you for the opportunity to respond to the referee's comments. We appreciate the constructive feedback and will revise the manuscript to address the concerns raised regarding the validation of our assessment methods and the robustness of our results.
Point-by-point responses
- Referee: [Methods] Methods section: the three headline results rest on the claim that the described side-by-side procedures correctly quantify true disclosure risk. No ground-truth injection experiments, false-positive/negative rate measurements, or coverage argument for the chosen sensitive-information taxonomy are supplied; without these, observed differences could be measurement artifacts.
Authors: We agree that the manuscript would be strengthened by a more explicit discussion of the limitations of our assessment procedures. The methods are intended to provide observable measures of disclosure under controlled prompt conditions rather than a complete quantification of 'true' risk. In the revised version, we will expand the Methods section to include a dedicated limitations subsection addressing potential measurement artifacts, the absence of ground-truth injections, and the rationale for the sensitive information taxonomy. We note that performing injection experiments would require a separate study design and is beyond the scope of the current work, which focuses on real-world data from Dolma. revision: partial
- Referee: [Results] Results / Evaluation: the abstract and results narrative report associations and differential effects but supply no measurement protocols, statistical controls, error bars, or data-exclusion rules. This prevents verification that the reported changes in risk are robust to reasonable variations in evaluation design.
Authors: We will revise the Results section to include detailed measurement protocols, including the exact criteria for identifying disclosures, data exclusion rules (e.g., how duplicates or non-sensitive items were handled), and any statistical analyses performed. Where applicable, we will add error bars based on variations across prompt templates or subsets of the test data. This will improve the verifiability of the reported changes. revision: yes
- Referee: [§4] §4 (Application to OLMo/Dolma): the claim that risk varies by type of sensitive information is load-bearing for the third contribution, yet the paper provides no argument that the selected categories are representative of secrets actually present in public code repositories.
Authors: The sensitive information categories were selected based on established taxonomies from secret detection literature (e.g., references to tools like TruffleHog and studies on credential leaks in GitHub). We will add an argument in §4 or a new appendix section citing evidence from public reports on the prevalence of these secret types in code repositories to support their representativeness. If additional categories are suggested, we can discuss expanding the taxonomy in future work. revision: partial
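The error bars promised in the second response could be produced, for example, with a percentile bootstrap over prompt templates: resample templates with replacement, recompute the pooled disclosure rate, and take empirical quantiles. The sketch assumes per-template counts of disclosing outputs and total outputs; the resampling unit, the number of resamples, and the 95% level are illustrative choices, not the authors' stated protocol.

```python
import random


def bootstrap_rate_ci(per_template: list[tuple[int, int]], n_resamples: int = 2000,
                      level: float = 0.95, seed: int = 0) -> tuple[float, float, float]:
    """Pooled disclosure rate with a percentile bootstrap CI over prompt templates.

    per_template: (disclosures, total outputs) for each prompt template.
    Returns (point estimate, lower bound, upper bound).
    """
    if not per_template:
        raise ValueError("need at least one prompt template")
    rng = random.Random(seed)  # seeded so the interval is reproducible

    def pooled(sample: list[tuple[int, int]]) -> float:
        hits = sum(h for h, _ in sample)
        total = sum(n for _, n in sample)
        return hits / total if total else 0.0

    # Resample whole templates, so template-to-template variation drives the interval.
    estimates = sorted(
        pooled(rng.choices(per_template, k=len(per_template)))
        for _ in range(n_resamples)
    )
    alpha = (1 - level) / 2
    lower = estimates[int(alpha * (n_resamples - 1))]
    upper = estimates[int((1 - alpha) * (n_resamples - 1))]
    return pooled(per_template), lower, upper
```

Resampling templates rather than individual outputs is the design choice that lets the interval reflect sensitivity to evaluation design, which is the referee's concern.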
Circularity Check
Empirical measurement study with no derivation chain or self-referential reductions
Full rationale
The paper is an empirical assessment of disclosure risks in LLMs using side-by-side methods applied to public OLMo/Dolma releases. It reports observed associations between data-processing changes and risk levels, plus variation by information type. No equations, fitted parameters renamed as predictions, self-definitional constructs, uniqueness theorems, or ansatzes appear in the described chain. The central claims rest on applying the described assessment procedures to public datasets and models, so they can be falsified externally by replication on the same artifacts rather than reducing to self-citation or internal definition. The load-bearing premise about method accuracy is a validity concern, not a circularity issue under the criteria of this check.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: sensitive information in code repositories can be identified and categorized into testable types for disclosure measurement.
Forward citations
Cited by 1 Pith paper
- Security Concerns in Generative AI Coding Assistants: Insights from Online Discussions on GitHub Copilot
Forum discussions highlight four security concerns with GitHub Copilot: data leakage, code licensing problems, adversarial attacks such as prompt injection, and generation of insecure code.