Probing Privacy Leaks in LLM-based Code Generation via Test Generation

Chunrong Fang; Juan Zhai; Weisong Sun; Xia Feng; Xiaofang Zhang; Yang Liu; Yifei Ge; Yuchen Chen; Zhenpeng Chen; Zhenyu Chen

arxiv: 2605.15248 · v1 · pith:ZGQEWN3Rnew · submitted 2026-05-14 · 💻 cs.SE · cs.CR

Pith reviewed 2026-05-19 16:16 UTC · model grok-4.3

keywords: privacy leakage · LLM code generation · test generation · personally identifiable information · prompt engineering · software security · data memorization

The pith

A pipeline using test generation and a privacy feature library detects 2.56 times more privacy leaks in LLM code generation than prior methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a method to better detect when large language models for code generation have memorized and reproduce personally identifiable information from their training data. Current detection approaches use manually or automatically designed prompts that do not match how such information actually appears in real code. The new pipeline instead simulates practical code generation tasks and uses automatically generated test cases driven by a library of privacy features to pull out the leaked data. Experiments across five popular LLMs confirm this finds substantially more verified leaks than prior baselines. This matters because it provides a more realistic way to audit privacy risks in widely used code assistants.

Core claim

We propose a pipeline that simulates practical privacy-related code generation scenarios and adopts a test-driven strategy to elicit the memorized information from the generated test cases. We further introduce an automatically constructed privacy feature library that replaces manual prompt engineering by providing realistic templates and examples to guide test case generation. Large-scale experiments on 5 widely used LLMs show that our pipeline exposes more confirmed privacy leakage, achieving a 2.56 times increase in detected leakage compared to existing baselines.

What carries the argument

A test-driven strategy paired with an automatically constructed privacy feature library that supplies realistic templates and examples to guide test case generation for eliciting memorized PII.
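
As a rough illustration of how a feature library could steer test generation, the sketch below assembles a test-generation prompt from library entries. It is a minimal sketch under stated assumptions, not the authors' implementation: the `PrivacyFeature` fields, the `build_test_prompt` helper, and the prompt wording are all hypothetical, and the example values are synthetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacyFeature:
    """One hypothetical library entry for a privacy attribute."""
    attribute: str  # e.g. "email"
    template: str   # structural pattern the value usually follows
    example: str    # synthetic example value, not real PII


def build_test_prompt(code_snippet: str, features: list[PrivacyFeature]) -> str:
    """Assemble a prompt asking a model to write tests for code_snippet.

    The library entries are appended as hints so that generated test
    inputs follow realistic formats rather than placeholders.
    """
    lines = [
        "Write unit tests for the following code.",
        "Use realistic input values for each privacy attribute.",
        "",
        "Code:",
        code_snippet,
        "",
        "Attribute hints:",
    ]
    for feature in features:
        lines.append(
            f"- {feature.attribute}: format {feature.template}, e.g. {feature.example}"
        )
    return "\n".join(lines)


# Hypothetical library content and code under test.
features = [PrivacyFeature("email", "<local>@<domain>", "jane.doe@example.com")]
prompt = build_test_prompt("def send(email): ...", features)
print(prompt)
```

The resulting prompt would then be sent to the evaluated model, and the test inputs it produces are what a later verification step inspects.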

If this is right

LLMs can leak more PII under realistic code-generation prompts than ad-hoc tests reveal.
Automatic privacy feature libraries can replace manual prompt design for leakage detection.
The test-driven approach scales across multiple LLMs and yields consistently higher detection rates.
Confirmed leaks identified this way can guide targeted removal of sensitive data from training sets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The detection method could be embedded into continuous auditing tools for deployed code LLMs.
Similar test-generation ideas might apply to other memorized content such as security vulnerabilities or copyrighted code.
Training pipelines could incorporate generated tests as a regularizer to discourage memorization of PII.

Load-bearing premise

Existing privacy-leakage detection methods rely on ad-hoc prompt construction that does not adequately approximate the real-world contexts in which PII appears in code corpora.

What would settle it

Apply both the new pipeline and baseline methods to the same five LLMs, count the distinct confirmed PII leaks extracted in each case, and check whether the new method produces at least 2.5 times as many verified leaks; failure to do so would falsify the performance claim.
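
The check described above reduces to counting distinct confirmed leaks per method and comparing the ratio against a threshold. The following is a minimal sketch with toy data; the function names and the example strings are assumptions made for illustration, and the 2.5 threshold is taken from the paragraph above.

```python
def leak_ratio(pipeline_leaks, baseline_leaks):
    """Return (distinct pipeline count, distinct baseline count, ratio)."""
    pipeline_distinct = set(pipeline_leaks)
    baseline_distinct = set(baseline_leaks)
    if not baseline_distinct:
        raise ValueError("baseline found no confirmed leaks; ratio is undefined")
    ratio = len(pipeline_distinct) / len(baseline_distinct)
    return len(pipeline_distinct), len(baseline_distinct), ratio


def claim_holds(pipeline_leaks, baseline_leaks, threshold=2.5):
    """True if the pipeline finds at least `threshold` times as many leaks."""
    _, _, ratio = leak_ratio(pipeline_leaks, baseline_leaks)
    return ratio >= threshold


# Toy, synthetic data: duplicates are counted once.
pipeline = [
    "alice@example.com", "bob@example.com", "carol@example.com",
    "alice@example.com", "dave@example.com", "erin@example.com",
]
baseline = ["alice@example.com", "bob@example.com"]

print(leak_ratio(pipeline, baseline))   # (5, 2, 2.5)
print(claim_holds(pipeline, baseline))  # True
```

Deduplicating before counting matters here: a method that repeats the same leaked value many times should not score higher than one that finds it once.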

Figures

Figures reproduced from arXiv: 2605.15248 by Chunrong Fang, Juan Zhai, Weisong Sun, Xia Feng, Xiaofang Zhang, Yang Liu, Yifei Ge, Yuchen Chen, Zhenpeng Chen, Zhenyu Chen.

**Figure 2.** Overview of the privacy leakage evaluation pipeline.

**Figure 5.** Clustering of template tokens (Λtmp), representing structure-dominated parts of the code. Different colors indicate different privacy attributes.

**Figure 6.** The prompt format used in Section 4.2 to instantiate privacy-related code-generation questions. Given a development scenario s and its associated attribute set A(s), we ask a question-generation model to produce a list of concrete coding tasks that naturally operate on these attributes. The resulting questions serve as the inputs to the evaluated LLM in the next stage, ensuring that privacy attribut…

**Figure 7.** An example of generating a code snippet involving privacy attributes, followed by test cases generated for it that contain potential privacy content.

**Figure 8.** Comparison between Human and Judge LLM.

**Figure 9.** Visualizes the ablation study reported in the main text by comparing leakage outcomes with and without the privacy feature library (FL). Without FL, the model is more likely to generate low-information or placeholder-like inputs, which are less likely to survive strict verification, leading to fewer confirmed leaks overall. This effect is especially pronounced for attributes with stricter or less intuitiv…

Original abstract

The widespread availability of large-scale code datasets has fueled the rapid development of large language models (LLMs) for code-related tasks. These datasets may include sensitive personally identifiable information (PII), which can lead to privacy leakage when LLMs memorize and reproduce it. However, existing privacy-leakage detection methods rely on ad-hoc prompt construction (manually or automatically designed). Therefore, they do not adequately approximate the real-world contexts in which PII appears in code corpora, making it difficult to extract realistic privacy leakage. In this paper, we propose a pipeline that simulates practical privacy-related code generation scenarios and adopts a test-driven strategy to elicit the memorized information from the generated test cases. We further introduce an automatically constructed privacy feature library that replaces manual prompt engineering by providing realistic templates and examples to guide test case generation. Large-scale experiments on 5 widely used LLMs show that our pipeline exposes more confirmed privacy leakage, achieving a 2.56 times increase in detected leakage compared to existing baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The test-driven pipeline with auto-built privacy library claims a 2.56x gain in confirmed leaks over baselines, but the confirmation step lacks the details needed to judge whether the gain is real.

The key point is that the authors replace manual prompt engineering with an automatically constructed privacy feature library and use test generation to simulate realistic code contexts, then report exposing more confirmed privacy leaks across five LLMs. That combination is the main novelty they put forward against prior ad-hoc methods. It is a straightforward practical step that could make detection less dependent on researcher intuition about prompts. The large-scale experiments on common models give the work some weight as an empirical comparison. The framing around real-world PII appearance in code corpora is also reasonable and directly targets the weakness they identify in earlier work.

The central soft spot is the confirmation of leaks. The abstract gives no description of the exact procedure, whether it is string matching to a known corpus, membership inference, manual review, or a heuristic tied to their own feature library. If the confirmation is not fully independent of the test-generation stage or is applied differently to baselines, the 2.56 times multiplier risks being an artifact rather than evidence of better elicitation. The stress-test concern lands here because the paper does not yet show that the method rules out plausible generation versus actual memorization.

This paper is for people working on privacy and security tooling for code LLMs. A reader who needs concrete ideas for automated leak detection could extract the pipeline structure and library construction approach even if the evaluation needs more rigor. I would send it to peer review because the topic is practically relevant and the core idea is grounded enough to warrant referee time, provided the authors add clear, reproducible criteria for confirming leaks and uniform controls across conditions.

Referee Report

1 major / 1 minor

Summary. The paper proposes a test-driven pipeline augmented by an automatically constructed privacy feature library to simulate realistic code-generation contexts containing PII. It reports large-scale experiments on five LLMs showing that the pipeline elicits 2.56 times more confirmed privacy leaks than existing ad-hoc baseline methods.

Significance. If the confirmation procedure is shown to be independent of the generation method and able to distinguish memorization from plausible generation, the work would meaningfully advance privacy auditing for code LLMs by replacing manual prompt engineering with a more systematic, test-driven approach. The scale of the evaluation across five models is a clear strength.

major comments (1)

[Abstract and experimental results] The central claim of a 2.56× increase in 'confirmed privacy leakage' rests on an unspecified confirmation procedure. No description is given of the exact matching criterion (string match against a known PII corpus, membership inference, manual review, or heuristic), whether the same procedure was applied uniformly to baselines, or how it distinguishes regurgitated training examples from plausible but newly generated PII-containing code. This directly affects whether the measured improvement can be attributed to better elicitation of real-world contexts.

minor comments (1)

[Abstract] The abstract states that the privacy feature library 'replaces manual prompt engineering' but does not clarify whether any manual curation was still required for the library templates themselves.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment regarding the confirmation procedure below and will revise the paper accordingly to improve clarity.

Referee: [Abstract and experimental results] The central claim of a 2.56× increase in 'confirmed privacy leakage' rests on an unspecified confirmation procedure. No description is given of the exact matching criterion (string match against a known PII corpus, membership inference, manual review, or heuristic), whether the same procedure was applied uniformly to baselines, or how it distinguishes regurgitated training examples from plausible but newly generated PII-containing code. This directly affects whether the measured improvement can be attributed to better elicitation of real-world contexts.

Authors: We agree that the confirmation procedure requires more explicit description in the abstract and experimental results section. In the revised manuscript we will add a dedicated paragraph detailing that confirmation relies on exact string matching against the specific PII instances stored in the automatically constructed privacy feature library. The identical matching criterion is applied uniformly to outputs from our pipeline and all baseline methods. The test-driven design further reduces the chance of counting plausible but non-memorized PII by requiring the generated code to reproduce the exact library-derived PII inside the realistic context supplied by the test case; we will expand the discussion to clarify why this targets regurgitation more directly than ad-hoc prompting. revision: yes
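
The uniform exact-match criterion described in this simulated rebuttal can be sketched as a single function applied to every method's outputs. This is only an illustration of that description: `confirm_leaks`, the `KNOWN_PII` set, and the sample outputs are hypothetical, and all values are synthetic.

```python
def confirm_leaks(outputs, known_pii):
    """Return the known PII strings that appear verbatim in any output.

    Matching is exact and case-sensitive, so a near miss or a
    differently cased value is not counted as a confirmed leak.
    """
    confirmed = set()
    for text in outputs:
        for value in known_pii:
            if value in text:
                confirmed.add(value)
    return confirmed


# Hypothetical reference set of PII instances (synthetic values).
KNOWN_PII = {"jane.doe@example.com", "555-0100"}

# Outputs from two methods, checked with the same function.
pipeline_outputs = [
    'assert send("jane.doe@example.com")',
    'call("555-0100")',
]
baseline_outputs = [
    'email = "user@example.org"',
    'email = "jane.doe@example.com"',
]

pipeline_confirmed = confirm_leaks(pipeline_outputs, KNOWN_PII)
baseline_confirmed = confirm_leaks(baseline_outputs, KNOWN_PII)
print(len(pipeline_confirmed), len(baseline_confirmed))  # 2 1
```

Routing both methods through one function makes the uniformity of the criterion easy to audit. It does not by itself show that a matched value was memorized rather than coincidentally generated, which is the concern the referee raises.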

Circularity Check

0 steps flagged

Empirical comparison of leakage detection pipelines with no self-referential derivation

The paper presents an empirical pipeline for eliciting privacy leaks via test generation and an automatically constructed feature library, then reports a measured 2.56× increase in confirmed leaks over baselines across five LLMs. No equations, fitted parameters renamed as predictions, or self-definitional steps appear. The central claim rests on experimental counts of confirmed leakage rather than any derivation that reduces to its own inputs by construction. The confirmation procedure is described as independent of the generation method in the abstract framing, and the work is self-contained against external baselines without load-bearing self-citation chains or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that LLMs memorize PII from training data and that simulated test cases can reliably elicit it without introducing artifacts.

axioms (2)

domain assumption: LLMs trained on code corpora containing PII will memorize and reproduce that information under appropriate prompting.
Stated in the opening of the abstract as the motivation for the work.
domain assumption: Test generation can approximate real-world code contexts sufficiently to extract memorized PII.
Core premise of the proposed pipeline.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: We propose a test-driven pipeline ... automatically constructed privacy feature library ... Judge LLM combined with GitHub-based verification

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: token-wise pseudo-NLL scores ... lower quartile ... semantic clustering with DBSCAN

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

CodeSearchNet Challenge: Evaluating the State of Semantic Code Search , journal =

Hamel Husain and Ho. CodeSearchNet Challenge: Evaluating the State of Semantic Code Search , journal =

work page

[2] [2]

AI-based Programming Assistants for Privacy-related Code Generation: The Developers’ Experience , volume =

Kashumi Madampe and John Grundy and Nalin Arachchilage , journal =. AI-based Programming Assistants for Privacy-related Code Generation: The Developers’ Experience , volume =

work page

[3] [3]

CodeT: Code Generation with Generated Tests

Codet: Code generation with generated tests , author=. arXiv preprint arXiv:2207.10397 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[4] [4]

Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering , pages=

Using large language models to generate junit tests: An empirical study , author=. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering , pages=

work page

[5] [5]

arXiv preprint arXiv:2412.18843 , year=

Improving the readability of automatically generated tests using large language models , author=. arXiv preprint arXiv:2412.18843 , year=

work page arXiv

[6] [6]

32nd USENIX Security Symposium (USENIX Security 23) , pages=

\ CodexLeaks \ : Privacy leaks from code generation language models in \ GitHub \ copilot , author=. 32nd USENIX Security Symposium (USENIX Security 23) , pages=

work page

[7] [7]

30th USENIX security symposium (USENIX Security 21) , pages=

Extracting training data from large language models , author=. 30th USENIX security symposium (USENIX Security 21) , pages=

work page

[8] [8]

Advances in neural information processing systems , volume=

Language models are few-shot learners , author=. Advances in neural information processing systems , volume=

work page

[9] [9]

OpenAI blog , volume=

Language models are unsupervised multitask learners , author=. OpenAI blog , volume=

work page

[10] [10]

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery , volume=

Enterprise data breach: causes, challenges, prevention, and future directions , author=. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery , volume=. 2017 , publisher=

work page 2017

[11] [11]

2022 , howpublished =

work page 2022

[12] [12]

2023 , howpublished =

work page 2023

[13] [13]

2016 , howpublished =

work page 2016

[14] [14]

, author=

How bad can it git? characterizing secret leakage in public github repositories. , author=. NDSS , year=

work page

[15] [15]

A.; Kamath, G.; Kulkarni, J.; Lee, Y

Differentially private fine-tuning of language models , author=. arXiv preprint arXiv:2110.06500 , year=

work page arXiv

[16] [16]

arXiv preprint arXiv:2205.01863 , year=

Provably confidential language modelling , author=. arXiv preprint arXiv:2205.01863 , year=

work page arXiv

[17] [17]

Proceedings of the Third Workshop on Privacy in Natural Language Processing , pages=

Understanding unintended memorization in language models under federated learning , author=. Proceedings of the Third Workshop on Privacy in Natural Language Processing , pages=

work page

[18] [18]

2018 IEEE 31st computer security foundations symposium (CSF) , pages=

Privacy risk in machine learning: Analyzing the connection to overfitting , author=. 2018 IEEE 31st computer security foundations symposium (CSF) , pages=. 2018 , organization=

work page 2018

[19] [19]

2019 IEEE symposium on security and privacy (SP) , pages=

Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning , author=. 2019 IEEE symposium on security and privacy (SP) , pages=. 2019 , organization=

work page 2019

[20] [20]

, author=

Obfuscation-Resilient Privacy Leak Detection for Mobile Apps Through Differential Analysis. , author=. NDSS , volume=

work page

[21] [21]

Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services , pages=

Recon: Revealing and controlling pii leaks in mobile network traffic , author=. Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services , pages=

work page

[22] [22]

Optus notifies customers of cyberattack compromising customer information , year =

work page

[23] [23]

Holmes , title =

A. Holmes , title =. 2021 , howpublished =

work page 2021

[24] [24]

2025 , month = feb, url =

Daniel, Lars , title =. 2025 , month = feb, url =

work page 2025

[25] [25]

2024 , month = dec, day =

Xiao Xiao , title =. 2024 , month = dec, day =

work page 2024

[26] [26]

California Consumer Privacy Act of 2018 (CCPA) , year =

work page 2018

[27] Evaluating Large Language Models Trained on Code. arXiv preprint arXiv:2107.03374.

[28] SecretBench: A dataset of software secrets. 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), 2023.

[29] ChatGPT-Based Test Generation for Refactoring Engines Enhanced by Feature Analysis on Examples. 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), 2025.

[30] Survey of hallucination in natural language generation. ACM Computing Surveys, 2023.

[31] Security attacks on LLM-based code completion tools. Proceedings of the AAAI Conference on Artificial Intelligence.

[32] The secret sharer: Evaluating and testing unintended memorization in neural networks. 28th USENIX Security Symposium (USENIX Security 19).

[33] Analyzing information leakage of updates to natural language models. Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security.

[34] Canary extraction in natural language understanding models. arXiv preprint arXiv:2203.13920.

[35] CoderEval: A benchmark of pragmatic code generation with generative pre-trained models. Proceedings of the 46th IEEE/ACM International Conference on Software Engineering.

[36] ScenEval: A Benchmark for Scenario-Based Evaluation of Code Generation. 2024 IEEE International Conference on Artificial Intelligence Testing (AITest), 2024.

[37] AI-based Programming Assistants for Privacy-related Code Generation: The Developers' Experience. arXiv preprint arXiv:2503.03988.

[38] Evaluating large language models in class-level code generation. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering.

[39] Top General Performance = Top Domain Performance? DomainCodeBench: A Multi-domain Code Generation Benchmark. arXiv preprint arXiv:2412.18573.

[40] Preventing verbatim memorization in language models gives a false sense of privacy. arXiv preprint arXiv:2210.17546.

[41] How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN. Transactions of the Association for Computational Linguistics.

[42] Codebreaker: Dynamic Extraction Attacks on Code Language Models. 2025 IEEE Symposium on Security and Privacy (SP), 2025.

[43] Fuzz-testing meets LLM-based agents: An automated and efficient framework for jailbreaking text-to-image generation models. 2025 IEEE Symposium on Security and Privacy (SP), 2025.

[44] Personally identifiable information (PII) detection in the unstructured large text corpus using natural language processing and unsupervised learning technique. International Journal of Advanced Computer Science and Applications, 2021.

[45] Recovering from privacy-preserving masking with large language models. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024.

[46] Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer. The Twelfth International Conference on Learning Representations.

[47] PrivCode: When Code Generation Meets Differential Privacy. arXiv preprint arXiv:2512.05459.

[48] gpt-oss-120b & gpt-oss-20b Model Card. arXiv preprint arXiv:2508.10925.

[49] Unveiling memorization in code models. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering.

[50] Traces of memorisation in large language models for code. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering.

[51] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. 2025.

[52] DeepSeek-V3 Technical Report. 2024.

[53] GPT-4o System Card. arXiv preprint arXiv:2410.21276.

[54] GPT-4 Technical Report. arXiv preprint arXiv:2303.08774.