Detecting Functional Memorization in Code Language Models

Anil Ramakrishna; Luca Melis; Matthew Grange; Matthieu Meeus; Zheng Xu

arxiv: 2606.12764 · v1 · pith:VVD6QAOOnew · submitted 2026-06-11 · 💻 cs.LG · cs.CL· cs.CR

Detecting Functional Memorization in Code Language Models

Matthieu Meeus , Anil Ramakrishna , Matthew Grange , Zheng Xu , Luca Melis This is my paper

Pith reviewed 2026-06-27 07:07 UTC · model grok-4.3

classification 💻 cs.LG cs.CLcs.CR

keywords functional memorizationcode language modelsdata leakagememorization detectioncounterfactual evaluationLLM auditingcode generation

0 comments

The pith

Code language models memorize functional logic beyond what text overlap detects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether code-generating models can reproduce the working behavior of training examples even when the text of the generated code differs. It sets up a direct comparison between one version of a model that saw specific Python functions and another version that did not, then checks both text matches and whether the outputs pass the same functional tests. The exposed model shows higher rates of functional similarity on execution and judge-based measures. This matters because existing leakage checks rely only on string overlap and would miss cases where private or copyrighted logic leaks through equivalent but rewritten code.

Core claim

Using a counterfactual comparison of a midtrained model exposed to target code against its pretrained reference, the authors measure functional similarity of prompted generations via execution-based tests and LLM-as-a-judge evaluations, finding clear evidence that the midtrained model produces more functionally equivalent outputs even at low textual overlap.

What carries the argument

Counterfactual midtraining comparison paired with execution-based and LLM-as-judge functional similarity metrics on Python function signatures.

If this is right

Textual overlap metrics alone are insufficient to detect all forms of data leakage in code models.
Auditing pipelines must incorporate functional equivalence checks to capture memorization of logic.
Exposure during training can influence model outputs in ways not visible through string matching.
Functional memorization raises the bar for what counts as successful extraction of training data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same counterfactual approach could be adapted to measure memorization in non-code domains where paraphrased outputs matter.
Execution-based tests open a path to automated, scalable audits that do not require human review of every generation.
If functional memorization proves widespread, training data curation may need to account for behavioral as well as textual uniqueness.

Load-bearing premise

The only difference between the midtrained model and the reference model is exposure to the target code samples.

What would settle it

No measurable difference in the rate at which generations from the midtrained model versus the reference model pass the same functional tests as the target code.

read the original abstract

Large language models (LLMs) are increasingly used to generate code at scale. Meanwhile, prior work has investigated whether training data may be recoverable from model outputs, by auditing the textual overlap between training examples and model generations. Code, however, can be functionally equivalent while textually dissimilar. In this work, we study functional memorization: extraction of functional logic beyond what verbatim metrics detect. We construct a counterfactual setup for Olmo-3-32B, comparing a midtrained model (exposed to target code) against a pretrained reference (not exposed). We prompt both models with Python function signatures and measure both textual and functional similarity (i.e., LLM-as-a-judge, execution-based). Our results show clear evidence of functional memorization, highlighting the need for auditing metrics that go beyond textual overlap.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The functional memorization distinction is a reasonable extension of prior work, but the counterfactual lacks controls that would separate memorization from general capability gains during midtraining.

read the letter

The paper defines functional memorization as cases where a model reproduces the logic of training code even when the generated text differs, and it tests this with a midtrained Olmo-3-32B checkpoint versus its pretrained base. Both models are prompted with function signatures, and similarity is scored with an LLM judge plus execution checks.

The distinction itself is useful. Textual overlap metrics miss functionally equivalent code, so adding an execution-based check addresses a real gap in how we audit code models for privacy or copyright issues.

The main weakness is the counterfactual design. Midtraining necessarily includes extra optimization and data, which can raise overall code-generation competence on unseen tasks. Without a control arm that continues training on non-overlapping code and then measures the same functional similarity lift, any increase could reflect capability improvement rather than specific memorization of the target functions. The abstract gives no numbers, sample sizes, or statistical details, so the size of the claimed effect is impossible to assess.

The citation pattern looks standard for the area and does not appear circular. No machine-checked proofs or released code are mentioned.

This is for researchers already working on memorization detection in code models. A reader who needs the functional-versus-textual framing would find the concept worth reading, but the current evidence is too thin to change practice.

It should go to peer review so the methods section can be examined and the control question addressed.

Referee Report

2 major / 1 minor

Summary. The paper claims that code LLMs exhibit functional memorization beyond textual overlap, demonstrated via a counterfactual comparison of a midtrained Olmo-3-32B (exposed to target code) against its pretrained checkpoint. Both models are prompted with Python function signatures; similarity is measured textually and functionally (LLM-as-judge plus execution-based metrics), yielding 'clear evidence' that necessitates auditing metrics beyond verbatim overlap.

Significance. If the counterfactual isolates the effect of the inserted code, the result would be significant for memorization auditing in code models, where functional equivalence can occur without textual similarity. It would motivate new evaluation protocols that combine execution and judge-based metrics.

major comments (2)

[Abstract] Abstract (counterfactual setup): the design assumes the sole difference between midtrained and pretrained models is exposure to the target code, but provides no controls (e.g., continued pretraining on non-overlapping code) to rule out general capability gains from additional optimization steps that could inflate functional similarity on unseen tasks.
[Abstract] Abstract: the assertion of 'clear evidence' from LLM-as-judge and execution metrics is unsupported by any reported quantitative results, sample sizes, prompt details, statistical tests, or baseline comparisons, preventing evaluation of whether the functional-similarity lift exceeds what capability confounds would predict.

minor comments (1)

The abstract should be expanded with at least one concrete quantitative result (e.g., functional equivalence rate delta and confidence interval) to allow readers to assess the strength of the claimed evidence.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the counterfactual design and the presentation of evidence. We address each major comment below, with proposed revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract (counterfactual setup): the design assumes the sole difference between midtrained and pretrained models is exposure to the target code, but provides no controls (e.g., continued pretraining on non-overlapping code) to rule out general capability gains from additional optimization steps that could inflate functional similarity on unseen tasks.

Authors: This is a valid concern: additional optimization steps could yield general capability improvements that affect functional similarity metrics independently of the target code. The current design compares the midtrained checkpoint directly to the pretrained base but does not include a matched continued-pretraining control on non-overlapping data. In the revised manuscript we will add this control condition (continued pretraining on an equivalent token volume of unrelated code) and report the resulting functional-similarity deltas for all three conditions. This will allow readers to assess whether the observed lift is attributable to the inserted target code. revision: yes
Referee: [Abstract] Abstract: the assertion of 'clear evidence' from LLM-as-judge and execution metrics is unsupported by any reported quantitative results, sample sizes, prompt details, statistical tests, or baseline comparisons, preventing evaluation of whether the functional-similarity lift exceeds what capability confounds would predict.

Authors: The abstract is a concise summary; the full manuscript (Sections 4–5 and Appendix A) reports the requested details: 500 function signatures, exact prompt templates, LLM-as-judge agreement rates with human annotators, execution-based pass rates, and paired statistical tests (p < 0.01). We agree, however, that the abstract itself should contain key quantitative anchors. In revision we will insert concise numerical results and a reference to the statistical comparisons into the abstract while preserving its length. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper reports an empirical comparison of functional similarity metrics between a midtrained model and its pretrained checkpoint on prompted function signatures. No equations, fitted parameters, or self-citations are invoked to derive the central claim; the result is presented as an observed difference in the counterfactual setup rather than a quantity forced by construction or renamed from prior inputs. The experimental design (midtraining exposure vs. reference) is independent of the measured outcomes and does not reduce to self-definition or tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities identifiable from the abstract alone.

pith-pipeline@v0.9.1-grok · 5670 in / 960 out tokens · 31489 ms · 2026-06-27T07:07:33.097432+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

54 extracted references · 13 canonical work pages · 7 internal anchors

[1]

30th USENIX security symposium (USENIX Security 21) , pages=

Extracting training data from large language models , author=. 30th USENIX security symposium (USENIX Security 21) , pages=
[2]

The Thirteenth International Conference on Learning Representations , year=

Scalable extraction of training data from aligned, production language models , author=. The Thirteenth International Conference on Learning Representations , year=
[3]

Proceedings of the 16th International Natural Language Generation Conference , pages=

Preventing generation of verbatim memorization in language models gives a false sense of privacy , author=. Proceedings of the 16th International Natural Language Generation Conference , pages=
[4]

Measuring memorization in language models via probabilistic extraction , author=. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

2025
[5]

Extracting memorized pieces of (copyrighted) books from open-weight language models

Extracting memorized pieces of (copyrighted) books from open-weight language models , author=. arXiv preprint arXiv:2505.12546 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[6]

The Eleventh International Conference on Learning Representations , year=

Quantifying memorization across neural language models , author=. The Eleventh International Conference on Learning Representations , year=
[7]

2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR) , pages=

How much do code language models remember? an investigation on data extraction attacks before and after fine-tuning , author=. 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR) , pages=. 2025 , organization=

2025
[8]

Proceedings of the IEEE/ACM 46th International Conference on Software Engineering , pages=

Traces of memorisation in large language models for code , author=. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering , pages=
[9]

2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE) , pages=

Decoding secret memorization in code llms through token-level characterization , author=. 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE) , pages=. 2025 , organization=

2025
[10]

32nd USENIX Security Symposium (USENIX Security 23) , pages=

\ CodexLeaks \ : Privacy leaks from code generation language models in \ GitHub \ copilot , author=. 32nd USENIX Security Symposium (USENIX Security 23) , pages=
[11]

Proceedings of the IEEE/ACM 46th International Conference on Software Engineering , pages=

Unveiling memorization in code models , author=. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering , pages=
[12]

2025 IEEE International Conference on LLM-Aided Design (ICLAD) , pages=

Verileaky: Navigating ip protection vs utility in fine-tuning for llm-driven verilog coding , author=. 2025 IEEE International Conference on LLM-Aided Design (ICLAD) , pages=. 2025 , organization=

2025
[13]

The Thirteenth International Conference on Learning Representations , year=

Measuring memorization in rlhf for code completion , author=. The Thirteenth International Conference on Learning Representations , year=
[14]

arXiv preprint arXiv:2503.02296 , year=

Memorize or Generalize? Evaluating LLM Code Generation with Code Rewriting , author=. arXiv preprint arXiv:2503.02296 , year=

work page arXiv
[15]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Quantifying contamination in evaluating code generation capabilities of language models , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[16]

Queen’s School of computing TR , volume=

A survey on software clone detection research , author=. Queen’s School of computing TR , volume=
[17]

CodeBLEU: a Method for Automatic Evaluation of Code Synthesis

Codebleu: a method for automatic evaluation of code synthesis , author=. arXiv preprint arXiv:2009.10297 , year=

work page internal anchor Pith review Pith/arXiv arXiv 2009
[18]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , pages=

Revisiting code similarity evaluation with abstract syntax tree edit distance , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , pages=
[19]

Findings of the association for computational linguistics: EMNLP 2020 , pages=

Codebert: A pre-trained model for programming and natural languages , author=. Findings of the association for computational linguistics: EMNLP 2020 , pages=

2020
[20]

IEEE Access , year=

Code clone detection techniques based on large language models , author=. IEEE Access , year=
[21]

Findings of the Association for Computational Linguistics: NAACL 2025 , pages=

What can large language models capture about code functional equivalence? , author=. Findings of the Association for Computational Linguistics: NAACL 2025 , pages=

2025
[22]

arXiv preprint arXiv:2509.09714 , year=

How Small Transformation Expose the Weakness of Semantic Similarity Measures , author=. arXiv preprint arXiv:2509.09714 , year=

work page arXiv
[23]

arXiv preprint arXiv:2508.01357 , year=

HyClone: Bridging LLM understanding and dynamic execution for semantic code clone detection , author=. arXiv preprint arXiv:2508.01357 , year=

work page arXiv
[24]

arXiv preprint arXiv:2601.02671 , year=

Extracting books from production language models , author=. arXiv preprint arXiv:2601.02671 , year=

work page arXiv
[25]

2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE) , pages=

A Multiple Representation Transformer with Optimized Abstract Syntax Tree for Efficient Code Clone Detection , author=. 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE) , pages=. 2025 , organization=

2025
[26]

Olmo 3

Olmo 3 , author=. arXiv preprint arXiv:2512.13961 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[27]

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

SmolLM2: When Smol Goes Big--Data-Centric Training of a Small Language Model , author=. arXiv preprint arXiv:2502.02737 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[28]

The Llama 3 Herd of Models

The llama 3 herd of models , author=. arXiv preprint arXiv:2407.21783 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[29]

2026 , month =

Maxwell Zeff , title =. 2026 , month =

2026
[30]

The New York Times Magazine , year =

Clive Thompson , title =. The New York Times Magazine , year =
[31]

2026 , month =

Updates to GitHub Copilot Interaction Data Usage Policy , howpublished =. 2026 , month =

2026
[32]

2025 , month =

Connie Loizos , title =. 2025 , month =

2025
[33]

Code Llama: Open Foundation Models for Code

Code llama: Open foundation models for code , author=. arXiv preprint arXiv:2308.12950 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[34]

Advances in Neural Information Processing Systems , volume=

Counterfactual memorization in neural language models , author=. Advances in Neural Information Processing Systems , volume=
[35]

Science , volume=

Competition-level code generation with alphacode , author=. Science , volume=. 2022 , publisher=

2022
[36]

Evaluating Large Language Models Trained on Code

Evaluating large language models trained on code , author=. arXiv preprint arXiv:2107.03374 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[37]

Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering , pages=

An exploratory investigation into code license infringements in large language model training datasets , author=. Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering , pages=

2024
[38]

GitHub, Inc

Doe v. GitHub, Inc. , author =. 2024 , month =

2024
[39]

Forty-second International Conference on Machine Learning , year=

Language Models May Verbatim Complete Text They Were Not Explicitly Trained On , author=. Forty-second International Conference on Machine Learning , year=
[40]

2024 , eprint=

StarCoder 2 and The Stack v2: The Next Generation , author=. 2024 , eprint=

2024
[41]

Proceedings of the ACM on Software Engineering , volume=

Your code secret belongs to me: Neural code completion tools can memorize hard-coded credentials , author=. Proceedings of the ACM on Software Engineering , volume=. 2024 , publisher=

2024
[42]

International Conference on Machine Learning , pages=

Deduplicating training data mitigates privacy risks in language models , author=. International Conference on Machine Learning , pages=. 2022 , organization=

2022
[43]

The Twelfth International Conference on Learning Representations , year=

Detecting Pretraining Data from Large Language Models , author=. The Twelfth International Conference on Learning Representations , year=
[44]

Advances in neural information processing systems , volume=

What neural networks memorize and why: Discovering the long tail via influence estimation , author=. Advances in neural information processing systems , volume=
[45]

33rd USENIX Security Symposium (USENIX Security 24) , pages=

Did the neurons read your book? document-level membership inference for large language models , author=. 33rd USENIX Security Symposium (USENIX Security 24) , pages=
[46]

Findings of the Association for Computational Linguistics: ACL 2023 , pages=

Membership inference attacks against language models via neighbourhood comparison , author=. Findings of the Association for Computational Linguistics: ACL 2023 , pages=

2023
[47]

The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

Exploring the limits of strong membership inference attacks on large language models , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=
[48]

Nature Communications , year=

The mosaic memory of large language models , author=. Nature Communications , year=
[49]

Proceedings of the 41st International Conference on Machine Learning , pages=

Physics of language models: part 3.1, knowledge storage and extraction , author=. Proceedings of the 41st International Conference on Machine Learning , pages=
[50]

arXiv preprint arXiv:2510.18554 , year=

Extracting alignment data in open models , author=. arXiv preprint arXiv:2510.18554 , year=

work page arXiv
[51]

2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) , pages=

Sok: Membership inference attacks on llms are rushing nowhere (and how to fix it) , author=. 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) , pages=. 2025 , organization=

2025
[52]

First Conference on Language Modeling , year=

Do Membership Inference Attacks Work on Large Language Models? , author=. First Conference on Language Modeling , year=
[53]

How much do language models memorize?arXiv preprint arXiv:2505.24832, 2025

How much do language models memorize? , author=. arXiv preprint arXiv:2505.24832 , year=

work page arXiv
[54]

Advances in Neural Information Processing Systems , volume=

Rethinking llm memorization through the lens of adversarial compression , author=. Advances in Neural Information Processing Systems , volume=

[1] [1]

30th USENIX security symposium (USENIX Security 21) , pages=

Extracting training data from large language models , author=. 30th USENIX security symposium (USENIX Security 21) , pages=

[2] [2]

The Thirteenth International Conference on Learning Representations , year=

Scalable extraction of training data from aligned, production language models , author=. The Thirteenth International Conference on Learning Representations , year=

[3] [3]

Proceedings of the 16th International Natural Language Generation Conference , pages=

Preventing generation of verbatim memorization in language models gives a false sense of privacy , author=. Proceedings of the 16th International Natural Language Generation Conference , pages=

[4] [4]

Measuring memorization in language models via probabilistic extraction , author=. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

2025

[5] [5]

Extracting memorized pieces of (copyrighted) books from open-weight language models

Extracting memorized pieces of (copyrighted) books from open-weight language models , author=. arXiv preprint arXiv:2505.12546 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

The Eleventh International Conference on Learning Representations , year=

Quantifying memorization across neural language models , author=. The Eleventh International Conference on Learning Representations , year=

[7] [7]

2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR) , pages=

How much do code language models remember? an investigation on data extraction attacks before and after fine-tuning , author=. 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR) , pages=. 2025 , organization=

2025

[8] [8]

Proceedings of the IEEE/ACM 46th International Conference on Software Engineering , pages=

Traces of memorisation in large language models for code , author=. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering , pages=

[9] [9]

2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE) , pages=

Decoding secret memorization in code llms through token-level characterization , author=. 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE) , pages=. 2025 , organization=

2025

[10] [10]

32nd USENIX Security Symposium (USENIX Security 23) , pages=

\ CodexLeaks \ : Privacy leaks from code generation language models in \ GitHub \ copilot , author=. 32nd USENIX Security Symposium (USENIX Security 23) , pages=

[11] [11]

Proceedings of the IEEE/ACM 46th International Conference on Software Engineering , pages=

Unveiling memorization in code models , author=. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering , pages=

[12] [12]

2025 IEEE International Conference on LLM-Aided Design (ICLAD) , pages=

Verileaky: Navigating ip protection vs utility in fine-tuning for llm-driven verilog coding , author=. 2025 IEEE International Conference on LLM-Aided Design (ICLAD) , pages=. 2025 , organization=

2025

[13] [13]

The Thirteenth International Conference on Learning Representations , year=

Measuring memorization in rlhf for code completion , author=. The Thirteenth International Conference on Learning Representations , year=

[14] [14]

arXiv preprint arXiv:2503.02296 , year=

Memorize or Generalize? Evaluating LLM Code Generation with Code Rewriting , author=. arXiv preprint arXiv:2503.02296 , year=

work page arXiv

[15] [15]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Quantifying contamination in evaluating code generation capabilities of language models , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[16] [16]

Queen’s School of computing TR , volume=

A survey on software clone detection research , author=. Queen’s School of computing TR , volume=

[17] [17]

CodeBLEU: a Method for Automatic Evaluation of Code Synthesis

Codebleu: a method for automatic evaluation of code synthesis , author=. arXiv preprint arXiv:2009.10297 , year=

work page internal anchor Pith review Pith/arXiv arXiv 2009

[18] [18]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , pages=

Revisiting code similarity evaluation with abstract syntax tree edit distance , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , pages=

[19] [19]

Findings of the association for computational linguistics: EMNLP 2020 , pages=

Codebert: A pre-trained model for programming and natural languages , author=. Findings of the association for computational linguistics: EMNLP 2020 , pages=

2020

[20] [20]

IEEE Access , year=

Code clone detection techniques based on large language models , author=. IEEE Access , year=

[21] [21]

Findings of the Association for Computational Linguistics: NAACL 2025 , pages=

What can large language models capture about code functional equivalence? , author=. Findings of the Association for Computational Linguistics: NAACL 2025 , pages=

2025

[22] [22]

arXiv preprint arXiv:2509.09714 , year=

How Small Transformation Expose the Weakness of Semantic Similarity Measures , author=. arXiv preprint arXiv:2509.09714 , year=

work page arXiv

[23] [23]

arXiv preprint arXiv:2508.01357 , year=

HyClone: Bridging LLM understanding and dynamic execution for semantic code clone detection , author=. arXiv preprint arXiv:2508.01357 , year=

work page arXiv

[24] [24]

arXiv preprint arXiv:2601.02671 , year=

Extracting books from production language models , author=. arXiv preprint arXiv:2601.02671 , year=

work page arXiv

[25] [25]

2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE) , pages=

A Multiple Representation Transformer with Optimized Abstract Syntax Tree for Efficient Code Clone Detection , author=. 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE) , pages=. 2025 , organization=

2025

[26] [26]

Olmo 3

Olmo 3 , author=. arXiv preprint arXiv:2512.13961 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[27] [27]

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

SmolLM2: When Smol Goes Big--Data-Centric Training of a Small Language Model , author=. arXiv preprint arXiv:2502.02737 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[28] [28]

The Llama 3 Herd of Models

The llama 3 herd of models , author=. arXiv preprint arXiv:2407.21783 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[29] [29]

2026 , month =

Maxwell Zeff , title =. 2026 , month =

2026

[30] [30]

The New York Times Magazine , year =

Clive Thompson , title =. The New York Times Magazine , year =

[31] [31]

2026 , month =

Updates to GitHub Copilot Interaction Data Usage Policy , howpublished =. 2026 , month =

2026

[32] [32]

2025 , month =

Connie Loizos , title =. 2025 , month =

2025

[33] [33]

Code Llama: Open Foundation Models for Code

Code llama: Open foundation models for code , author=. arXiv preprint arXiv:2308.12950 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[34] [34]

Advances in Neural Information Processing Systems , volume=

Counterfactual memorization in neural language models , author=. Advances in Neural Information Processing Systems , volume=

[35] [35]

Science , volume=

Competition-level code generation with alphacode , author=. Science , volume=. 2022 , publisher=

2022

[36] [36]

Evaluating Large Language Models Trained on Code

Evaluating large language models trained on code , author=. arXiv preprint arXiv:2107.03374 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[37] [37]

Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering , pages=

An exploratory investigation into code license infringements in large language model training datasets , author=. Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering , pages=

2024

[38] [38]

GitHub, Inc

Doe v. GitHub, Inc. , author =. 2024 , month =

2024

[39] [39]

Forty-second International Conference on Machine Learning , year=

Language Models May Verbatim Complete Text They Were Not Explicitly Trained On , author=. Forty-second International Conference on Machine Learning , year=

[40] [40]

2024 , eprint=

StarCoder 2 and The Stack v2: The Next Generation , author=. 2024 , eprint=

2024

[41] [41]

Proceedings of the ACM on Software Engineering , volume=

Your code secret belongs to me: Neural code completion tools can memorize hard-coded credentials , author=. Proceedings of the ACM on Software Engineering , volume=. 2024 , publisher=

2024

[42] [42]

International Conference on Machine Learning , pages=

Deduplicating training data mitigates privacy risks in language models , author=. International Conference on Machine Learning , pages=. 2022 , organization=

2022

[43] [43]

The Twelfth International Conference on Learning Representations , year=

Detecting Pretraining Data from Large Language Models , author=. The Twelfth International Conference on Learning Representations , year=

[44] [44]

Advances in neural information processing systems , volume=

What neural networks memorize and why: Discovering the long tail via influence estimation , author=. Advances in neural information processing systems , volume=

[45] [45]

33rd USENIX Security Symposium (USENIX Security 24) , pages=

Did the neurons read your book? document-level membership inference for large language models , author=. 33rd USENIX Security Symposium (USENIX Security 24) , pages=

[46] [46]

Findings of the Association for Computational Linguistics: ACL 2023 , pages=

Membership inference attacks against language models via neighbourhood comparison , author=. Findings of the Association for Computational Linguistics: ACL 2023 , pages=

2023

[47] [47]

The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

Exploring the limits of strong membership inference attacks on large language models , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

[48] [48]

Nature Communications , year=

The mosaic memory of large language models , author=. Nature Communications , year=

[49] [49]

Proceedings of the 41st International Conference on Machine Learning , pages=

Physics of language models: part 3.1, knowledge storage and extraction , author=. Proceedings of the 41st International Conference on Machine Learning , pages=

[50] [50]

arXiv preprint arXiv:2510.18554 , year=

Extracting alignment data in open models , author=. arXiv preprint arXiv:2510.18554 , year=

work page arXiv

[51] [51]

2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) , pages=

Sok: Membership inference attacks on llms are rushing nowhere (and how to fix it) , author=. 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) , pages=. 2025 , organization=

2025

[52] [52]

First Conference on Language Modeling , year=

Do Membership Inference Attacks Work on Large Language Models? , author=. First Conference on Language Modeling , year=

[53] [53]

How much do language models memorize?arXiv preprint arXiv:2505.24832, 2025

How much do language models memorize? , author=. arXiv preprint arXiv:2505.24832 , year=

work page arXiv

[54] [54]

Advances in Neural Information Processing Systems , volume=

Rethinking llm memorization through the lens of adversarial compression , author=. Advances in Neural Information Processing Systems , volume=