A Metamorphic Testing Approach to Diagnosing Memorization in LLM-Based Program Repair
Pith reviewed 2026-05-09 20:58 UTC · model grok-4.3
The pith
LLM program repair tools succeed less often on semantics-preserving bug variants, revealing memorization of training data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
All evaluated LLMs exhibit substantial drops in patch generation success rates on transformed benchmarks, ranging from -4.1% for GPT-4o to -15.98% for Llama-3.1, and this degradation correlates strongly with negative log-likelihood (NLL) on the original benchmarks, indicating that models perform better on instances they are more likely to have memorized.
What carries the argument
Metamorphic testing using semantics-preserving transformations on program repair benchmarks, paired with negative log-likelihood as a proxy for memorization detection.
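To make the mechanism concrete, here is a minimal sketch of one such semantics-preserving transformation, variable renaming, written in Python as a stand-in for the paper's Java tooling; the `RenameVar` helper and the `buggy` snippet are illustrative, not the authors' implementation:

```python
import ast

class RenameVar(ast.NodeTransformer):
    """Rename one identifier everywhere it occurs; program semantics
    are unchanged, but the token sequence a model sees is different."""
    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):  # uses of the variable
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):  # function parameters
        if node.arg == self.old:
            node.arg = self.new
        return node

# A toy buggy function: '-' should be '+', and the fix is unaffected by renaming.
buggy = "def add(count, x):\n    return count - x\n"
tree = RenameVar("count", "total").visit(ast.parse(buggy))
variant = ast.unparse(tree)  # requires Python 3.9+
print(variant)
```

A model that repairs the original but not the renamed variant is suspected of relying on a memorized surface form rather than on understanding the bug.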
If this is right
- APR evaluations on standard benchmarks like Defects4J overestimate LLM performance due to potential memorization.
- Combining metamorphic variants with NLL analysis strengthens detection of data leakage.
- Metamorphic testing can be used to create leakage-resistant benchmarks for future LLM evaluations.
- State-of-the-art LLMs vary in their susceptibility, with some showing larger performance drops than others.
Where Pith is reading between the lines
- Training data for these LLMs likely includes the common bug fixes from Defects4J and similar datasets.
- The method could be extended to diagnose memorization in other code-related LLM tasks such as code completion or summarization.
- Developers of APR tools might consider training or fine-tuning on transformed or augmented bug data to improve generalization.
Load-bearing premise
The semantics-preserving transformations do not change the inherent difficulty of repairing the bugs beyond removing memorization advantages.
What would settle it
Observing no significant drop in repair success rates on the transformed benchmarks or finding no correlation between performance degradation and NLL values would falsify the claim of data leakage via memorization.
Original abstract
LLM-based automated program repair (APR) techniques have shown promising results in reducing debugging costs. However, prior results can be affected by data leakage: large language models (LLMs) may memorize bug fixes when evaluation benchmarks overlap with their pretraining data, leading to inflated performance estimates. In this paper, we investigate whether we can better reveal data leakage by combining metamorphic testing (MT) with negative log-likelihood (NLL), which has been used in prior work as a proxy for memorization. We construct variant benchmarks by applying semantics-preserving transformations to two widely used datasets, Defects4J and GitBug-Java. Using these benchmarks, we evaluate the repair success rates of seven LLMs on both original and transformed versions, and analyze the relationship between performance degradation and NLL. Our results show that all evaluated state-of-the-art LLMs exhibit substantial drops in patch generation success rates on transformed benchmarks, ranging from -4.1% for GPT-4o to -15.98% for Llama-3.1. Furthermore, we find that this degradation strongly correlates with NLL on the original benchmarks, suggesting that models perform better on instances they are more likely to have memorized. These findings show that combining MT with NLL provides stronger and more reliable evidence of data leakage, while metamorphic testing alone can help mitigate its effects in LLM-based APR evaluations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that combining metamorphic testing (MT) via semantics-preserving transformations on Defects4J and GitBug-Java with negative log-likelihood (NLL) as a memorization proxy provides stronger evidence of data leakage in LLM-based APR. Across seven LLMs, it reports consistent drops in patch generation success on transformed benchmarks (e.g., -4.1% for GPT-4o to -15.98% for Llama-3.1) that correlate with original NLL, concluding that MT mitigates leakage effects in evaluations.
Significance. If the performance degradation can be isolated to loss of memorization rather than transformation-induced difficulty, the work would supply a practical diagnostic for more trustworthy LLM-APR benchmarking. The multi-model, multi-dataset design and direct use of NLL correlation add empirical weight, though the approach remains observational rather than providing machine-checked proofs or parameter-free derivations.
major comments (2)
- Abstract and evaluation description: The attribution of success-rate drops to reduced memorization assumes that the semantics-preserving transformations (variable renaming, statement reordering, equivalent expression substitution) do not independently increase repair difficulty via altered token sequences or syntactic patterns. No control experiments—such as difficulty metrics on non-memorized models, human repair times, or number of plausible patches—are reported to rule out this confound, making the causal link to leakage load-bearing but unisolated.
- Abstract: The reported 'strong correlation' between degradation and NLL lacks any mention of the correlation coefficient, statistical significance tests, confidence intervals, or controls for other variables (e.g., bug complexity or test-suite size) that could affect repair difficulty, weakening the claim that NLL serves as a clean proxy.
minor comments (1)
- Abstract: The exact transformation rules and their implementation details are not specified, which would aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. Below we provide point-by-point responses to the major comments and describe the revisions we will make.
Point-by-point responses
-
Referee: Abstract and evaluation description: The attribution of success-rate drops to reduced memorization assumes that the semantics-preserving transformations (variable renaming, statement reordering, equivalent expression substitution) do not independently increase repair difficulty via altered token sequences or syntactic patterns. No control experiments—such as difficulty metrics on non-memorized models, human repair times, or number of plausible patches—are reported to rule out this confound, making the causal link to leakage load-bearing but unisolated.
Authors: We agree that additional controls would help isolate the effect. Our approach uses the correlation with NLL as evidence that the degradation is linked to memorization. We will revise the evaluation section to explicitly discuss this potential limitation and propose future experiments involving non-memorized models or human studies. The observed consistency across models and datasets supports our interpretation, but we acknowledge the causal link is not fully proven. revision: partial
-
Referee: Abstract: The reported 'strong correlation' between degradation and NLL lacks any mention of the correlation coefficient, statistical significance tests, confidence intervals, or controls for other variables (e.g., bug complexity or test-suite size) that could affect repair difficulty, weakening the claim that NLL serves as a clean proxy.
Authors: We will update the abstract to include the specific correlation statistics from our analysis, including the coefficient, p-value, and confidence intervals. We will also add a brief discussion on controlling for variables such as bug complexity, noting that the transformations preserve semantics and test suites, and the correlation holds after accounting for dataset variations. revision: yes
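The statistic being requested can be sketched with a few lines of code; the per-model NLL and success-rate-drop values below are invented for illustration and are not the paper's data:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-model numbers: lower NLL on the original benchmark
# (more likely memorized) goes with a larger drop on the transformed one.
nll  = [0.8, 1.1, 1.5, 2.0, 2.6]          # mean NLL on original benchmark
drop = [-16.0, -12.5, -9.0, -6.2, -4.1]   # success-rate change, pct. points
r = pearson(nll, drop)
print(round(r, 3))
```

A positive r here means models with lower original NLL lose more on the transformed benchmark, which is the leakage signature the paper argues for; a full analysis would also report a p-value and confidence interval.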
Circularity Check
No circularity: empirical observations only
full rationale
The paper is an empirical study that applies semantics-preserving transformations to existing benchmarks (Defects4J, GitBug-Java), measures repair success rates of seven LLMs on original vs. transformed versions, and reports correlations with NLL values. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. All claims rest on direct experimental measurements rather than any reduction of outputs to inputs by construction. The reader's noted assumption about transformation neutrality is a validity concern, not a circularity issue.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Semantics-preserving transformations preserve bug repair difficulty except for memorization effects.
- domain assumption: Negative log-likelihood on original benchmarks is a valid proxy for memorization.
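The second assumption can be made concrete: given per-token log-probabilities from any LM API (the values below are hypothetical), the memorization proxy is just the mean negative log-likelihood over the sequence:

```python
def mean_nll(token_logprobs):
    """Average negative log-likelihood over a token sequence.
    Lower values mean the model found the text more predictable,
    which prior work reads as a possible memorization signal."""
    return -sum(token_logprobs) / len(token_logprobs)

# Hypothetical per-token log-probs for two code snippets.
memorized = [-0.1, -0.2, -0.05, -0.15]  # model is confident
unseen    = [-2.3, -1.8, -2.9, -2.1]    # model is surprised
print(round(mean_nll(memorized), 3), round(mean_nll(unseen), 3))
```

On its own a low NLL only shows the text is predictable; the paper's contribution is pairing it with the metamorphic success-rate drop to strengthen the leakage inference.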
Forward citations
Cited by 1 Pith paper
-
Bidirectional Empowerment of Metamorphic Testing and Large Language Models: A Systematic Survey
A systematic survey of 93 studies that maps the bidirectional relationship between metamorphic testing and LLMs, proposing a taxonomy for MT applied to LLMs and LLMs applied to MT.
Reference graph
Works this paper leans on
-
[1]
On the dichotomy of debugging behavior among programmers,
M. Beller, N. Spruit, D. Spinellis, and A. Zaidman, “On the dichotomy of debugging behavior among programmers,” in ICSE, 2018, pp. 572–583
2018
-
[2]
The cost of poor software quality in the US: A 2020 report,
H. Krasner, “The cost of poor software quality in the US: A 2020 report,” Consortium for Information & Software Quality (CISQ), Tech. Rep., 2021
2021
-
[3]
A survey on automated program repair techniques,
K. Huang, Z. Xu, S. Yang, H. Sun, X. Li, Z. Yan, and Y. Zhang, “A survey on automated program repair techniques,” 2023
2023
-
[4]
A systematic literature review on large language models for automated program repair,
Q. Zhang, C. Fang, Y. Xie, Y. Ma, W. Sun, Y. Yang, and Z. Chen, “A systematic literature review on large language models for automated program repair,” 2024
2024
-
[5]
How far can we go with practical function-level program repair?
J. Xiang, X. Xu, F. Kong, M. Wu, Z. Zhang, H. Zhang, and Y. Zhang, “How far can we go with practical function-level program repair?” 2024
2024
-
[6]
Automated program repair via conversation: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT,
C. S. Xia and L. Zhang, “Automated program repair via conversation: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT,” in ISSTA, 2024, pp. 819–831
2024
-
[7]
Hybrid automated program repair by combining large language models and program analysis,
F. Li, J. Jiang, J. Sun, and H. Zhang, “Hybrid automated program repair by combining large language models and program analysis,” 2024
2024
-
[8]
Defects4J: A database of existing faults to enable controlled testing studies for Java programs,
R. Just, D. Jalali, and M. D. Ernst, “Defects4J: A database of existing faults to enable controlled testing studies for Java programs,” in ISSTA, 2014, pp. 437–440
2014
-
[9]
Don’t make your LLM an evaluation benchmark cheater,
K. Zhou, Y. Zhu, Z. Chen, W. Chen, W. X. Zhao, X. Chen, Y. Lin, J.-R. Wen, and J. Han, “Don’t make your LLM an evaluation benchmark cheater,” 2023
2023
-
[10]
Breaking the silence: The threats of using LLMs in software engineering,
J. Sallou, T. Durieux, and A. Panichella, “Breaking the silence: The threats of using LLMs in software engineering,” in ICSE NIER, 2024, pp. 102–106
2024
-
[11]
Are large language models memorizing bug benchmarks?
D. Ramos, C. Mamede, K. Jain, P. Canelas, C. Gamboa, and C. L. Goues, “Are large language models memorizing bug benchmarks?” 2024
2024
-
[12]
ConDefects: A complementary dataset to address data leakage for LLM-based fault localization and program repair,
Y. Wu, Z. Li, J. M. Zhang, and Y. Liu, “ConDefects: A complementary dataset to address data leakage for LLM-based fault localization and program repair,” in FSE Companion, 2024, pp. 642–646
2024
-
[13]
A critical review of large language models on software engineering: An example from ChatGPT and automated program repair,
Q. Zhang, T. Zhang, J. Zhai, C. Fang, B. Yu, W. Sun, and Z. Chen, “A critical review of large language models on software engineering: An example from ChatGPT and automated program repair,” 2024
2024
-
[14]
LessLeak-Bench: A first investigation of data leakage in LLMs across 83 software engineering benchmarks,
X. Zhou, M. Weyssow, R. Widyasari, T. Zhang, J. He, Y. Lyu, J. Chang, B. Zhang, D. Huang, and D. Lo, “LessLeak-Bench: A first investigation of data leakage in LLMs across 83 software engineering benchmarks,” 2025
2025
-
[15]
Assessing robustness of ML-based program analysis tools using metamorphic program transformations,
L. Applis, A. Panichella, and A. van Deursen, “Assessing robustness of ML-based program analysis tools using metamorphic program transformations,” in ASE, 2021, pp. 1377–1381
2021
-
[16]
A survey on metamorphic testing,
S. Segura, G. Fraser, A. B. Sanchez, and A. Ruiz-Cortés, “A survey on metamorphic testing,” IEEE Transactions on Software Engineering, vol. 42, no. 9, pp. 805–824, 2016
2016
-
[17]
GitBug-Java: A reproducible benchmark of recent Java bugs,
A. Silva, N. Saavedra, and M. Monperrus, “GitBug-Java: A reproducible benchmark of recent Java bugs,” in MSR, 2024, pp. 118–122
2024
-
[18]
Automatic identifier inconsistency detection using code dictionary,
S. Kim and D. Kim, “Automatic identifier inconsistency detection using code dictionary,” Empirical Software Engineering, vol. 21, no. 2, pp. 565–604, 2016
2016
-
[19]
Metamorphic-based many-objective distillation of LLMs for code-related tasks,
A. Panichella, “Metamorphic-based many-objective distillation of LLMs for code-related tasks,” in ICSE, 2025
2025
-
[20]
Metamorphic testing of deep code models: A systematic literature review,
A. Asgari, M. de Koning, P. Derakhshanfar, and A. Panichella, “Metamorphic testing of deep code models: A systematic literature review,” ACM Transactions on Software Engineering and Methodology
-
[21]
The Llama 3 herd of models,
A. Grattafiori, A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Vaughan et al., “The Llama 3 herd of models,” arXiv preprint arXiv:2407.21783, 2024
2024
-
[22]
Gemma 2: Improving Open Language Models at a Practical Size,
G. Team, M. Riviere, S. Pathak, P. G. Sessa, C. Hardin, S. Bhupatiraju, L. Hussenot, T. Mesnard, B. Shahriari, A. Ramé et al., “Gemma 2: Improving open language models at a practical size,” arXiv preprint arXiv:2408.00118, 2024
2024
-
[23]
Mistral 7B,
A. Q. Jiang, A. Sablayrolles, A. Mensch, and Others, “Mistral 7B,” 2023
2023
-
[24]
StarCoder 2 and The Stack v2: The Next Generation,
A. Lozhkov, R. Li, L. B. Allal, F. Cassano, J. Lamy-Poirier, N. Tazi, A. Tang, D. Pykhtar, J. Liu, Y. Wei et al., “StarCoder 2 and The Stack v2: The next generation,” arXiv preprint arXiv:2402.19173, 2024
2024
-
[25]
Empirical review of Java program repair tools: A large-scale experiment,
T. Durieux, F. Madeiral, M. Martinez, and R. Abreu, “Empirical review of Java program repair tools: A large-scale experiment,” in ESEC/FSE, 2019, pp. 302–313
2019
-
[26]
A survey on software fault localization,
W. E. Wong, R. Gao, Y. Li, R. Abreu, and F. Wotawa, “A survey on software fault localization,” IEEE Transactions on Software Engineering, vol. 42, no. 8, pp. 707–740, 2016
2016
-
[27]
GenProg: A generic method for automatic software repair,
C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, “GenProg: A generic method for automatic software repair,” IEEE Transactions on Software Engineering, vol. 38, no. 1, pp. 54–72, 2012
2012
-
[28]
Automatic error elimination by horizontal code transfer across multiple applications,
S. Sidiroglou-Douskos, E. Lahtinen, F. Long, and M. Rinard, “Automatic error elimination by horizontal code transfer across multiple applications,” in PLDI, 2015, pp. 43–54
2015
-
[29]
Automated fixing of programs with contracts,
Y. Wei, Y. Pei, C. A. Furia, L. S. Silva, S. Buchholz, B. Meyer, and A. Zeller, “Automated fixing of programs with contracts,” in ISSTA, 2010, pp. 61–72
2010
-
[30]
Contract-based program repair without the contracts: An extended study,
L. Chen, Y. Pei, and C. A. Furia, “Contract-based program repair without the contracts: An extended study,” IEEE Transactions on Software Engineering, vol. 47, no. 12, pp. 2841–2857, 2020
2020
-
[31]
Automatic patch generation learned from human-written patches,
D. Kim, J. Nam, J. Song, and S. Kim, “Automatic patch generation learned from human-written patches,” in ICSE, 2013, pp. 802–811
2013
-
[32]
History driven program repair,
X. B. D. Le, D. Lo, and C. Le Goues, “History driven program repair,” in SANER, 2016, pp. 213–224
2016
-
[33]
Relifix: Automated repair of software regressions,
S. H. Tan and A. Roychoudhury, “Relifix: Automated repair of software regressions,” in ICSE, 2015, pp. 471–482
2015
-
[34]
An empirical study on learning bug-fixing patches in the wild via neural machine translation,
M. Tufano, C. Watson, G. Bavota, M. D. Penta, M. White, and D. Poshyvanyk, “An empirical study on learning bug-fixing patches in the wild via neural machine translation,” ACM Transactions on Software Engineering and Methodology, vol. 28, no. 4, 2019
2019
-
[35]
SequenceR: Sequence-to-sequence learning for end-to-end program repair,
Z. Chen, S. Kommrusch, M. Tufano, L.-N. Pouchet, D. Poshyvanyk, and M. Monperrus, “SequenceR: Sequence-to-sequence learning for end-to-end program repair,” IEEE Transactions on Software Engineering, vol. 47, no. 9, pp. 1943–1959, 2019
2019
-
[36]
CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation,
Y. Wang, W. Wang, S. Joty, and S. C. H. Hoi, “CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation,” 2021
2021
-
[37]
CodeBERT: A Pre-Trained Model for Programming and Natural Languages,
Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu et al., “CodeBERT: A pre-trained model for programming and natural languages,” arXiv preprint arXiv:2002.08155, 2020
2020
-
[38]
Evaluating Large Language Models Trained on Code,
M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. D. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman et al., “Evaluating large language models trained on code,” arXiv preprint arXiv:2107.03374, 2021
2021
-
[39]
A Survey on Large Language Models for Code Generation,
J. Jiang, F. Wang, J. Shen, S. Kim, and S. Kim, “A survey on large language models for code generation,” arXiv preprint arXiv:2406.00515, 2024
2024
-
[40]
Source code summarization in the era of large language models,
W. Sun, Y. Miao, Y. Li, H. Zhang, C. Fang, Y. Liu, G. Deng, Y. Liu, and Z. Chen, “Source code summarization in the era of large language models,” 2024
2024
-
[41]
Large language model for vulnerability detection and repair: Literature review and the road ahead,
X. Zhou, S. Cao, X. Sun, and D. Lo, “Large language model for vulnerability detection and repair: Literature review and the road ahead,” ACM Transactions on Software Engineering and Methodology, 2024
2024
-
[42]
A comprehensive overview of large language models,
H. Naveed, A. U. Khan, S. Qiu, M. Saqib, S. Anwar, M. Usman, N. Akhtar, N. Barnes, and A. Mian, “A comprehensive overview of large language models,” 2024
2024
-
[43]
Searching for quality: Genetic algorithms and metamorphic testing for software engineering ML,
L. Applis, A. Panichella, and R. Marang, “Searching for quality: Genetic algorithms and metamorphic testing for software engineering ML,” in GECCO, 2023, pp. 1490–1498
2023
-
[44]
Evolutionary multi-objective optimization for contextual adversarial example generation,
S. Zhou, M. Huang, Y. Sun, and K. Li, “Evolutionary multi-objective optimization for contextual adversarial example generation,” Proceedings of the ACM on Software Engineering, vol. 1, 2024
2024
-
[45]
Discrete adversarial attack to models of code,
F. Gao, Y. Wang, and K. Wang, “Discrete adversarial attack to models of code,” Proceedings of the ACM on Programming Languages, vol. 7, 2023
2023
-
[46]
Adversarial robustness of deep code comment generation,
Y. Zhou, X. Zhang, J. Shen, T. Han, T. Chen, and H. Gall, “Adversarial robustness of deep code comment generation,” ACM Transactions on Software Engineering and Methodology, vol. 31, no. 4, 2022
2022
-
[47]
CLAWSAT: Towards both robust and accurate code models,
J. Jia, S. Srikant, T. Mitrovska, C. Gan, S. Chang, S. Liu, and U.-M. O’Reilly, “CLAWSAT: Towards both robust and accurate code models,” in SANER, 2023, pp. 212–223
2023
-
[48]
On inter-dataset code duplication and data leakage in large language models,
J. A. H. López, B. Chen, M. Saad, T. Sharma, and D. Varró, “On inter-dataset code duplication and data leakage in large language models,” IEEE Transactions on Software Engineering, vol. 51, no. 1, pp. 192–205, 2025
2025
-
[49]
Traces of memorisation in large language models for code,
A. Al-Kaswan, M. Izadi, and A. Van Deursen, “Traces of memorisation in large language models for code,” in ICSE, 2024, pp. 1–12
2024
-
[50]
Benchmarking benchmark leakage in large language models,
R. Xu, Z. Wang, R.-Z. Fan, and P. Liu, “Benchmarking benchmark leakage in large language models,” 2024
2024
-
[51]
Estimating contamination via perplexity: Quantifying memorisation in language model evaluation,
Y. Li, “Estimating contamination via perplexity: Quantifying memorisation in language model evaluation,” 2023
2023
-
[52]
A survey of learning-based automated program repair,
Q. Zhang, C. Fang, Y. Ma, W. Sun, and Z. Chen, “A survey of learning-based automated program repair,” ACM Transactions on Software Engineering and Methodology, vol. 33, no. 2, 2023
2023
-
[53]
Addressing data leakage in HumanEval using combinatorial test design,
J. S. Bradbury and R. More, “Addressing data leakage in HumanEval using combinatorial test design,” in ICST, 2025, pp. 587–591
2025
-
[54]
Natural attack for pre-trained models of code,
Z. Yang, J. Shi, J. He, and D. Lo, “Natural attack for pre-trained models of code,” in ICSE, 2022, pp. 1482–1493
2022
-
[55]
Evaluating program repair with semantic-preserving transformations: A naturalness assessment,
T. Le-Cong, D. Nguyen, B. Le, and T. Murray, “Evaluating program repair with semantic-preserving transformations: A naturalness assessment,” CoRR, 2024
2024
-
[56]
RobustNPR: Evaluating the robustness of neural program repair models,
H. Ge, W. Zhong, C. Li, J. Ge, H. Hu, and B. Luo, “RobustNPR: Evaluating the robustness of neural program repair models,” Journal of Software: Evolution and Process, vol. 36, no. 4, p. e2586, 2024
2024
-
[57]
Evaluating the generalizability of LLMs in automated program repair,
F. Li, J. Jiang, J. Sun, and H. Zhang, “Evaluating the generalizability of LLMs in automated program repair,” 2025
2025
-
[58]
Exploring and lifting the robustness of LLM-powered automated program repair with metamorphic testing,
P. Xue, L. Wu, Z. Yang, X. Li, Z. Yu, Z. Jin, G. Li, Y. Xiao, and J. Wu, “Exploring and lifting the robustness of LLM-powered automated program repair with metamorphic testing,” 2024
2024
-
[59]
Efficient memory management for large language model serving with PagedAttention,
W. Kwon, Z. Li, S. Zhuang, Y. Sheng, L. Zheng, C. H. Yu, J. E. Gonzalez, H. Zhang, and I. Stoica, “Efficient memory management for large language model serving with PagedAttention,” in SOSP, 2023
2023
-
[60]
CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis,
E. Nijkamp, B. Pang, H. Hayashi, L. Tu, H. Wang, Y. Zhou, S. Savarese, and C. Xiong, “CodeGen: An open large language model for code with multi-turn program synthesis,” arXiv preprint arXiv:2203.13474, 2022
2022
-
[61]
Chain-of-thought prompting elicits reasoning in large language models,
J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, and D. Zhou, “Chain-of-thought prompting elicits reasoning in large language models,” in NeurIPS, 2022, pp. 24824–24837
2022
-
[62]
A hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering,
A. Arcuri and L. Briand, “A hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering,” Software Testing, Verification and Reliability, vol. 24, no. 3, pp. 219–250, 2014
2014
-
[63]
A critique and improvement of the CL common language effect size statistics of McGraw and Wong,
A. Vargha and H. D. Delaney, “A critique and improvement of the CL common language effect size statistics of McGraw and Wong,” Journal of Educational and Behavioral Statistics, vol. 25, no. 2, pp. 101–132, 2000
2000
-
[64]
A survey of LLM-based automated program repair: Taxonomies, design paradigms, and applications,
B. Yang, Z. Cai, F. Liu, B. Le, L. Zhang, T. F. Bissyandé, Y. Liu, and H. Tian, “A survey of LLM-based automated program repair: Taxonomies, design paradigms, and applications,” 2025
2025
-
[65]
Neural transfer learning for repairing security vulnerabilities in C code,
Z. Chen, S. Kommrusch, and M. Monperrus, “Neural transfer learning for repairing security vulnerabilities in C code,” IEEE Transactions on Software Engineering, vol. 49, no. 1, pp. 147–165, 2022
2022