Sustainable Code Generation Using Large Language Models: A Systematic Literature Review
Pith reviewed 2026-05-15 18:46 UTC · model grok-4.3
The pith
Research on sustainable LLM-generated code is limited and lacks any standard measurement framework.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The systematic literature review of primary studies shows that research on the sustainability of LLM-generated code remains relatively limited and fragmented. No widely accepted framework exists for defining sustainability, selecting metrics for energy efficiency and resource usage, or benchmarking results. The analysis examines methodological approaches, evaluation practices, experimental settings, and the effects of techniques such as fine-tuning and prompt engineering, but finds no consensus or standardized practices across the studies.
What carries the argument
Systematic literature review that selects and categorizes primary studies according to their approaches to measuring sustainability in LLM-generated code.
If this is right
- Clearer definitions of what counts as sustainable code in the LLM context are needed.
- Standardized evaluation methods and metrics must be developed to allow consistent assessment.
- Further systematic research is required to advance environmentally friendly AI-assisted software engineering.
- The potential influence of fine-tuning and prompt engineering on code sustainability remains unclear and needs targeted investigation.
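The review notes that studies disagree even on basic proxies for sustainability (execution time, memory, energy). As a minimal illustration of the kind of measurement primary studies fall back on in the absence of a shared framework, the sketch below times a candidate implementation and records its peak memory using only the Python standard library. The harness and function names are ours, and wall-clock time plus peak allocation are only rough proxies for energy use, not a standardized metric.

```python
import time
import tracemalloc

def measure(func, *args, repeats=5):
    """Return (best wall-clock seconds, peak traced memory in bytes) over several runs."""
    best_time = float("inf")
    peak_mem = 0
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        func(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        best_time = min(best_time, elapsed)   # best-of-N reduces scheduler noise
        peak_mem = max(peak_mem, peak)
    return best_time, peak_mem

# Two functionally equivalent implementations a code model might emit.
def concat_naive(n):
    s = ""
    for i in range(n):
        s += str(i)          # repeated string concatenation
    return s

def concat_join(n):
    return "".join(str(i) for i in range(n))  # single join

t_naive, m_naive = measure(concat_naive, 10_000)
t_join, m_join = measure(concat_join, 10_000)
```

Without an agreed benchmark, each study picks its own workloads and proxies like these, which is exactly the comparability gap the review identifies.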
Where Pith is reading between the lines
- Developers and organizations using LLMs for code tasks will have no reliable way to compare or improve the energy profile of their outputs until benchmarks appear.
- Widespread adoption of current LLM tools could raise the overall energy footprint of software applications if efficiency is not measured.
- Future LLM systems might incorporate direct sustainability scoring during generation as a practical response to this gap.
- This review points toward the value of linking LLM code work with established practices in green software engineering.
Load-bearing premise
The primary studies selected through the systematic search comprehensively and representatively capture the current state of research on LLM-generated sustainable code without significant gaps in coverage.
What would settle it
Discovery of a substantial body of additional studies that apply a consistent set of metrics and a shared benchmarking framework for sustainability would contradict the claim of fragmentation and missing standards.
Original abstract
Large Language Models (LLMs) are widely used in software engineering to generate, complete, translate, and fix code, improving developer productivity. While most research focuses on the energy consumption and carbon emissions of model training and inference, far less attention has been given to the sustainability of the code these models produce. The efficiency of generated code affects the long-term environmental impact of software systems. Inefficient code can increase CPU usage, memory consumption, execution time, and overall energy use during deployment and operation. As LLM-generated code becomes more common in real-world projects, even small inefficiencies can lead to high environmental costs over time. This paper examines existing research on the sustainability of code generated by LLMs. We conduct a systematic literature review to analyze selected primary studies and investigate the extent to which LLMs are capable of producing sustainable code. In addition, we examine how sustainability is defined and measured in this context, including the metrics and evaluation strategies used to assess energy efficiency and resource usage. We also explore whether techniques such as fine-tuning and prompt engineering influence the sustainability of generated code. Through a structured analysis of the selected studies, we categorize research efforts based on their methodological approaches, evaluation practices, and experimental settings. The findings indicate that research in this area remains relatively limited and fragmented, with no widely accepted framework for measuring or benchmarking the sustainability of LLM-generated code. These observations highlight the need for clearer definitions, standardized evaluation methods, and systematic research to support environmentally friendly AI-assisted software engineering.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper conducts a systematic literature review on the sustainability of code generated by large language models (LLMs). It examines the extent to which LLMs produce sustainable code, how sustainability is defined and measured (via metrics for energy efficiency, resource usage, CPU/memory consumption, and execution time), the impact of techniques such as fine-tuning and prompt engineering, and categorizes primary studies by methodological approaches, evaluation practices, and experimental settings. The central finding is that research remains limited and fragmented, with no widely accepted framework for measuring or benchmarking the sustainability of LLM-generated code.
Significance. If the synthesis holds, the review would usefully map an emerging intersection of LLM-based code generation and software sustainability, underscoring the environmental stakes of inefficient generated code in deployed systems and calling for standardized metrics and evaluation protocols in AI-assisted software engineering.
major comments (2)
- [Methodology] Methodology section: The description of the search strategy, selected databases, search strings, inclusion/exclusion criteria, number of primary studies screened and included, and any quality assessment procedure is absent or insufficiently detailed. This information is load-bearing for the claim that research is 'relatively limited and fragmented,' because an incomplete or non-representative sample could artifactually produce that assessment.
- [Results/Discussion] Results and Discussion sections: The conclusion of 'no widely accepted framework' depends on the selected studies being comprehensive. The search terms centered on 'LLM', 'sustainable code', and 'energy efficiency' risk missing relevant work using variant terminology (e.g., 'green code generation', 'carbon-aware code synthesis', or studies focused on specific resource metrics without adopting the review's sustainability framing), which directly affects the representativeness of the evidence base.
minor comments (2)
- [Abstract] Abstract: Explicitly stating the final number of included primary studies would immediately convey the scope of the review to readers.
- [Methodology] The paper would benefit from a PRISMA-style flow diagram or table summarizing the study selection process to improve transparency.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our systematic literature review. The feedback highlights important areas for improving transparency and comprehensiveness, which we will address through revisions to strengthen the manuscript.
Point-by-point responses
Referee: [Methodology] Methodology section: The description of the search strategy, selected databases, search strings, inclusion/exclusion criteria, number of primary studies screened and included, and any quality assessment procedure is absent or insufficiently detailed. This information is load-bearing for the claim that research is 'relatively limited and fragmented,' because an incomplete or non-representative sample could artifactually produce that assessment.
Authors: We agree that the Methodology section requires substantially more detail to support reproducibility and the validity of our conclusions. In the revised manuscript, we will expand this section to fully describe the search strategy, including the specific databases (ACM Digital Library, IEEE Xplore, Scopus, Web of Science, and arXiv), the complete search strings with Boolean operators, the inclusion/exclusion criteria, a PRISMA flow diagram with exact screening and inclusion counts, and any quality assessment procedures applied. This will provide a stronger basis for assessing the research as limited and fragmented. (Revision: yes)
Referee: [Results/Discussion] Results and Discussion sections: The conclusion of 'no widely accepted framework' depends on the selected studies being comprehensive. The search terms centered on 'LLM', 'sustainable code', and 'energy efficiency' risk missing relevant work using variant terminology (e.g., 'green code generation', 'carbon-aware code synthesis', or studies focused on specific resource metrics without adopting the review's sustainability framing), which directly affects the representativeness of the evidence base.
Authors: We acknowledge the potential for terminology variation to affect coverage in this emerging area. Our original search incorporated core terms along with some synonyms (such as 'green software' and 'energy-aware generation'), but we agree it may have missed certain framings. In revision, we will conduct an expanded search using the suggested variant terms, update the results to reflect any additional studies found, and add an explicit limitations discussion on terminology challenges and their implications for completeness. This will either reinforce or refine our conclusion regarding the absence of a widely accepted framework. (Revision: yes)
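The expanded search the authors commit to can be sketched as assembling Boolean strings from concept groups, ORing synonyms within a group and ANDing across groups. The term sets below are illustrative examples drawn from the referee's suggested variants, not the paper's actual search strings.

```python
# Hypothetical concept groups; the real revised strings belong to the authors.
concepts = [
    ["LLM", "large language model", "code generation"],
    ["sustainable", "green", "energy-efficient", "carbon-aware"],
]

def boolean_query(term_groups):
    """OR synonyms inside a group, AND the groups together."""
    return " AND ".join(
        "(" + " OR ".join(f'"{t}"' for t in group) + ")"
        for group in term_groups
    )

query = boolean_query(concepts)
# -> ("LLM" OR "large language model" OR "code generation")
#    AND ("sustainable" OR "green" OR "energy-efficient" OR "carbon-aware")
```

Generating strings this way keeps the variant-terminology coverage explicit and makes the search protocol easy to report in a PRISMA-style methodology section.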
Circularity Check
No circularity: SLR reports external primary-study observations
Full rationale
This is a systematic literature review whose central claim (research remains limited and fragmented with no accepted framework) is obtained by counting and categorizing the primary studies returned by the described search string across databases. No equations, fitted parameters, predictions, or self-citations appear in the provided text; the methodology is a standard SLR protocol whose output is the set of external papers themselves. The claim therefore does not reduce to any input by construction and rests on the representativeness of the retrieved corpus rather than on any internal definitional loop.
Axiom & Free-Parameter Ledger
axioms (1)
- [Domain assumption] Standard systematic literature review methodology in software engineering is sufficient to identify and synthesize relevant primary studies on LLM code sustainability.