
arxiv: 2605.23091 · v1 · pith:7OJ2TQLRnew · submitted 2026-05-21 · 💻 cs.SE · cs.AI · cs.CR

Security of LLM-generated Code: A Comparative Analysis

Pith reviewed 2026-05-25 05:14 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.CR
keywords LLM-generated code · code security · vulnerability analysis · AI tools for development · empirical study · software vulnerabilities

The pith

Seven popular LLMs all generate code with vulnerabilities when prompted to mimic developer behavior, with most issues being critical or high severity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests seven large language models by giving them prompts designed to copy how developers request code in practice. It reports that every model returns code containing security vulnerabilities. Most of these vulnerabilities are classified as critical or high severity. This finding is relevant because many developers already use or plan to use these tools for real projects, potentially introducing security risks into software.

Core claim

When LLMs are prompted in ways that reflect typical developer usage for code generation, all seven models evaluated produce code that includes vulnerabilities, the majority of which have critical or high severity ratings.

What carries the argument

Developer-mimicking prompts applied to seven LLMs to generate code samples, followed by vulnerability analysis to identify security flaws in the outputs.
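
As a rough illustration of the analysis step, the sketch below tallies scanner findings per model and computes the share rated critical or high. The `Finding` record, the model names, and the sample data are hypothetical placeholders, not the paper's data or tooling.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One vulnerability reported for a generated snippet (hypothetical schema)."""
    model: str      # which LLM produced the snippet
    cwe: str        # weakness identifier, e.g. "CWE-79"
    severity: str   # "critical", "high", "medium", or "low"


def severe_share_by_model(findings):
    """Return {model: fraction of its findings rated critical or high}."""
    counts = defaultdict(Counter)
    for f in findings:
        counts[f.model][f.severity] += 1
    shares = {}
    for model, by_severity in counts.items():
        total = sum(by_severity.values())
        severe = by_severity["critical"] + by_severity["high"]
        shares[model] = severe / total
    return shares


# Made-up sample data for illustration only.
sample = [
    Finding("model-a", "CWE-89", "critical"),
    Finding("model-a", "CWE-79", "medium"),
    Finding("model-b", "CWE-22", "high"),
]
shares = severe_share_by_model(sample)
```

A model with no findings at all does not appear in the result, so a real analysis would also need the list of models evaluated.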

If this is right

  • Code from LLMs requires manual security review before use in production.
  • The security problems appear consistent across different popular models.
  • AI coding tools may need additional safeguards to reduce vulnerability introduction.
  • Existing production use of LLM code could already contain undetected issues.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The results imply that current LLMs lack sufficient built-in security awareness during code generation.
  • Extending the evaluation to more models or different prompting styles could reveal whether the pattern is universal.
  • Developers might benefit from tools that automatically scan and fix LLM-generated code for security issues.

Load-bearing premise

The prompts used successfully replicate the real-world prompting patterns of developers when generating code with LLMs.

What would settle it

A test where at least one of the seven LLMs generates code free of critical and high-severity vulnerabilities under the same prompting conditions.
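
This condition can be phrased as a simple check: given per-model severity labels, list any model whose output has no critical or high findings. The input layout and model names below are assumptions made for illustration, not results from the paper.

```python
SEVERE = {"critical", "high"}


def models_without_severe_findings(severities_by_model):
    """Return models whose findings include no critical or high severity label.

    `severities_by_model` maps a model name to the list of severity labels
    found in its generated code (hypothetical layout).
    """
    return sorted(
        model
        for model, labels in severities_by_model.items()
        if not SEVERE.intersection(labels)
    )


# Made-up results; a non-empty return value would count against the claim.
example = {
    "model-a": ["critical", "low"],
    "model-b": ["medium", "low"],
    "model-c": [],
}
clean = models_without_severe_findings(example)
```

The check is only as strong as the scanner behind the labels: an empty list for a model may mean the tool missed a flaw rather than that the code is safe.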

Figures

Figures reproduced from arXiv: 2605.23091 by Hala Assal, Mahmoud Selim, Srivathsan G Morkonda.

Figure 1. Overview of CWEs generated from each prompt category, and by each LLM. On the le…
Figure 2. The severity level of vulnerabilities found in code snippets generated by di…
Figure 3. The box plots represent the distribution of the number of lines of code generated by each tool (the y-axis on the le…
Figure 4. The rate of vulnerabilities in each tool.
original abstract

The majority of software developers use or are planning to use Artificial Intelligence (AI) tools in their development processes. Their top reasons include improving productivity and faster learning. In fact, Large Language Model (LLM)-generated code is currently in production, including in major tech companies. However, concerns were raised about the risks associated with the use of AI tools to generate code. In this paper, we focus our attention on the risks to software security. We empirically evaluate the security of code generated by seven popular LLMs. We build upon previous work to mimic the behaviours of developers when using LLMs to generate code. Our results show that all seven LLMs that we have evaluated generate code that contains vulnerabilities, the majority of which are of critical or high severity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper empirically evaluates the security of code generated by seven popular LLMs. The authors construct prompts that build upon prior work to mimic developer behaviors when querying LLMs, and report that all seven models produce code containing vulnerabilities, the majority of which are of critical or high severity.

Significance. If the empirical findings are robust, the work is significant for software engineering and security research because LLM-generated code is already deployed in production environments at major companies. A comparative analysis across multiple models that quantifies vulnerability rates and severity distributions could inform guidelines for safe adoption of these tools and highlight the need for improved prompting or post-generation checks.

major comments (2)
  1. [Abstract] Abstract: The central claim that results reflect real-world risk rests on the unvalidated assumption that the chosen prompts 'mimic the behaviours of developers.' No evidence is supplied that these prompts were checked against logged developer sessions, public query corpora, or surveys; if the prompts systematically omit context or guardrails relative to typical usage, the observed vulnerability rates could be inflated.
  2. [Abstract] Abstract (and presumably §3 or §4): The abstract supplies no information on the identity of the seven LLMs, the number of code samples generated per model, the specific vulnerability scanner employed, or the method used to assign severity levels. These omissions render the headline empirical result non-reproducible and unverifiable from the provided description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful comments, which highlight opportunities to strengthen the clarity and reproducibility of our empirical study. We address each major comment below and indicate where revisions will be made.

point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that results reflect real-world risk rests on the unvalidated assumption that the chosen prompts 'mimic the behaviours of developers.' No evidence is supplied that these prompts were checked against logged developer sessions, public query corpora, or surveys; if the prompts systematically omit context or guardrails relative to typical usage, the observed vulnerability rates could be inflated.

    Authors: We acknowledge the referee's concern. The prompts were adapted from prior published work that sought to emulate typical developer-LLM interactions for code generation tasks. However, this study did not include independent validation against real developer query logs, surveys, or public corpora. We agree this represents a limitation that could affect the generalizability of the risk estimates. In the revised manuscript we will expand the methodology section to describe the prompt construction process in greater detail, explicitly note the reliance on prior work, and add a limitations paragraph discussing the possibility that vulnerability rates may be inflated relative to production usage patterns that include more context or guardrails. revision: partial

  2. Referee: [Abstract] Abstract (and presumably §3 or §4): The abstract supplies no information on the identity of the seven LLMs, the number of code samples generated per model, the specific vulnerability scanner employed, or the method used to assign severity levels. These omissions render the headline empirical result non-reproducible and unverifiable from the provided description.

    Authors: We agree that the abstract should contain sufficient detail to make the core empirical claims reproducible at a high level. The full paper (Sections 3 and 4) specifies the seven LLMs, sample counts, scanner, and severity assignment procedure, but these were not summarized in the abstract. In the revision we will expand the abstract to name the models, report the number of samples per model, identify the vulnerability scanner, and briefly describe the severity classification method. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation

full rationale

The paper is an empirical comparative study that prompts seven LLMs to generate code using behaviors mimicked from prior work, then analyzes the outputs for vulnerabilities. No equations, derivations, fitted parameters, or predictions are present. The central claim rests on experimental results rather than any self-referential reduction, self-citation chain, or ansatz. The prompting method is an experimental design choice whose validity is external to the derivation (none exists), so no pattern from the enumerated list applies. This is the normal case of a self-contained empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The claim rests on the unstated assumption that automated vulnerability scanners correctly classify severity and that the chosen prompting simulation is representative; no free parameters or invented entities are visible in the abstract.

axioms (1)
  • domain assumption: Automated vulnerability detection tools produce reliable severity classifications for LLM-generated code.
    The abstract uses these classifications to assert that the majority of vulnerabilities are critical or high severity.
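
One common way to turn a numeric score into a severity label is the CVSS v3.1 qualitative severity rating scale. The sketch below assumes a scanner reports a CVSS-style base score between 0.0 and 10.0. Whether the paper's tooling assigns severity this way is not stated in the abstract, and the function name is illustrative.

```python
def cvss_v31_rating(score: float) -> str:
    """Map a CVSS v3.1 base score to its qualitative severity rating.

    Bands: None 0.0, Low 0.1-3.9, Medium 4.0-6.9, High 7.0-8.9,
    Critical 9.0-10.0. Assumes the score is already rounded to one decimal.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


ratings = [cvss_v31_rating(s) for s in (0.0, 3.9, 6.9, 7.0, 9.8)]
```

A fixed mapping like this makes the "critical or high" count depend entirely on the scores fed into it, which is why the reliability of the scoring is listed as an axiom.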


