Security of LLM-generated Code: A Comparative Analysis

Hala Assal; Mahmoud Selim; Srivathsan G Morkonda

arXiv: 2605.23091 · v1 · pith:7OJ2TQLRnew · submitted 2026-05-21 · 💻 cs.SE · cs.AI · cs.CR

Pith reviewed 2026-05-25 05:14 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.CR

keywords LLM-generated code · code security · vulnerability analysis · AI tools for development · empirical study · software vulnerabilities

The pith

Seven popular LLMs all generate code with vulnerabilities when prompted to mimic developer behavior, with most issues being critical or high severity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests seven large language models by giving them prompts designed to copy how developers request code in practice. It reports that every model returns code containing security vulnerabilities. Most of these vulnerabilities are classified as critical or high severity. This finding is relevant because many developers already use or plan to use these tools for real projects, potentially introducing security risks into software.

Core claim

When LLMs are prompted in ways that reflect typical developer usage for code generation, all seven models evaluated produce code that includes vulnerabilities, the majority of which have critical or high severity ratings.

What carries the argument

Developer-mimicking prompts applied to seven LLMs to generate code samples, followed by vulnerability analysis to identify security flaws in the outputs.
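
The two-stage design described above (generate samples per model, then scan them) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Finding` record, the `evaluate` and `severity_counts` helpers, and the toy `fake_generate` and `fake_scan` stand-ins are all hypothetical names introduced here, and this page does not say how the paper invoked the models or which scanner it ran.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    """One scanner alert on one generated sample (hypothetical record shape)."""
    model: str
    prompt_id: str
    severity: str  # e.g. "critical", "high", "medium", "low"


def evaluate(
    models: Iterable[str],
    prompts: dict[str, str],
    generate: Callable[[str, str], str],
    scan: Callable[[str], list[str]],
) -> list[Finding]:
    """Generate one sample per (model, prompt) pair and scan it.

    `generate(model, prompt_text)` returns source code and `scan(code)`
    returns one severity label per alert. Both are placeholders for
    whatever model client and scanner a real study would use.
    """
    findings = []
    for model in models:
        for prompt_id, text in prompts.items():
            code = generate(model, text)
            for severity in scan(code):
                findings.append(Finding(model, prompt_id, severity))
    return findings


def severity_counts(findings: list[Finding]) -> dict[str, Counter]:
    """Tally alerts per model, broken down by severity."""
    counts: dict[str, Counter] = {}
    for f in findings:
        counts.setdefault(f.model, Counter())[f.severity] += 1
    return counts


# Toy stand-ins so the sketch runs end to end; neither reflects a real tool.
def fake_generate(model: str, prompt_text: str) -> str:
    if "calculator" in prompt_text:
        return f"# {model}\nresult = eval(input())"
    return "print('hi')"


def fake_scan(code: str) -> list[str]:
    return ["critical"] if "eval(" in code else []


demo = evaluate(
    models=["model-a", "model-b"],
    prompts={"p1": "write a calculator", "p2": "print a greeting"},
    generate=fake_generate,
    scan=fake_scan,
)
demo_counts = severity_counts(demo)
```

Passing the generator and scanner in as callables keeps the loop independent of any particular model API or analysis tool, which is also what makes a cross-model comparison straightforward to repeat.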

If this is right

Code from LLMs requires manual security review before use in production.
The security problems appear consistent across different popular models.
AI coding tools may need additional safeguards to reduce vulnerability introduction.
Existing production use of LLM code could already contain undetected issues.
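
As an illustration of the kind of flaw a manual review is meant to catch, the sketch below contrasts a SQL query built by string interpolation with a parameterized one, using Python's standard `sqlite3` module. The example is hypothetical and is not drawn from the paper's samples; the table, the helper names (`make_db`, `lookup_unsafe`, `lookup_safe`), and the payload are invented here.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn: sqlite3.Connection, name: str) -> list[str]:
    # Vulnerable: user input is spliced directly into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn: sqlite3.Connection, name: str) -> list[str]:
    # Safer: the value is bound as a parameter and never parsed as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()
payload = "x' OR '1'='1"
leaked = lookup_unsafe(conn, payload)   # the injected OR clause matches every row
blocked = lookup_safe(conn, payload)    # no user is literally named like the payload
```

Both functions behave identically on ordinary input, which is why this class of issue is easy to miss without a deliberate security review or an automated scan.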

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The results imply that current LLMs lack sufficient built-in security awareness during code generation.
Extending the evaluation to more models or different prompting styles could reveal if this is universal.
Developers might benefit from tools that automatically scan and fix LLM-generated code for security issues.

Load-bearing premise

The prompts used successfully replicate the real-world prompting patterns of developers when generating code with LLMs.

What would settle it

A test where at least one of the seven LLMs generates code free of critical and high-severity vulnerabilities under the same prompting conditions.
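
The settling condition can be written as a small check over per-model findings. This is a sketch under assumptions: the input shape (a mapping from model name to a list of severity labels) and the function names are hypothetical, and the data in the usage lines is invented.

```python
SEVERE = {"critical", "high"}


def models_without_severe_findings(findings_by_model: dict[str, list[str]]) -> list[str]:
    """Return models whose findings include no critical or high severity label."""
    return sorted(
        model
        for model, severities in findings_by_model.items()
        if not SEVERE.intersection(s.lower() for s in severities)
    )


def claim_is_contradicted(findings_by_model: dict[str, list[str]]) -> bool:
    """True if at least one model produced no critical or high findings."""
    return bool(models_without_severe_findings(findings_by_model))


# Invented data: model-b has only a low-severity alert, model-c has none.
hypothetical = {"model-a": ["High", "low"], "model-b": ["low"], "model-c": []}
clean_models = models_without_severe_findings(hypothetical)
contradicted = claim_is_contradicted(hypothetical)
```

A model with an empty findings list counts as clean here, so the check is only meaningful if every model was run on the same prompts and scanned with the same tool.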

Figures

Figures reproduced from arXiv: 2605.23091 by Hala Assal, Mahmoud Selim, Srivathsan G Morkonda.

Figure 2: The severity level of vulnerabilities found in code snippets generated by di...

Figure 3: The box plots represent the distribution of the number of lines of code generated by each tool (the y-axis on the le...

Figure 4: The rate of vulnerabilities in each tool

Original abstract

The majority of software developers use or are planning to use Artificial Intelligence (AI) tools in their development processes. Their top reasons include improving productivity and faster learning. In fact, Large Language Model (LLM)-generated code is currently in production, including in major tech companies. However, concerns were raised about the risks associated with the use of AI tools to generate code. In this paper, we focus our attention on the risks to software security. We empirically evaluate the security of code generated by seven popular LLMs. We build upon previous work to mimic the behaviours of developers when using LLMs to generate code. Our results show that all seven LLMs that we have evaluated generate code that contains vulnerabilities, the majority of which are of critical or high severity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper reports that all seven tested LLMs produce vulnerable code under developer-mimic prompts, but those prompts are not validated against real usage data.

The core finding is straightforward: when the authors prompt seven LLMs in ways meant to copy how developers actually work, every model outputs code containing vulnerabilities and most of those are rated critical or high severity. That result lines up with earlier single-model studies but adds a side-by-side comparison and a more structured prompting protocol. The work is useful for anyone who needs a quick snapshot of current LLM output quality on security-sensitive tasks. The methods section appears to follow prior papers on prompt construction, which is a reasonable starting point.

The main weakness is the untested assumption that the chosen prompts reflect typical developer behavior. The abstract mentions building on previous work to mimic developers, yet supplies no comparison to logged queries, survey responses, or public corpora that would show the prompts are representative rather than unusually security-sensitive. Without that check, the high vulnerability rate could be inflated relative to everyday use. Sample sizes, exact model versions, the scanner employed, and how severity was assigned are also not visible in the summary, so the numbers cannot be reproduced from the abstract alone.

The paper is aimed at practitioners and tool builders who want empirical numbers on LLM code security rather than theoretical analysis. It is coherent on its own terms and engages the existing literature without obvious internal contradictions. A serious editor should send it for peer review so referees can inspect the prompt validation and raw data.

Referee Report

2 major / 0 minor

Summary. The paper empirically evaluates the security of code generated by seven popular LLMs. The authors construct prompts that build upon prior work to mimic developer behaviors when querying LLMs, and report that all seven models produce code containing vulnerabilities, the majority of which are of critical or high severity.

Significance. If the empirical findings are robust, the work is significant for software engineering and security research because LLM-generated code is already deployed in production environments at major companies. A comparative analysis across multiple models that quantifies vulnerability rates and severity distributions could inform guidelines for safe adoption of these tools and highlight the need for improved prompting or post-generation checks.

major comments (2)

[Abstract] The central claim that results reflect real-world risk rests on the unvalidated assumption that the chosen prompts 'mimic the behaviours of developers.' No evidence is supplied that these prompts were checked against logged developer sessions, public query corpora, or surveys; if the prompts systematically omit context or guardrails relative to typical usage, the observed vulnerability rates could be inflated.
[Abstract, and presumably §3 or §4] The abstract supplies no information on the identity of the seven LLMs, the number of code samples generated per model, the specific vulnerability scanner employed, or the method used to assign severity levels. These omissions render the headline empirical result non-reproducible and unverifiable from the provided description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful comments, which highlight opportunities to strengthen the clarity and reproducibility of our empirical study. We address each major comment below and indicate where revisions will be made.

Referee: [Abstract] The central claim that results reflect real-world risk rests on the unvalidated assumption that the chosen prompts 'mimic the behaviours of developers.' No evidence is supplied that these prompts were checked against logged developer sessions, public query corpora, or surveys; if the prompts systematically omit context or guardrails relative to typical usage, the observed vulnerability rates could be inflated.

Authors: We acknowledge the referee's concern. The prompts were adapted from prior published work that sought to emulate typical developer-LLM interactions for code generation tasks. However, this study did not include independent validation against real developer query logs, surveys, or public corpora. We agree this represents a limitation that could affect the generalizability of the risk estimates. In the revised manuscript we will expand the methodology section to describe the prompt construction process in greater detail, explicitly note the reliance on prior work, and add a limitations paragraph discussing the possibility that vulnerability rates may be inflated relative to production usage patterns that include more context or guardrails.

Revision: partial

Referee: [Abstract, and presumably §3 or §4] The abstract supplies no information on the identity of the seven LLMs, the number of code samples generated per model, the specific vulnerability scanner employed, or the method used to assign severity levels. These omissions render the headline empirical result non-reproducible and unverifiable from the provided description.

Authors: We agree that the abstract should contain sufficient detail to make the core empirical claims reproducible at a high level. The full paper (Sections 3 and 4) specifies the seven LLMs, sample counts, scanner, and severity assignment procedure, but these were not summarized in the abstract. In the revision we will expand the abstract to name the models, report the number of samples per model, identify the vulnerability scanner, and briefly describe the severity classification method.

Revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation

The paper is an empirical comparative study that prompts seven LLMs to generate code using behaviors mimicked from prior work, then analyzes the outputs for vulnerabilities. No equations, derivations, fitted parameters, or predictions are present. The central claim rests on experimental results rather than any self-referential reduction, self-citation chain, or ansatz. The prompting method is an experimental design choice whose validity is external to the derivation (none exists), so no pattern from the enumerated list applies. This is the normal case of a self-contained empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The claim rests on two unstated assumptions: that automated vulnerability scanners classify severity correctly, and that the chosen prompting simulation is representative. No free parameters or invented entities are visible in the abstract.

axioms (1)

Domain assumption: Automated vulnerability detection tools produce reliable severity classifications for LLM-generated code.
The abstract uses these classifications to assert that the majority of vulnerabilities are critical or high severity.
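
To make the axiom concrete, the sketch below shows one widely used way of turning a numeric score into a severity label: the CVSS v3.1 qualitative rating scale (None 0.0, Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0). Whether the paper's severity labels follow exactly this scale is an assumption; the abstract does not say how severity was assigned. The function names and the example scores are hypothetical.

```python
def cvss_v31_rating(score: float) -> str:
    """Map a CVSS v3.1 score (0.0 to 10.0, one decimal place) to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"


def severe_share(scores: list[float]) -> float:
    """Fraction of scores rated Critical or High; 0.0 for an empty list."""
    if not scores:
        return 0.0
    severe = sum(cvss_v31_rating(s) in {"Critical", "High"} for s in scores)
    return severe / len(scores)


example_scores = [9.8, 7.5, 5.3, 3.1]  # made-up values, not from the paper
example_share = severe_share(example_scores)
```

A "majority critical or high" claim corresponds to `severe_share` exceeding 0.5, so its truth depends directly on how reliably each alert's score was assigned in the first place.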



work page doi:10.1109/sp.2016.25 2016

[10] [10]

Hojjat Aghakhani, Wei Dai, Andre Manoel, Xavier Fernandes, Anant Kharkar, Christopher Kruegel, Giovanni Vigna, David Evans, Ben Zorn, and Robert Sim. 2024. TrojanPuzzle: Covertly Poisoning Code-Suggestion Models. arXiv: 2301.02344 [cs.CR] https://arxiv.org/abs/2301.02344

work page arXiv 2024

[11] [11]

Anthropic. 2024. The Claude 3 Model Family: Opus, Sonnet, Haiku. https://assets.anthropic.com/m/61e7d27f8c8f5919/original/Claude-3-Model- Card.pdf

work page 2024

[12] [12]

Owura Asare, Meiyappan Nagappan, and N. Asokan. 2023. Is GitHub’s Copilot as bad as humans at introducing vulnerabilities in code? Empirical Softw. Engg. 28, 6 (Sept. 2023), 24 pages. doi:10.1007/s10664-023-10380-1

work page doi:10.1007/s10664-023-10380-1 2023

[13] [13]

Hala Assal and Sonia Chiasson. 2019. ’Think secure from the beginning’: A Survey with Software Developers. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland Uk) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–13. doi:10.1145/3290605.3300519

work page doi:10.1145/3290605.3300519 2019

[14] [14]

Python Cryptographic Authority. [n. d.]. Bcrypt. Retrieved July, 2025 from https://github.com/pyca/bcrypt/

work page 2025

[15] [15]

Manish Bhatt, Sahana Chennabasappa, Cyrus Nikolaidis, Shengye Wan, Ivan Evtimov, Dominik Gabi, Daniel Song, Faizan Ahmad, Cornelius Aschermann, Lorenzo Fontana, Sasha Frolov, Ravi Prakash Giri, Dhaval Kapil, Yiannis Kozyrakis, David LeBlanc, James Milazzo, Aleksandar Straumann, Gabriel Synnaeve, Varun Vontimitta, Spencer Whitman, and Joshua Saxe. 2023. Pu...

work page arXiv 2023

[16] [16]

Debug mode

Paul Bischo$. March 22, 2022. “Debug mode" in popular webdev tool exposes credentials for hundreds of websites, including Donald Trump’s. https://www.comparitech.com/blog/vpn-privacy/debug-mode-exposes-credentials/

work page 2022

[17] [17]

Erik Brynjolfsson, Danielle Li, and Lindsey R Raymond. 2023. Generative AI at Work. Working Paper 31161. National Bureau of Economic Research. doi:10.3386/w31161

work page doi:10.3386/w31161 2023

[18] [18]

Sylwia Budzynska. 2024. CodeQL zero to hero part 3: Security research with CodeQL. https://github.blog/security/vulnerability-research/codeql- zero-to-hero-part-3-security-research-with-codeql/

work page 2024

[19] [19]

Domenico Cotroneo, Cristina Improta, Pietro Liguori, and Roberto Natella. 2024. Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks. In Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension (Lisbon, Portugal) (ICPC ’24). Association for Computing Machinery, New York, NY, USA, 280–292. doi:10.1145/3...

work page doi:10.1145/3643916.3644416 2024

[20] [20]

Okta Developer. [n. d.]. Sanitizing Data: Accept Known Good. Retrieved July, 2025 from https://developer.okta.com/books/api-security/sanitizing/ accept-good/

work page 2025

[21] [21]

GitHub Docs. [n. d.]. About code scanning alerts. Retrieved June, 2025 from https://docs.github.com/en/code-security/code-scanning/managing- code-scanning-alerts/about-code-scanning-alerts

work page 2025

[22] [22]

CodeQL documentation. [n. d.]. CWE coverage for Python. Retrieved June, 2025 from https://codeql.github.com/codeql-query-help/python-cwe/

work page 2025

[23] [23]

Flask Documentation. [n. d.]. Quickstart. Retrieved July, 2025 from https://"ask.palletsprojects.com/en/stable/quickstart/

work page 2025

[24] [24]

Xiaohu Du, Ming Wen, Jiahao Zhu, Zifan Xie, Bin Ji, Huijun Liu, Xuanhua Shi, and Hai Jin. 2024. Generalization-Enhanced Code Vulnerability Detection via Multi-Task Instruction Fine-Tuning. arXiv: 2406.03718 [cs.CR] https://arxiv.org/abs/2406.03718

work page arXiv 2024

[25] [25]

Mike Elgan. 2022. ChatGPT: Finally, an AI chatbot worth talking to. Retrieved May, 2025 from https://www.computerworld.com/article/1615637/ chatgpt-!nally-an-ai-chatbot-worth-talking-to.html

work page arXiv 2022

[26] [26]

Robin Emsley. 2023. ChatGPT: these are not hallucinations–they’re fabrications and falsi!cations. Schizophrenia 9, 1 (2023), 52

work page 2023

[27] [27]

GitHub. [n. d.]. CodeQL. Retrieved June, 2025 from https://codeql.github.com/

work page 2025

[28] [28]

Abenezer Golda, Kidus Mekonen, Amit Pandey, Anushka Singh, Vikas Hassija, Vinay Chamola, and Biplab Sikdar. 2024. Privacy and Security Concerns in Generative AI: A Comprehensive Survey. IEEE Access 12 (2024), 48126–48144. doi:10.1109/ACCESS.2024.3381611

work page doi:10.1109/access.2024.3381611 2024

[29] [29]

Alice Gomstyn and Alexandra Jonker. 2024. Exploring privacy issues in the age of AI. Retrieved May, 2025 from https://www.ibm.com/think/ insights/ai-privacy

work page 2024

[30] [30]

Dan Goodin. 2024. Meta pays the price for storing hundreds of millions of passwords in plaintext. Retrieved Dec, 2025 from https://arstechnica. com/security/2024/09/meta-slapped-with-101-million-!ne-for-storing-passwords-in-plaintext/

work page 2024

[31] [31]

Nico Grant and Cade Metz. 2022. New Chatbot Is a ‘Code Red’ For Google’s Search Business. Retrieved May, 2025 from https://www.nytimes.com/ 2022/12/21/technology/ai-chatgpt-google-search.html 20 Srivathsan G Morkonda, Mahmoud Selim, and Hala Assal

work page 2022

[32] [32]

Matthew Green and Matthew Smith. 2016. Developers are Not the Enemy!: The Need for Usable Security APIs. IEEE Security & Privacy 14, 5 (2016), 40–46. doi:10.1109/MSP.2016.111

work page doi:10.1109/msp.2016.111 2016

[33] [33]

Sep 5, 2025

Hacken. Sep 5, 2025. Dangers of Laravel Debug Mode Enabled. https://hacken.io/discover/dangers-of-laravel-debug-mode-enabled/

work page 2025

[34] [34]

IBM. [n. d.]. IBM watsonx Code Assistant. Retrieved Dec, 2025 from https://www.ibm.com/products/watsonx-code-assistant

work page 2025

[35] [35]

Saki Imai. 2022. Is GitHub copilot a substitute for human pair-programming? an empirical study. In Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings (Pittsburgh, Pennsylvania) (ICSE ’22). Association for Computing Machinery, New York, NY, USA, 319–321. doi:10.1145/3510454.3522684

work page doi:10.1145/3510454.3522684 2022

[36] [36]

March 9, 2022

Mackenzie Jackson. March 9, 2022. Samsung and Nvidia are the latest companies to involuntarily go open-source leaking company secrets. https: //blog.gitguardian.com/samsung-and-nvidia-are-the-latest-companies-to-involuntarily-go-open-source-potentially-leaking-company-secrets/

work page 2022

[37] [37]

Nan Jiang, Xiaopeng LI, Shiqi Wang, Qiang Zhou, Baishakhi Ray, Varun Kumar, Xiaofei Ma, and Anoop Deoras. 2024. Training LLMs to better self-debug and explain code. In Neural Information Processing Systems (NeurIPS) . https://www.amazon.science/publications/training-llms-to-better- self-debug-and-explain-code

work page 2024

[38] [38]

Avila, Jacob Brunelle, and Baba Mamadou Camara

Raphaël Khoury, Anderson R. Avila, Jacob Brunelle, and Baba Mamadou Camara. 2023. How Secure is Code Generated by ChatGPT?. In 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC) . 2445–2451. doi:10.1109/SMC53992.2023.10394237

work page doi:10.1109/smc53992.2023.10394237 2023

[39] [39]

Peiyu Liu, Junming Liu, Lirong Fu, Kangjie Lu, Yifan Xia, Xuhong Zhang, Wenzhi Chen, Haiqin Weng, Shouling Ji, and Wenhai Wang. 2024. Exploring ChatGPT’s Capabilities on Vulnerability Management. In 33rd USENIX Security Symposium (USENIX Security 24) . 811–828

work page 2024

[40] [40]

Evolve North Ltd. 2025. Why Storing Passwords in Plain Text is a Bad Idea. Retrieved Dec, 2025 from https://www.evolvenorth.com/why-storing- passwords-in-plain-text-is-a-bad-idea/

work page 2025

[41] [41]

Vahid Majdinasab, Michael Joshua Bishop, Shawn Rasheed, Arghavan Moradidakhel, Amjed Tahir, and Foutse Khomh. 2024. Assessing the Security of GitHub Copilot’s Generated Code - A Targeted Replication Study . In 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE Computer Society, Los Alamitos, CA, USA, 435–444...

work page doi:10.1109/saner60148.2024.00051 2024

[42] [42]

Negar Maleki, Balaji Padmanabhan, and Kaushik Dutta. 2024. AI Hallucinations: A Misnomer Worth Clarifying. In 2024 IEEE Conference on Arti"cial Intelligence (CAI). 133–138. doi:10.1109/CAI59869.2024.00033

work page doi:10.1109/cai59869.2024.00033 2024

[43] [43]

James Manyika, Michael Chui, Mehdi Miremadi, Jacques Bughin, Katy George, Paul Willmott, and Martin Dewhurst. 2017. A future that works: AI, automation, employment, and productivity. McKinsey Global Institute Research, Tech. Rep 60 (2017), 1–135

work page 2017

[44] [44]

September 16, 2022

Dan Milmo. September 16, 2022. Uber responding to ‘cybersecurity incident’ after hack. https://www.theguardian.com/technology/2022/sep/15/uber- computer-network-hack-report

work page 2022

[45] [45]

MITRE. [n. d.]. CWE Database. Retrieved June, 2025 from https://cwe.mitre.org/index.html

work page 2025

[46] [46]

Sidhant Narula, Mohammad Ghasemigol, Javier Carnerero-Cano, Amanda Minnich, Emil Lupu, and Daniel Takabi. 2025. Exploring Research and Tools in AI Security: A Systematic Mapping Study. IEEE Access 13 (2025), 84057–84080. doi:10.1109/ACCESS.2025.3567195

work page doi:10.1109/access.2025.3567195 2025

[47] [47]

Jakob Nielsen. 2023. AI Improves Employee Productivity by 66%. Retrieved May, 2025 from https://www.nngroup.com/articles/ai-tools-productivity- gains/

work page 2023

[48] [48]

NIST. [n. d.]. Search Vulnerability Database. Retrieved June, 2025 from https://nvd.nist.gov/vuln/search

work page 2025

[49] [49]

Liang Niu, Shujaat Mirza, Zayd Maradni, and Christina Pöpper. 2023. CodexLeaks: Privacy Leaks from Code Generation Language Models in GitHub Copilot. In 32nd USENIX Security Symposium (USENIX Security 23) . 2133–2150

work page 2023

[50] [50]

Louis Nkengakah. 2025. ChatGPT review: The Revolutionary AI Chatbot. Retrieved May, 2025 from https://aitheir.world/top-ai-tools/chatgpt- review-the-revolutionary-ai-chatbot

work page 2025

[51] [51]

David Noever. 2023. Can Large Language Models Find And Fix Vulnerable Software? arXiv: 2308.10345 [cs.SE] https://arxiv.org/abs/2308.10345

work page arXiv 2023

[52] [52]

Shakked Noy and Whitney Zhang. 2023. Experimental evidence on the productivity e$ects of generative arti!cial intelligence. Science 381, 6654 (2023), 187–192. doi:10.1126/science.adh2586

work page doi:10.1126/science.adh2586 2023

[53] [53]

Daniela Oliveira, Marissa Rosenthal, Nicole Morin, Kuo-Chuan Yeh, Justin Cappos, and Yanyan Zhuang. 2014. It’s the psychology stupid: how heuristics explain software vulnerabilities and how priming can illuminate developer’s blind spots. In Proceedings of the 30th Annual Computer Security Applications Conference (New Orleans, Louisiana, USA) (ACSAC ’14). ...

work page doi:10.1145/2664243.2664254 2014

[54] [54]

OW ASP. [n. d.]. Path Traversal. Retrieved July, 2025 from https://owasp.org/www-community/attacks/Path_Traversal

work page 2025

[55] [55]

OW ASP. [n. d.]. Source Code Analysis Tools. Retrieved July, 2025 from https://owasp.org/www-community/Source_Code_Analysis_Tools

work page 2025

[56] [56]

OW ASP. 2025. Password Storage Cheat Sheet. Retrieved Dec, 2025 from https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_ Sheet.html

work page 2025

[57] [57]

Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, and Ramesh Karri. 2022. Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions. In 2022 IEEE Symposium on Security and Privacy (SP) . 754–768. doi:10.1109/SP46214.2022.9833571

work page doi:10.1109/sp46214.2022.9833571 2022

[58] [58]

Hammond Pearce, Benjamin Tan, Baleegh Ahmad, Ramesh Karri, and Brendan Dolan-Gavitt. 2023. Examining Zero-Shot Vulnerability Repair with Large Language Models . In 2023 IEEE Symposium on Security and Privacy (SP) . IEEE Computer Society, Los Alamitos, CA, USA, 2339–2356. doi:10.1109/SP46215.2023.10179420

work page doi:10.1109/sp46215.2023.10179420 2023

[59] [59]

Neil Perry, Megha Srivastava, Deepak Kumar, and Dan Boneh. 2023. Do Users Write More Insecure Code with AI Assistants?. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (Copenhagen, Denmark) (CCS ’23). Association for Computing Machinery, New York, NY, USA, 2785–2799. doi:10.1145/3576915.3623157 Security of LLM-gene...

work page doi:10.1145/3576915.3623157 2023

[60] [60]

Neil Perry, Megha Srivastava, Deepak Kumar, and Dan Boneh. 2023. Do users write more insecure code with AI assistants?. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (Copenhagen Denmark). ACM, New York, NY, USA, 2785–2799

work page 2023

[61] [61]

Vilius Petkauskas. 2024. RockYou2024: 10 billion passwords leaked in the largest compilation of all time. Retrieved Dec, 2025 from https: //cybernews.com/security/rockyou2024-largest-password-compilation-leak/

work page 2024

[62] [62]

Olgierd Pieczul, Simon Foley, and Mary Ellen Zurko. 2017. Developer-centered security and the symmetry of ignorance. In Proceedings of the 2017 New Security Paradigms Workshop (Santa Cruz, CA, USA) (NSPW ’17). Association for Computing Machinery, New York, NY, USA, 46–56. doi:10.1145/3171533.3171539

work page doi:10.1145/3171533.3171539 2017

[63] [63]

David Prosser. 2025. Worried About AI-Generated Code? Ask AI To Review It. Retrieved May, 2025 from https://www.forbes.com/sites/davidprosser/ 2025/05/07/worried-about-ai-generated-code-ask-ai-to-review-it/

work page 2025

[64] [64]

PyYAML. [n. d.]. PyYAML Documentation. Retrieved July, 2025 from https://pyyaml.org/wiki/PyYAMLDocumentation

work page 2025

[65] [65]

Chris Reddington. 2023. How companies are boosting productivity with generative AI. Retrieved May, 2025 from https://github.blog/ai-and- ml/generative-ai/how-companies-are-boosting-productivity-with-generative-ai/

work page 2023

[66] [66]

Papalexakis, and Michalis Faloutsos

Md Omar Faruk Rokon, Risul Islam, Ahmad Darki, Evangelos E. Papalexakis, and Michalis Faloutsos. 2020. SourceFinder: Finding Malware Source-Code from Publicly Available Repositories in GitHub. In 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020). USENIX Association, San Sebastian, 149–163. https://www.usenix.org/conf...

work page 2020

[67] [67]

Gustavo Sandoval, Hammond Pearce, Teo Nys, Ramesh Karri, Siddharth Garg, and Brendan Dolan-Gavitt. 2023. Lost at C: a user study on the security implications of large language model code assistants. In Proceedings of the 32nd USENIX Conference on Security Symposium (Anaheim, CA, USA) (SEC ’23). USENIX Association, USA, Article 124, 18 pages

work page 2023

[68] [68]

Roei Schuster, Congzheng Song, Eran Tromer, and Vitaly Shmatikov. 2021. You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion. In 30th USENIX Security Symposium (USENIX Security 21) . USENIX Association, 1559–1575. https://www.usenix.org/conference/ usenixsecurity21/presentation/schuster

work page 2021

[69] [69]

Semrush. 2025. ChatGPT.com Website Tra#c, Ranking, Analytics [April 2025]. Retrieved May, 2025 from https://www.semrush.com/website/ chatgpt.com/overview/

work page 2025

[70] [70]

Majumder, Maisha R

Mohammed Latif Siddiq, Shafayat H. Majumder, Maisha R. Mim, Sourov Jajodia, and Joanna C. S. Santos. 2022. An Empirical Study of Code Smells in Transformer-based Code Generation Techniques. In 2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM). 71–82. doi:10.1109/SCAM55253.2022.00014

work page doi:10.1109/scam55253.2022.00014 2022

[71] [71]

Mohammed Latif Siddiq and Joanna C. S. Santos. 2022. SecurityEval dataset: mining vulnerability examples to evaluate machine learning-based code generation techniques. In Proceedings of the 1st International Workshop on Mining Software Repositories Applications for Privacy and Security (Singapore, Singapore) (MSR4P&S 2022). Association for Computing Machi...

work page doi:10.1145/3549035.3561184 2022

[72] [72]

Ramya Srinivasan and Ajay Chander. 2021. Biases in AI systems. Commun. ACM 64, 8 (2021), 44–49

work page 2021

[73] [73]

ow. 2024. 2024 Developer Survey. Retrieved May, 2025 from https://survey.stackover

Stack Over"ow. 2024. 2024 Developer Survey. Retrieved May, 2025 from https://survey.stackover"ow.co/2024/

work page 2024

[74] [74]

Benjamin Steenhoek, Md Mahbubur Rahman, Richard Jiles, and Wei Le. 2023. An Empirical Study of Deep Learning Models for Vulnerability Detection . In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) . IEEE Computer Society, Los Alamitos, CA, USA, 2237–2248. doi:10.1109/ICSE48619.2023.00188

work page doi:10.1109/icse48619.2023.00188 2023

[75] [75]

Yuqiang Sun, Daoyuan Wu, Yue Xue, Han Liu, Haijun Wang, Zhengzi Xu, Xiaofei Xie, and Yang Liu. 2024. GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (Lisbon, Portugal) (ICSE ’24). Association for Computing Machinery, New...

work page doi:10.1145/3597503.3639117 2024

[76] [76]

Andrew Tarantola. 2023. How OpenAI’s ChatGPT has changed the world in just a year. Retrieved May, 2025 from https://www.engadget.com/how- openais-chatgpt-has-changed-the-world-in-just-a-year-140050053.html

work page 2023

[77] [77]

The European Union. [n. d.]. Regulation - EU - 2024/1689. Retrieved May, 2025 from https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX: 32024R1689

work page 2024

[78] [78]

The Government of Canada. [n. d.]. Arti!cial Intelligence and Data Act. Retrieved May, 2025 from https://ised-isde.canada.ca/site/innovation- better-canada/en/arti!cial-intelligence-and-data-act

work page 2025

[79] [79]

Catherine Tony, Markus Mutas, Nicolás E Díaz Ferreyra, and Riccardo Scandariato. 2023. Llmseceval: A Dataset of Natural Language Prompts for Security Evaluations. In 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) . IEEE, 588–592

work page 2023

[80] [80]

October 10, 2022

Bill Toulas. October 10, 2022. Toyota discloses data leak after access key exposed on GitHub. https://www.bleepingcomputer.com/news/security/ toyota-discloses-data-leak-after-access-key-exposed-on-github/

work page 2022