Generating Proof-of-Vulnerability Tests to Help Enhance the Security of Complex Software

Danfeng Yao; Na Meng; Shravya Kanchi; Xiaoyan Zang; Ying Zhang

arxiv: 2605.03956 · v1 · submitted 2026-05-05 · 💻 cs.CR · cs.SE

Shravya Kanchi, Xiaoyan Zang, Ying Zhang, Danfeng Yao, Na Meng

Pith reviewed 2026-05-07 15:09 UTC · model grok-4.3

keywords: proof-of-vulnerability tests · software supply chain security · LLM-based test generation · vulnerable libraries · call path analysis · agent-based test generation · Java applications · executable security tests

The pith

PoVSmith combines call path analysis with LLM prompts and execution feedback to automatically generate proof-of-vulnerability tests for applications using vulnerable libraries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Applications built on third-party libraries can become vulnerable when library flaws are reachable through app code. Developers need concrete, executable proof-of-vulnerability tests to judge real security risk, yet manual creation is difficult and prior automation falls short. PoVSmith feeds call-path details, exemplar tests, and runtime feedback into prompts for coding agents and large language models to produce, run, and evaluate such tests. Evaluated on 33 Java app-library pairs with known vulnerabilities, the method identified 96 percent of relevant entry points correctly and generated 152 tests of which 55 percent demonstrated feasible attacks. It reduces reliance on human effort while raising test quality over earlier LLM-only baselines.

Core claim

PoVSmith is a new agent-based approach that integrates static call-path analysis, code context, and iterative execution feedback into multiple prompts to direct a coding agent and large language model through test generation, execution, and quality assessment, yielding executable PoV tests that expose how library vulnerabilities propagate into dependent applications.

What carries the argument

The iterative prompting loop that supplies call-path information from application entry points to vulnerable library APIs together with execution logs to guide LLM test creation and refinement.
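The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not PoVSmith's implementation: `refine_until_pass`, `RunResult`, the round budget of 3, and the two `fake_*` stand-ins for the coding agent and the test runner are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class RunResult:
    passed: bool  # True when the test compiled, ran, and showed the target behaviour
    log: str      # raw build/runtime output, fed back into the next prompt


def refine_until_pass(
    call_path: List[str],
    generate: Callable[[List[str], Optional[str]], str],
    execute: Callable[[str], RunResult],
    max_rounds: int = 3,  # assumed budget; the source does not state one
) -> Tuple[Optional[str], int]:
    """Return (test_code, rounds_used), or (None, max_rounds) if no round passed."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        test_code = generate(call_path, feedback)
        result = execute(test_code)
        if result.passed:
            return test_code, round_no
        feedback = result.log  # the failure log becomes context for the next attempt
    return None, max_rounds


# Hypothetical stand-ins for the coding agent and the test runner.
def fake_generate(call_path: List[str], feedback: Optional[str]) -> str:
    return "test-v1" if feedback is None else "test-v2"


def fake_execute(test_code: str) -> RunResult:
    if test_code == "test-v2":
        return RunResult(True, "ok")
    return RunResult(False, "compile error: missing import")


code, rounds = refine_until_pass(
    ["App.handleRequest", "Lib.unsafeParse"], fake_generate, fake_execute
)
print(code, rounds)
```

Passing the generator and runner in as callables keeps the loop itself free of any model or build-tool dependency, which is one plausible way to make such a pipeline testable offline.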

If this is right

Developers receive concrete evidence of supply-chain risks without writing tests themselves.
96 percent of application-level entry points that reach vulnerable library APIs are located along with their call paths.
55 percent of the 152 generated tests succeed in demonstrating feasible attacks on the applications.
Human involvement drops while test quality rises compared with prior LLM-based methods.
The same prompting structure supports both test creation and automated quality assessment grounded in context and logs.
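The two headline percentages follow directly from the counts in the abstract, and a quick check makes the denominators explicit. The variable names are introduced here for illustration; the last ratio is derived in this sketch and is not a figure the paper reports.

```python
# Counts as stated in the abstract.
ENTRY_POINTS_REPORTED = 158  # unique entry points PoVSmith revealed
ENTRY_POINTS_CORRECT = 152   # of those, correctly found with call paths
TESTS_GENERATED = 152
TESTS_FEASIBLE = 84          # tests judged to demonstrate a feasible attack


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


entry_accuracy = percent(ENTRY_POINTS_CORRECT, ENTRY_POINTS_REPORTED)
test_success = percent(TESTS_FEASIBLE, TESTS_GENERATED)

# Derived here, not reported in the source: feasible tests per reported entry point.
feasible_per_reported_entry = percent(TESTS_FEASIBLE, ENTRY_POINTS_REPORTED)

print(f"entry-point accuracy: {entry_accuracy:.1f}%")
print(f"feasible tests:       {test_success:.1f}%")
print(f"feasible / reported:  {feasible_per_reported_entry:.1f}%")
```

Note that the 55 percent figure is relative to the 152 generated tests, not to the 158 reported entry points.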

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The technique could be extended to languages other than Java by adapting the call-path extractor and runtime instrumentation.
Embedding PoVSmith in continuous-integration pipelines would allow automatic flagging of exploitable dependency vulnerabilities before deployment.
Higher success rates might follow from richer feedback signals such as coverage metrics or symbolic execution traces.
The generated tests could serve as regression oracles for future library updates to confirm that fixes remain effective.

Load-bearing premise

LLM-generated tests guided by call paths and execution feedback reliably indicate real-world attack feasibility without systematic false positives or negatives.

What would settle it

Independent manual verification by security experts showing that a substantial fraction of the 84 tests labeled successful do not actually produce exploitable behavior in the target applications, or that many known feasible attacks are missed.
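One way to make "a substantial fraction" concrete is to verify a random sample of the tests labeled successful by hand and put a confidence interval around the confirmed share. The sketch below uses the standard Wilson score interval; the sample size of 30 and the 24 confirmations are hypothetical numbers chosen for illustration, not data from the paper.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - margin, center + margin


# Hypothetical audit: 30 of the 84 LLM-approved tests are checked by experts,
# and 24 of them are confirmed to produce exploitable behaviour.
SAMPLE_SIZE = 30
CONFIRMED = 24

low, high = wilson_interval(CONFIRMED, SAMPLE_SIZE)
print(f"confirmed share: {CONFIRMED / SAMPLE_SIZE:.2f}, 95% interval: [{low:.2f}, {high:.2f}]")
```

A wide interval from a small sample would itself be informative: it shows how many tests need manual review before the 55 percent figure can be trusted or rejected.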

Figures

Figures reproduced from arXiv: 2605.03956 by Danfeng Yao, Na Meng, Shravya Kanchi, Xiaoyan Zang, Ying Zhang.

**Figure 1.** The threat model of software supply chain attacks

**Figure 2.** PoVSmith has four phases

**Figure 3.** A simplified version of the prompt template we…

**Figure 4.** The template used for iterative PoV test generation

**Figure 5.** The prompt template we used to assess test quality

**Figure 6.** One PoV test that PoVSmith successfully generated

**Figure 7.** One generated test that fails to demonstrate PoV

**Figure 8.** Distribution of call paths by length. Adjacent text from the paper, captured with the caption: "… have length 1, meaning that the identified methods directly call vulnerable APIs; the other 76 paths have longer lengths (i.e., 2–5), showing how the identified methods indirectly call APIs. These 216 paths correspond to 158 unique source methods, as some paths share the same source and sink methods. For the nine call paths that were incorrectly identified, we observed…"
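The notion of path length used in the Figure 8 text (length 1 means the entry point calls the vulnerable API directly) can be illustrated with a small depth-first search over a call graph. This is a sketch under assumptions: the toy graph, the method names, and `find_call_paths` are invented here, and the paper's actual call-path extractor is not described in this digest.

```python
from collections import Counter
from typing import Dict, List, Set

CallGraph = Dict[str, List[str]]  # caller -> list of callees


def find_call_paths(
    graph: CallGraph,
    entry_points: List[str],
    sinks: Set[str],
    max_edges: int = 5,  # the quoted text mentions lengths up to 5
) -> List[List[str]]:
    """Enumerate cycle-free paths from each entry point to any sink method."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node in sinks:
            paths.append(path)
            return
        if len(path) - 1 >= max_edges:  # path length is counted in call edges
            return
        for callee in graph.get(node, []):
            if callee not in path:  # skip recursion and other cycles
                walk(callee, path + [callee])

    for entry in entry_points:
        walk(entry, [entry])
    return paths


# Hypothetical application: two entry points reach the vulnerable API, one does not.
graph: CallGraph = {
    "App.parseConfig": ["Lib.unsafeParse"],
    "App.handleRequest": ["App.decode"],
    "App.decode": ["Lib.unsafeParse"],
    "App.health": ["App.log"],
}
entries = ["App.parseConfig", "App.handleRequest", "App.health"]
paths = find_call_paths(graph, entries, {"Lib.unsafeParse"})
length_histogram = Counter(len(p) - 1 for p in paths)

for p in paths:
    print(" -> ".join(p))
print(dict(length_histogram))
```

Several paths can share the same source and sink, which is why the quoted text reports more paths (216) than unique source methods (158).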

Original abstract

Developers create modern software applications (Apps) on top of third-party libraries (Libs). When library vulnerabilities are reachable through application code, the applications can be vulnerable to software supply chain attacks. Prior work shows that developers often require concrete and executable evidence, i.e., proof-of-vulnerability (PoV) tests, to decide whether a reported dependency vulnerability poses a practical security risk to their application. However, manually crafting such tests is challenging, and existing tool support is insufficient to automate the procedure. To streamline test generation, we created PoVSmith -- a new approach that combines call path analysis, exemplar test, code context, and feedback into multiple prompts to guide a coding agent (i.e., Codex) and a large language model (i.e., GPT) for test generation, execution, and assessment. We evaluated PoVSmith on 33 ⟨App, Lib⟩ Java program pairs, where each App depends on a vulnerable Lib. PoVSmith revealed 158 unique application-level entry points (i.e., public methods) calling vulnerable library APIs; 152 (96%) of them were correctly found, together with the call paths properly recognized. With such method call information, PoVSmith generated 152 tests, 84 (55%) of which demonstrated feasible ways of attacking Apps by exploiting Lib vulnerabilities. PoVSmith substantially outperforms the state-of-the-art LLM-based approach, as it reduces human involvement while dramatically improving test quality. Our work contributes (1) a novel approach of agent-based test generation, (2) an iterative code refinement process driven by execution feedback, and (3) LLM-based quality assessment grounded in both the test context and execution logs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PoVSmith steers LLMs with call-path analysis and execution feedback to generate PoV tests, hitting 96% entry-point accuracy and 55% effective tests on 33 pairs, but the effectiveness label comes from another LLM pass with no external check.

The main point is that this work automates creation of proof-of-vulnerability tests for library issues reachable from application code. It feeds static call paths, exemplar tests, and code context into Codex and GPT prompts, then loops execution results back for refinement and assessment. On 33 Java app-lib pairs it locates 152 of 158 entry points correctly and produces 152 tests, of which 84 are marked effective at showing feasible attacks. The authors report lower human effort and better test quality than a prior LLM baseline. That combination of analysis-driven prompting plus iterative feedback is the concrete addition over plain LLM test generation. The numbers are straightforward and the setup targets a real developer pain point around supply-chain risk assessment.

The weak part is the success metric itself. The abstract ties the 55% figure to an LLM assessor that scores tests using only the supplied call-path context and execution logs. No human oracle, CVE-specific exploit check, or comparison against manually written PoVs is described to bound false positives. If the assessor sometimes labels tests that merely reach the vulnerable API without triggering the actual condition, the effectiveness count and the outperformance claim rest on shaky ground.

The paper is for people building automated security tools or studying LLM agents for code tasks. A reader who needs concrete examples of prompt chaining and feedback loops will find usable material, provided the full version includes the actual prompts and failure cases. It should go to peer review because the prototype works on real pairs and produces measurable output; referees can press on the validation design and ask for tighter controls on the LLM assessor.

Referee Report

2 major / 2 minor

Summary. The paper presents PoVSmith, an LLM-agent approach that combines call-path analysis, exemplar tests, code context, and execution feedback to generate and assess proof-of-vulnerability (PoV) tests for Java applications that depend on vulnerable libraries. On 33 App-Lib pairs, it reports identifying 158 entry points (96% accuracy) and producing 152 tests, of which 84 (55%) are assessed by an LLM as demonstrating feasible attacks on the applications.

Significance. If the LLM-based feasibility judgments prove reliable, the work could meaningfully lower the barrier for developers to obtain concrete evidence of supply-chain risk, complementing existing static analysis and fuzzing tools. The agent-driven iterative refinement loop and grounding of assessment in both context and logs represent a practical advance over prior LLM-only baselines for security test generation.

major comments (2)

[Evaluation section] Results on 33 pairs and the 55% figure: the claim that 84 tests 'demonstrated feasible ways of attacking Apps by exploiting Lib vulnerabilities' rests entirely on an LLM assessor that receives only the provided call-path context plus execution logs. No human oracle, CVE-specific exploit oracle, differential comparison against manually written PoVs, or other independent validation is described to calibrate the false-positive rate of this assessor. Because every downstream claim (outperformance vs. prior LLM baselines, reduction in human effort, practical effectiveness) is computed from the same LLM-labeled count, this is load-bearing for the central contribution.
[Abstract and Approach section] The description of the GPT-based quality assessor provides no details on the exact prompt template, decision criteria for labeling a test 'feasible,' handling of LLM variability (e.g., temperature, multiple runs), or inter-rater agreement with any external ground truth. This omission prevents readers from assessing the reproducibility and soundness of the 55% success rate.

minor comments (2)

[Abstract] The abstract states '152 (96%) of them were correctly found' but does not clarify whether the 6% error rate was measured against a manually verified ground truth or another automated method; adding this detail would strengthen the entry-point accuracy claim.
[Evaluation section] The table or figure presenting the 33 program pairs should include basic statistics (e.g., lines of code, number of vulnerable APIs per pair) to allow readers to judge the diversity and representativeness of the benchmark.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential of PoVSmith to lower barriers for assessing supply-chain risks. We address each major comment below with clarifications and proposed revisions to improve the manuscript's rigor and reproducibility.

Referee: [Evaluation section] Results on 33 pairs and the 55% figure: the claim that 84 tests 'demonstrated feasible ways of attacking Apps by exploiting Lib vulnerabilities' rests entirely on an LLM assessor that receives only the provided call-path context plus execution logs. No human oracle, CVE-specific exploit oracle, differential comparison against manually written PoVs, or other independent validation is described to calibrate the false-positive rate of this assessor. Because every downstream claim (outperformance vs. prior LLM baselines, reduction in human effort, practical effectiveness) is computed from the same LLM-labeled count, this is load-bearing for the central contribution.

Authors: We acknowledge that the 55% feasibility rate is determined solely by the LLM assessor without an independent human oracle, CVE-specific exploit validation, or direct comparison to manually crafted PoVs. This choice supports scalability and aligns with our aim to reduce human involvement in PoV generation. The assessor receives call-path context, test code, and execution logs to ground judgments in observable behavior. To address the concern, we will revise the Evaluation section to explicitly discuss this as a limitation, add a small-scale manual calibration (reviewing a random subset of 20 tests for agreement with the LLM labels), and qualify the outperformance claims relative to baselines that use comparable automated assessment. This provides partial calibration of reliability without requiring a full re-evaluation of all 152 tests. revision: partial
Referee: [Abstract and Approach section] The description of the GPT-based quality assessor provides no details on the exact prompt template, decision criteria for labeling a test 'feasible,' handling of LLM variability (e.g., temperature, multiple runs), or inter-rater agreement with any external ground truth. This omission prevents readers from assessing the reproducibility and soundness of the 55% success rate.

Authors: We agree that the current description lacks sufficient detail on the assessor for full reproducibility. In the revised manuscript, we will expand the Approach section to include the complete prompt template, the precise decision criteria (e.g., positive label if logs indicate successful vulnerability trigger such as exception patterns or data exfiltration), our use of fixed low temperature (0.0) and single-run execution per test to minimize variability, and an explicit statement that inter-rater agreement with external ground truth was not computed. We will also add this as a noted limitation with suggestions for future work. revision: yes
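The decision criterion named in this response (a positive label when logs show a trigger such as an exception pattern) can be contrasted with mere reachability in a short sketch. Everything specific here is an assumption for illustration: the `Verdict` categories, the `classify_log` helper, the sink name, and the two regular expressions are not taken from the paper, and a real assessor would need patterns chosen per CVE.

```python
import re
from enum import Enum
from typing import Iterable


class Verdict(Enum):
    TRIGGERED = "triggered"        # log shows evidence of the vulnerable behaviour
    REACHED_ONLY = "reached_only"  # vulnerable API appears in the log, no trigger evidence
    NOT_REACHED = "not_reached"    # no sign that the vulnerable API was exercised


def classify_log(log: str, sink_name: str, trigger_patterns: Iterable[str]) -> Verdict:
    """Label one execution log using simple textual evidence."""
    if any(re.search(pattern, log) for pattern in trigger_patterns):
        return Verdict.TRIGGERED
    if sink_name in log:
        return Verdict.REACHED_ONLY
    return Verdict.NOT_REACHED


# Illustrative patterns for a denial-of-service style flaw (hypothetical choice).
PATTERNS = [r"java\.lang\.StackOverflowError", r"java\.lang\.OutOfMemoryError"]
SINK = "Lib.unsafeParse"  # hypothetical vulnerable API

logs = {
    "test_a": "Exception in thread main java.lang.StackOverflowError\n\tat Lib.unsafeParse(Lib.java:42)",
    "test_b": "DEBUG calling Lib.unsafeParse with 12 bytes\nTests run: 1, Failures: 0",
    "test_c": "Tests run: 1, Failures: 0",
}
for name, log in logs.items():
    print(name, classify_log(log, SINK, PATTERNS).value)
```

Separating `REACHED_ONLY` from `TRIGGERED` speaks to the concern raised in the desk note: a test that only reaches the vulnerable API should not count toward the feasibility total.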

Circularity Check

0 steps flagged

No significant circularity in derivation or evaluation chain

The paper describes an LLM-guided test generation pipeline (call-path analysis + prompting + execution feedback + LLM assessment) and reports direct counts on an external set of 33 App-Lib pairs. The 152 tests and 84/152 success figure are produced by applying the described procedure to those pairs; success is measured by the LLM assessor using provided context and logs, but this is an explicit component of the method rather than a self-definitional loop or fitted parameter renamed as prediction. No equations, self-citations, or uniqueness theorems are invoked to force the result. The evaluation therefore remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on the unproven assumption that LLMs can be reliably steered for security test generation; the paper introduces the PoVSmith system as its main contribution.

axioms (1)

Domain assumption: Large language models can produce correct, executable security tests when supplied with call paths, code context, and execution feedback.
This capability is invoked as the foundation for the entire test generation and refinement process.

invented entities (1)

PoVSmith (no independent evidence)
Purpose: agent-based system for automated PoV test generation.
The paper proposes this new tool without external independent validation beyond the internal evaluation.

Reference graph

Works this paper leans on

87 extracted references · 38 canonical work pages

[1]

Arrange-Act-Assert: A Pattern for Writing Good Tests

2020. Arrange-Act-Assert: A Pattern for Writing Good Tests. https://automationpanda.com/2020/07/07/arrange-act-assert-a-pattern- for-writing-good-tests/

2020
[2]

OWASP Dependency-Check

2020. OWASP Dependency-Check. https://owasp.org/www-project- dependency-check/

2020
[3]

Find Security Bugs

2021. Find Security Bugs. https://find-sec-bugs.github.io/

2021
[4]

Supply chain attacks show why you should be wary of third-party providers

2021. Supply chain attacks show why you should be wary of third-party providers

2021
[5]

alibaba / fastjson

2023. alibaba / fastjson. https://github.com/alibaba/fastjson

2023
[6]

american fuzzy lop

2023. american fuzzy lop. https://lcamtuf.coredump.cx/afl/

2023
[7]

2023. Codec. https://commons.apache.org/proper/commons-codec/

2023
[8]

2023. Dom4j. https://dom4j.github.io

2023
[9]

OSS-Fuzz

2023. OSS-Fuzz. https://google.github.io/oss-fuzz/

2023
[10]

spring-projects / spring-security

2023. spring-projects / spring-security. https://github.com/spring-projects/ spring-security

2023
[11]

Amazon: How MOVEit Supply Chain Attack Left Echoing Ef- fects

2024. Amazon: How MOVEit Supply Chain Attack Left Echoing Ef- fects. https://cybermagazine.com/articles/amazon-how-moveit-supply-chain- attack-left-lasting-effects

2024
[12]

GPT-5.1: A smarter, more conversational ChatGPT

2025. GPT-5.1: A smarter, more conversational ChatGPT. https://openai.com/ index/gpt-5-1/

2025
[13]

Software supply chain attacks surge, as ransomware groups escalate and in- dustrial sectors face more exposure

2025. Software supply chain attacks surge, as ransomware groups escalate and in- dustrial sectors face more exposure. https://industrialcyber.co/reports/software- supply-chain-attacks-surge-as-ransomware-groups-escalate-and-industrial- sectors-face-more-exposure/

2025
[14]

The Log4j Vulnerability: What It Is, What Organizations Are at Risk and How You Can Protect Yourself

2025. The Log4j Vulnerability: What It Is, What Organizations Are at Risk and How You Can Protect Yourself. https://www.abs-group.com/Knowledge- Center/Insights/The-Log4j-Vulnerability-What-It-Is-What-Organizations- Are-at-Risk-and-How-You-Can-Protect-Yourself/

2025
[15]

What Are Software Supply Chain Vulnerabilities? Understanding the Risks and How to Mitigate Them

2025. What Are Software Supply Chain Vulnerabilities? Understanding the Risks and How to Mitigate Them. https://safe.security/resources/insights/what- are-software-supply-chain-vulnerabilities-understanding-the-risks-how-to- mitigate-them/#How-Attackers-Exploit-These-Vulnerabilities

2025
[16]

2026 Software Supply Chain Security Report

2026. 2026 Software Supply Chain Security Report. https://www.reversinglabs. com/sscs-report

2026
[17]

2026. CodeQL. https://codeql.github.com

2026
[18]

Codex | AI Coding Partner from OpenAI | OpenAI

2026. Codex | AI Coding Partner from OpenAI | OpenAI. https://openai.com/ codex/

2026
[19]

Dependabot

2026. Dependabot. https://github.com/dependabot

2026
[20]

dependency-check vulnerabilities

2026. dependency-check vulnerabilities. https://security.snyk.io/package/npm/ dependency-check

2026
[21]

Gemini Code Assist | AI coding assistant

2026. Gemini Code Assist | AI coding assistant. https://codeassist.google

2026
[22]

go4retro/tcpser4j

2026. go4retro/tcpser4j. https://github.com/go4retro/tcpser4j/blob/ 7a3dbd8d719c0b256bb49da85227b73d580c6c82/gensrc/org/jbrain/tcpser4j/ binding/PhoneBook.java

2026
[23]

huangsigit/commerce

2026. huangsigit/commerce. https://github.com/huangsigit/commerce/ blob/899f81c2080bfa4223176e4bd06c701b4d50958c/src/main/java/com/egao/ common/core/utils/JSONUtil.java

2026
[24]

Introducing GPT-5.2-Codex | OpenAI

2026. Introducing GPT-5.2-Codex | OpenAI. https://openai.com/index/ introducing-gpt-5-2-codex/

2026
[25]

mistralai/mistral-vibe: Minimal CLI coding agent by Mistral

2026. mistralai/mistral-vibe: Minimal CLI coding agent by Mistral. https://github. com/mistralai/mistral-vibe

2026
[26]

NVD - cve-2018-1000632

2026. NVD - cve-2018-1000632. https://nvd.nist.gov/vuln/detail/cve-2018- 1000632

2026
[27]

soot-oss/soot: Soot - A Java optimization framework

2026. soot-oss/soot: Soot - A Java optimization framework. https://github.com/ soot-oss/soot

2026
[28]

wala/WALA: T.J

2026. wala/WALA: T.J. Watson Libraries for Analysis, with front ends for Java, Android, and JavaScript, and many common static program analyses. https: //github.com/wala/wala

2026
[29]

What is agentic coding? https://cloud.google.com/discover/what-is- agentic-coding

2026. What is agentic coding? https://cloud.google.com/discover/what-is- agentic-coding

2026
[30]

What is penetration testing? https://www.ibm.com/think/topics/ penetration-testing

2026. What is penetration testing? https://www.ibm.com/think/topics/ penetration-testing

2026
[31]

What is Software Supply Chain Security? https://jfrog.com/learn/software- supply-chain/

2026. What is Software Supply Chain Security? https://jfrog.com/learn/software- supply-chain/

2026
[32]

Baleegh Ahmad, Shailja Thakur, Benjamin Tan, Ramesh Karri, and Hammond Pearce. 2024. On Hardware Security Bug Code Fixes by Prompting Large Lan- guage Models.IEEE Transactions on Information Forensics and Security19 (2024), 4043–4057. doi:10.1109/TIFS.2024.3374558

work page doi:10.1109/tifs.2024.3374558 2024
[33]

Alshmrany, Mohannad Aldughaim, Ahmed Bhayat, and Lucas C

Kaled M. Alshmrany, Mohannad Aldughaim, Ahmed Bhayat, and Lucas C. Cordeiro. 2021. FuSeBMC: An Energy-Efficient Test Generator for Finding Security Vulnerabilities in C Programs. InTests and Proofs, Frédéric Loulergue and Franz Wotawa (Eds.). Springer International Publishing, Cham, 85–105

2021
[34]

Schwartz, Mav- erick Woo, and David Brumley

Thanassis Avgerinos, Sang Kil Cha, Alexandre Rebert, Edward J. Schwartz, Mav- erick Woo, and David Brumley. 2014. Automatic exploit generation.Commun. ACM57, 2 (Feb. 2014), 74–84. doi:10.1145/2560217.2560219

work page doi:10.1145/2560217.2560219 2014
[35]

Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert, and David Brumley. 2012. Unleashing Mayhem on Binary Code. In2012 IEEE Symposium on Security and Privacy. 380–394. doi:10.1109/SP.2012.31

work page doi:10.1109/sp.2012.31 2012
[36]

Sujita Chaudhary, Austin O’Brien, and Shengjie Xu. 2020. Automated Post-Breach Penetration Testing through Reinforcement Learning. In2020 IEEE Conference on Communications and Network Security (CNS). 1–2. doi:10.1109/CNS48642.2020. 9162301

work page doi:10.1109/cns48642.2020 2020
[37]

Zimin Chen, Steve Kommrusch, and Martin Monperrus. 2023. Neural Transfer Learning for Repairing Security Vulnerabilities in C Code.IEEE Transactions on Software Engineering49, 1 (2023), 147–165. doi:10.1109/TSE.2022.3147265

work page doi:10.1109/tse.2022.3147265 2023
[38]

Jianlei Chi, Yu Qu, Ting Liu, Qinghua Zheng, and Heng Yin. 2023. SeqTrans: Au- tomatic Vulnerability Fix Via Sequence to Sequence Learning.IEEE Transactions on Software Engineering49, 2 (2023), 564–585. doi:10.1109/TSE.2022.3156637

work page doi:10.1109/tse.2022.3156637 2023
[39]

Ge Chu and Alexei Lisitsa. 2018. Penetration Testing for Internet of Things and Its Automation. In2018 IEEE 20th International Conference on High Per- formance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Sys- tems (HPCC/SmartCity/DSS). 1479–1484. doi:10.1109/HPCC/Sma...

work page doi:10.1109/hpcc/smartcity/dss.2018 2018
[40]

Xiaohu Du, Ming Wen, Jiahao Zhu, Zifan Xie, Bin Ji, Huijun Liu, Xuanhua Shi, and Hai Jin. 2024. Generalization-Enhanced Code Vulnerability Detection via Multi- Task Instruction Fine-Tuning. InFindings of the Association for Computational Linguistics: ACL 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics,...

work page doi:10.18653/v1/2024.findings-acl.625 2024
[41]

Michael Fu, Chakkrit Tantithamthavorn, Trung Le, Van Nguyen, and Dinh Phung
[42]

InPro- ceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering(Singapore, Singapore) (ESEC/FSE 2022)

VulRepair: a T5-based automated software vulnerability repair. InPro- ceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering(Singapore, Singapore) (ESEC/FSE 2022). Association for Computing Machinery, New York, NY, USA, 935–947. doi:10.1145/3540250.3549098

work page doi:10.1145/3540250.3549098 2022
[43]

Yuejun Guo, Constantinos Patsakis, Qiang Hu, Qiang Tang, and Fran Casino. 2024. Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection. InComputer Security – ESORICS 2024: 29th European Symposium on Research in Computer Security, Bydgoszcz, Poland, September 16–20, 2024, Proceedings, Part I(Bydgoszcz, Poland). Springer-Ve...

work page doi:10.1007/978-3-031-70879-4_14 2024
[44]

Sihao Hu, Tiansheng Huang, Fatih İlhan, Selim Furkan Tekin, and Ling Liu. 2023. Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives. In2023 5th IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA). 297–306. doi:10.1109/ TPS-ISA58951.2023.00044

work page arXiv 2023
[45]

Zhenguo Hu, Razvan Beuran, and Yasuo Tan. 2020. Automated Penetration Test- ing Using Deep Reinforcement Learning. In2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). 2–10. doi:10.1109/EuroSPW51379. 2020.00010

work page doi:10.1109/eurospw51379 2020
[46]

Junjie Huang and Quanyan Zhu. 2024. PenHeal: A Two-Stage LLM Framework for Automated Pentesting and Optimal Remediation. InProceedings of the Workshop on Autonomous Cybersecurity(Salt Lake City, UT, USA)(AutonomousCyber ’24). Association for Computing Machinery, New York, NY, USA, 11–22. doi:10.1145/ 3689933.3690831

work page arXiv 2024
[47]

Emanuele Iannone, Dario Di Nucci, Antonino Sabetta, and Andrea De Lucia
[48]

In2021 IEEE/ACM 29th International Conference on Program Comprehension (ICPC)

Toward automated exploit generation for known vulnerabilities in open- source libraries. In2021 IEEE/ACM 29th International Conference on Program Comprehension (ICPC). IEEE, 396–400
[49]

Matthew Jin, Syed Shahriar, Michele Tufano, Xin Shi, Shuai Lu, Neel Sundaresan, and Alexey Svyatkovskiy. 2023. InferFix: End-to-End Program Repair with LLMs. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering(San Francisco, CA, USA)(ESEC/FSE 2023). Association for Computing...

work page doi:10.1145/3611643.3613892 2023
[50]

Md Mahir Asef Kabir, Ying Wang, Danfeng Yao, and Na Meng. 2022. How Do Developers Follow Security-Relevant Best Practices When Using NPM Packages?. In2022 IEEE Secure Development Conference (SecDev). IEEE Computer Society, Los Alamitos, CA, USA, 77–83. doi:10.1109/SecDev53368.2022.00027

work page doi:10.1109/secdev53368.2022.00027 2022
[51]

Hong Jin Kang, Truong Giang Nguyen, Bach Le, Corina S Păsăreanu, and David Lo. 2022. Test mimicry to assess the exploitability of library vulnerabilities. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. 276–288

2022
[52]

11 Shravya Kanchi, Xiaoyan Zang, Ying Zhang, Danfeng (Daphne) Yao, and Na Meng

Stefan Krüger, Sarah Nadi, Michael Reif, Karim Ali, Mira Mezini, Eric Bod- den, Florian Göpfert, Felix Günther, Christian Weinert, Daniel Demmler, et al. 11 Shravya Kanchi, Xiaoyan Zang, Ying Zhang, Danfeng (Daphne) Yao, and Na Meng
[53]

In2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)

CogniCrypt: supporting developers in using cryptography. In2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 931–936
[54]

Tan Khang Le, Saba Alimadadi, and Steven Y. Ko. 2024. A Study of Vulnerability Repair in JavaScript Programs with Large Language Models. InCompanion Proceedings of the ACM Web Conference 2024(Singapore, Singapore)(WWW ’24). Association for Computing Machinery, New York, NY, USA, 666–669. doi:10. 1145/3589335.3651463

work page arXiv 2024
[55]

Guochang Li, Chen Zhi, Jialiang Chen, Junxiao Han, and Shuiguang Deng
[56]

InProceedings of the 39th IEEE/ACM Interna- tional Conference on Automated Software Engineering(Sacramento, CA, USA) (ASE ’24)

Exploring Parameter-Efficient Fine-Tuning of Large Language Model on Automated Program Repair. InProceedings of the 39th IEEE/ACM Interna- tional Conference on Automated Software Engineering(Sacramento, CA, USA) (ASE ’24). Association for Computing Machinery, New York, NY, USA, 719–731. doi:10.1145/3691620.3695066

work page doi:10.1145/3691620.3695066
[57]

Bissyandé

Kui Liu, Anil Koyuncu, Dongsun Kim, and Tegawendé F. Bissyandé. 2019. TBar: revisiting template-based automated program repair. InProceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis(Beijing, China)(ISSTA 2019). Association for Computing Machinery, New York, NY, USA, 31–42. doi:10.1145/3293882.3330577

work page doi:10.1145/3293882.3330577 2019
[58]

Zhihong Liu, Qing Liao, Wenchao Gu, and Cuiyun Gao. 2023. Software Vulner- ability Detection with GPT and In-Context Learning. In2023 8th International Conference on Data Science in Cyberspace (DSC). 229–236. doi:10.1109/DSC59305. 2023.00041

work page doi:10.1109/dsc59305 2023
[59]

Yunlong Lyu, Yuxuan Xie, Peng Chen, and Hao Chen. 2024. Prompt Fuzzing for Fuzz Driver Generation. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security(Salt Lake City, UT, USA)(CCS ’24). Association for Computing Machinery, New York, NY, USA, 3793–3807. doi:10. 1145/3658644.3670396

work page arXiv 2024
[60]

Siqi Ma, Ferdian Thung, David Lo, Cong Sun, and Robert H. Deng. 2017. VuRLE: Automatic Vulnerability Detection and Repair by Learning from Examples. In Computer Security – ESORICS 2017, Simon N. Foley, Dieter Gollmann, and Einar Snekkenes (Eds.). Springer International Publishing, Cham, 229–246

2017
[61]

Matias Martinez and Martin Monperrus. 2018. Ultra-Large Repair Search Space with Automatically Mined Templates: The Cardumen Mode of Astor. InSearch-Based Software Engineering, Thelma Elita Colanzi and Phil McMinn (Eds.). Springer International Publishing, Cham, 65–86

2018
[62]

Ravindra Metta, Raveendra Kumar Medicherla, and Samarjit Chakraborty. 2022. BMC+Fuzz: Efficient and Effective Test Generation. In2022 Design, Automation & Test in Europe Conference & Exhibition (DATE). 1419–1424. doi:10.23919/ DATE54114.2022.9774672

work page arXiv 2022
[63]

Marwan Omar and Stavros Shiaeles. 2023. VulDetect: A novel technique for detecting software vulnerabilities using Language Models. In2023 IEEE Interna- tional Conference on Cyber Security and Resilience (CSR). 105–110. doi:10.1109/ CSR57506.2023.10224924

work page arXiv 2023
[64]

Hammond Pearce, Benjamin Tan, Baleegh Ahmad, Ramesh Karri, and Brendan Dolan-Gavitt. 2023. Examining Zero-Shot Vulnerability Repair with Large Language Models . In2023 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, Los Alamitos, CA, USA, 2339–2356. doi:10.1109/SP46215.2023. 10179420

[65]

Serena Elisa Ponta, Henrik Plate, and Antonino Sabetta. 2020. Detection, assessment and mitigation of vulnerabilities in open source dependencies. Empirical Software Engineering 25, 5 (2020), 3175–3215

[66]

Derry Pratama, Naufal Suryanto, Andro Aprila Adiputra, Thi-Thu-Huong Le, Ahmada Yusril Kadiptya, Muhammad Iqbal, and Howon Kim. 2024. CIPHER: Cybersecurity Intelligent Penetration-Testing Helper for Ethical Researcher. Sensors 24, 21 (2024). doi:10.3390/s24216878

[67]

Moumita Das Purba, Arpita Ghosh, Benjamin J. Radford, and Bill Chu. 2023. Software Vulnerability Detection using Large Language Models. In 2023 IEEE 34th International Symposium on Software Reliability Engineering Workshops (ISSREW). 112–119. doi:10.1109/ISSREW60843.2023.00058

[68]

Sazzadur Rahaman, Ya Xiao, Sharmin Afrose, Fahad Shaon, Ke Tian, Miles Frantz, Murat Kantarcioglu, and Danfeng Yao. 2019. Cryptoguard: High precision detection of cryptographic vulnerabilities in massive-sized Java projects. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. 2455–2472

[69]

Maria Rigaki, Ondřej Lukáš, Carlos Catania, and Sebastian Garcia. 2024. Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments. In Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART. INSTICC, SciTePress, 774–781. doi:10.5220/0012391800003636

[70]

Yuqiang Sun, Daoyuan Wu, Yue Xue, Han Liu, Haijun Wang, Zhengzi Xu, Xiaofei Xie, and Yang Liu. 2024. GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (Lisbon, Portugal) (ICSE ’24). Association for Computing Machinery, New ...

[71]

Ari Takanen, Jared Demott, Charles Miller, and Atte Kettunen. 2017. Fuzzing for Software Security Testing and Quality Assurance, Second Edition. Artech House

[72]

Junjie Wang, Yuchao Huang, Chunyang Chen, Zhe Liu, Song Wang, and Qing Wang. 2024. Software Testing With Large Language Models: Survey, Landscape, and Vision. IEEE Trans. Softw. Eng. 50, 4 (April 2024), 911–936. doi:10.1109/TSE.2024.3368208

[73]

Yuxiang Wei, Chunqiu Steven Xia, and Lingming Zhang. 2023. Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (San Francisco, CA, USA) (ESEC/FSE 2023). Association ...

[74]

Yi Wu, Nan Jiang, Hung Viet Pham, Thibaud Lutellier, Jordan Davis, Lin Tan, Petr Babkin, and Sameena Shah. 2023. How Effective Are Neural Networks for Fixing Security Vulnerabilities. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (Seattle, WA, USA) (ISSTA 2023). Association for Computing Machinery, New York, ...

[75]

Hanxiang Xu, Wei Ma, Ting Zhou, Yanjie Zhao, Kai Chen, Qiang Hu, Yang Liu, and Haoyu Wang. 2024. CKGFuzzer: LLM-Based Fuzz Driver Generation Enhanced By Code Knowledge Graph. arXiv:2411.11532 [cs.SE] https://arxiv.org/abs/2411.11532

[76]

Yanjing Yang, Xin Zhou, Runfeng Mao, Jinwei Xu, Lanxin Yang, Yu Zhang, Haifeng Shen, and He Zhang. 2025. DLAP: A Deep Learning Augmented Large Language Model Prompting framework for software vulnerability detection. J. Syst. Softw. 219, C (Jan. 2025), 15 pages. doi:10.1016/j.jss.2024.112234

[77]

Xin Yin, Chao Ni, Shaohua Wang, Zhenhao Li, Limin Zeng, and Xiaohu Yang. 2024. ThinkRepair: Self-Directed Automated Program Repair. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (Vienna, Austria) (ISSTA 2024). Association for Computing Machinery, New York, NY, USA, 1274–1286. doi:10.1145/3650212.3680359

[79]

Cen Zhang, Yaowen Zheng, Mingqiang Bai, Yeting Li, Wei Ma, Xiaofei Xie, Yuekang Li, Limin Sun, and Yang Liu. 2024. How Effective Are They? Exploring Large Language Model Based Fuzz Driver Generation. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (Vienna, Austria) (ISSTA 2024). Association for Computing Machin...

[80]

Jie Zhang, Haoyu Bu, Hui Wen, Yongji Liu, Haiqiang Fei, Rongrong Xi, Lun Li, Yun Yang, Hongsong Zhu, and Dan Meng. 2025. When LLMs meet cybersecurity: a systematic literature review. Cybersecurity 8, 1 (2025), 55. doi:10.1186/s42400-025-00361-w


