Generating Proof-of-Vulnerability Tests to Help Enhance the Security of Complex Software
Pith reviewed 2026-05-07 15:09 UTC · model grok-4.3
The pith
PoVSmith combines call path analysis with LLM prompts and execution feedback to automatically generate proof-of-vulnerability tests for applications using vulnerable libraries.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PoVSmith is a new agent-based approach that integrates static call-path analysis, code context, and iterative execution feedback into multiple prompts to direct a coding agent and large language model through test generation, execution, and quality assessment, yielding executable PoV tests that expose how library vulnerabilities propagate into dependent applications.
What carries the argument
The iterative prompting loop that supplies call-path information from application entry points to vulnerable library APIs together with execution logs to guide LLM test creation and refinement.
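The loop described above can be pictured with a minimal Python sketch. The names `refine_loop`, `generate`, and `execute`, and the prompt layout, are hypothetical stand-ins for the coding agent and the build/test runner. The paper's actual prompts are not reproduced here. Each round folds the previous execution log into the next prompt, and the loop stops as soon as a test builds and runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    test_code: str
    log: str
    executed_ok: bool


def refine_loop(
    call_path: List[str],
    generate: Callable[[str], str],
    execute: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> List[Attempt]:
    """Hypothetical generate-execute-refine loop (not PoVSmith's actual code)."""
    history: List[Attempt] = []
    feedback = ""
    for _ in range(max_rounds):
        # The prompt always carries the call path; after a failed round it
        # also carries the previous execution log as feedback.
        prompt = "Call path: " + " -> ".join(call_path)
        if feedback:
            prompt += "\nPrevious execution log:\n" + feedback
        test_code = generate(prompt)
        ok, log = execute(test_code)
        history.append(Attempt(test_code, log, ok))
        if ok:
            break  # a runnable test moves on to a separate assessment step
        feedback = log
    return history


# Toy stand-ins: the "agent" repairs its test only after seeing feedback.
def fake_generate(prompt: str) -> str:
    return "test_v2" if "Previous execution log" in prompt else "test_v1"


def fake_execute(test_code: str) -> Tuple[bool, str]:
    if test_code == "test_v2":
        return True, "Tests run: 1, Failures: 0"
    return False, "compilation error: cannot find symbol"


history = refine_loop(["app.Api.upload", "lib.Parser.parse"], fake_generate, fake_execute)
for attempt in history:
    print(attempt.test_code, attempt.executed_ok)
```

Stopping at the first runnable test keeps generation and judgment separate. Whether a running test actually demonstrates an attack is left to the assessment step.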
If this is right
- Developers receive concrete evidence of supply-chain risks without writing tests themselves.
- 152 of the 158 application-level entry points reported as reaching vulnerable library APIs (96 percent) are correctly identified, together with their call paths.
- 84 of the 152 generated tests (55 percent) are judged by an LLM-based assessor to demonstrate feasible attacks on the applications.
- Human involvement drops while test quality rises compared with prior LLM-based methods.
- The same prompting structure supports both test creation and automated quality assessment grounded in context and logs.
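The headline percentages follow directly from the counts in the abstract. A quick Python check makes this concrete (the variable names are ours). It also shows that the 96 percent figure is a precision over reported entry points. The abstract gives no count of reachable entry points that were missed, so recall cannot be computed from it.

```python
reported_entry_points = 158   # entry points PoVSmith reported
correct_entry_points = 152    # of those, judged correct with their call paths
generated_tests = 152
feasible_tests = 84           # tests labeled as demonstrating a feasible attack

entry_point_precision = correct_entry_points / reported_entry_points
feasible_rate = feasible_tests / generated_tests
misidentified = reported_entry_points - correct_entry_points

print(f"entry-point precision: {entry_point_precision:.1%}")  # 96.2%
print(f"feasible-test rate: {feasible_rate:.1%}")             # 55.3%
print(f"misidentified entry points: {misidentified}")         # 6
```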
Where Pith is reading between the lines
- The technique could be extended to languages other than Java by adapting the call-path extractor and runtime instrumentation.
- Embedding PoVSmith in continuous-integration pipelines would allow automatic flagging of exploitable dependency vulnerabilities before deployment.
- Higher success rates might follow from richer feedback signals such as coverage metrics or symbolic execution traces.
- The generated tests could serve as regression oracles for future library updates to confirm that fixes remain effective.
Load-bearing premise
LLM-generated tests guided by call paths and execution feedback reliably indicate real-world attack feasibility without systematic false positives or negatives.
What would settle it
Independent manual verification by security experts showing that a substantial fraction of the 84 tests labeled successful do not actually produce exploitable behavior in the target applications, or that many known feasible attacks are missed.
Original abstract
Developers create modern software applications (Apps) on top of third-party libraries (Libs). When library vulnerabilities are reachable through application code, the applications can be vulnerable to software supply chain attacks. Prior work shows that developers often require concrete and executable evidence, i.e., proof-of-vulnerability (PoV) tests, to decide whether a reported dependency vulnerability poses a practical security risk to their application. However, manually crafting such tests is challenging, and existing tool support is insufficient to automate the procedure. To streamline test generation, we created PoVSmith -- a new approach that combines call path analysis, exemplar test, code context, and feedback into multiple prompts to guide a coding agent (i.e., Codex) and a large language model (i.e., GPT) for test generation, execution, and assessment. We evaluated PoVSmith on 33 ⟨App, Lib⟩ Java program pairs, where each App depends on a vulnerable Lib. PoVSmith revealed 158 unique application-level entry points (i.e., public methods) calling vulnerable library APIs; 152 (96%) of them were correctly found, together with the call paths properly recognized. With such method call information, PoVSmith generated 152 tests, 84 (55%) of which demonstrated feasible ways of attacking Apps by exploiting Lib vulnerabilities. PoVSmith substantially outperforms the state-of-the-art LLM-based approach, as it reduces human involvement while dramatically improving test quality. Our work contributes (1) a novel approach of agent-based test generation, (2) an iterative code refinement process driven by execution feedback, and (3) LLM-based quality assessment grounded in both the test context and execution logs.
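The call-path step can be illustrated with a breadth-first search from public App methods to known-vulnerable Lib APIs. This is a minimal sketch over a hand-written call graph. The abstract does not say how PoVSmith builds its graph; a real tool would derive it with a static analyzer for Java. All method names below are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def find_call_paths(
    call_graph: Dict[str, List[str]],
    entry_points: List[str],
    vulnerable_apis: Set[str],
) -> Dict[str, List[str]]:
    """For each entry point, return one shortest call path (found by BFS)
    that ends at a vulnerable library API; unreachable entries are omitted."""
    paths: Dict[str, List[str]] = {}
    for entry in entry_points:
        parent = {entry: None}  # predecessor map, also used as the visited set
        queue = deque([entry])
        target = None
        while queue:
            node = queue.popleft()
            if node in vulnerable_apis:
                target = node
                break
            for callee in call_graph.get(node, []):
                if callee not in parent:
                    parent[callee] = node
                    queue.append(callee)
        if target is not None:
            # Walk predecessors back to the entry point, then reverse.
            path = []
            while target is not None:
                path.append(target)
                target = parent[target]
            paths[entry] = path[::-1]
    return paths


# Hypothetical App/Lib call graph (method names are illustrative only).
call_graph = {
    "app.ImportController.importXml": ["app.XmlUtil.read"],
    "app.XmlUtil.read": ["lib.xml.Parser.parse"],
    "app.HealthController.ping": ["app.Clock.now"],
}
paths = find_call_paths(
    call_graph,
    ["app.ImportController.importXml", "app.HealthController.ping"],
    {"lib.xml.Parser.parse"},
)
print(paths)
```

BFS returns one shortest path per entry point, which keeps the context handed to an LLM small. A production analysis may need to report several paths per entry point, because the shortest one is not always the easiest to drive from a test.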
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents PoVSmith, an LLM-agent approach that combines call-path analysis, exemplar tests, code context, and execution feedback to generate and assess proof-of-vulnerability (PoV) tests for Java applications that depend on vulnerable libraries. On 33 App-Lib pairs, it reports 158 entry points, 152 (96%) of which are correctly identified, and produces 152 tests, of which 84 (55%) are assessed by an LLM as demonstrating feasible attacks on the applications.
Significance. If the LLM-based feasibility judgments prove reliable, the work could meaningfully lower the barrier for developers to obtain concrete evidence of supply-chain risk, complementing existing static analysis and fuzzing tools. The agent-driven iterative refinement loop and grounding of assessment in both context and logs represent a practical advance over prior LLM-only baselines for security test generation.
major comments (2)
- [Evaluation section] Evaluation section (results on 33 pairs and the 55% figure): the claim that 84 tests 'demonstrated feasible ways of attacking Apps by exploiting Lib vulnerabilities' rests entirely on an LLM assessor that receives only the provided call-path context plus execution logs. No human oracle, CVE-specific exploit oracle, differential comparison against manually written PoVs, or other independent validation is described to calibrate the false-positive rate of this assessor. Because every downstream claim (outperformance vs. prior LLM baselines, reduction in human effort, practical effectiveness) is computed from the same LLM-labeled count, this is load-bearing for the central contribution.
- [Abstract and Approach section] Abstract and Approach section: the description of the GPT-based quality assessor provides no details on the exact prompt template, decision criteria for labeling a test 'feasible,' handling of LLM variability (e.g., temperature, multiple runs), or inter-rater agreement with any external ground truth. This omission prevents readers from assessing the reproducibility and soundness of the 55% success rate.
minor comments (2)
- [Abstract] The abstract states '152 (96%) of them were correctly found' but does not clarify whether the remaining 6 entry points (about 4%) were judged incorrect against a manually verified ground truth or by another automated method; adding this detail would strengthen the entry-point accuracy claim.
- [Evaluation section] Table or figure presenting the 33 program pairs should include basic statistics (e.g., lines of code, number of vulnerable APIs per pair) to allow readers to judge the diversity and representativeness of the benchmark.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential of PoVSmith to lower barriers for assessing supply-chain risks. We address each major comment below with clarifications and proposed revisions to improve the manuscript's rigor and reproducibility.
Point-by-point responses
Referee: [Evaluation section] Evaluation section (results on 33 pairs and the 55% figure): the claim that 84 tests 'demonstrated feasible ways of attacking Apps by exploiting Lib vulnerabilities' rests entirely on an LLM assessor that receives only the provided call-path context plus execution logs. No human oracle, CVE-specific exploit oracle, differential comparison against manually written PoVs, or other independent validation is described to calibrate the false-positive rate of this assessor. Because every downstream claim (outperformance vs. prior LLM baselines, reduction in human effort, practical effectiveness) is computed from the same LLM-labeled count, this is load-bearing for the central contribution.
Authors: We acknowledge that the 55% feasibility rate is determined solely by the LLM assessor without an independent human oracle, CVE-specific exploit validation, or direct comparison to manually crafted PoVs. This choice supports scalability and aligns with our aim to reduce human involvement in PoV generation. The assessor receives call-path context, test code, and execution logs to ground judgments in observable behavior. To address the concern, we will revise the Evaluation section to explicitly discuss this as a limitation, add a small-scale manual calibration (reviewing a random subset of 20 tests for agreement with the LLM labels), and qualify the outperformance claims relative to baselines that use comparable automated assessment. This provides partial calibration of reliability without requiring a full re-evaluation of all 152 tests. Revision: partial.
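The proposed 20-test calibration could be summarized with two numbers. The first is the precision of the assessor's "feasible" labels, which addresses the referee's false-positive concern. The second is chance-corrected agreement with the human labels (Cohen's kappa). A minimal Python sketch follows; the function names are ours, and the rebuttal does not specify which statistics would be reported. A Wilson score interval is included to show how much sampling uncertainty a small subset leaves.

```python
import math
from typing import Sequence, Tuple


def assessor_precision(llm: Sequence[bool], human: Sequence[bool]) -> float:
    """Share of LLM 'feasible' labels that a human reviewer confirms."""
    confirmed = [h for flagged, h in zip(llm, human) if flagged]
    return sum(confirmed) / len(confirmed) if confirmed else float("nan")


def cohens_kappa(llm: Sequence[bool], human: Sequence[bool]) -> float:
    """Chance-corrected agreement between two binary raters."""
    n = len(llm)
    if n == 0 or n != len(human):
        raise ValueError("label lists must be non-empty and equal in length")
    observed = sum(a == b for a, b in zip(llm, human)) / n
    p_llm, p_human = sum(llm) / n, sum(human) / n
    expected = p_llm * p_human + (1 - p_llm) * (1 - p_human)
    if expected == 1.0:
        return float("nan")  # both raters constant and identical: kappa undefined
    return (observed - expected) / (1 - expected)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```

With only about 20 reviewed tests, any precision estimate will carry a wide interval. Such a check can flag a badly miscalibrated assessor, but it cannot pin down its error rate. Likewise, `wilson_interval(84, 152)` captures sampling uncertainty around the 55% figure only, not assessor error.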
Referee: [Abstract and Approach section] Abstract and Approach section: the description of the GPT-based quality assessor provides no details on the exact prompt template, decision criteria for labeling a test 'feasible,' handling of LLM variability (e.g., temperature, multiple runs), or inter-rater agreement with any external ground truth. This omission prevents readers from assessing the reproducibility and soundness of the 55% success rate.
Authors: We agree that the current description lacks sufficient detail on the assessor for full reproducibility. In the revised manuscript, we will expand the Approach section to include the complete prompt template and the precise decision criteria (e.g., a positive label if the logs indicate a successful vulnerability trigger, such as characteristic exception patterns or data exfiltration). We will also describe our use of a fixed low temperature (0.0) and a single run per test to minimize variability, and state explicitly that inter-rater agreement with external ground truth was not computed. We will also add this as a noted limitation with suggestions for future work. Revision: yes.
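The stated criteria could also be encoded as a deterministic check that runs beside the LLM judge and flags disagreements. The criteria give a positive label when the logs show a trigger such as an exception pattern or data exfiltration. The sketch below is a hypothetical complement, not the authors' assessor. The trigger patterns are per-vulnerability assumptions that someone would have to supply for each CVE. Examples include a `StackOverflowError` for a recursion-based denial of service, or a planted canary token for exfiltration.

```python
import re
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Verdict:
    feasible: bool
    evidence: Tuple[str, ...]


def label_from_log(log: str, trigger_patterns: Sequence[str], test_executed: bool) -> Verdict:
    """Label a PoV test 'feasible' only if it actually ran and its log matches
    at least one vulnerability-specific trigger pattern (hypothetical rule)."""
    if not test_executed:
        return Verdict(False, ())
    evidence = []
    for pattern in trigger_patterns:
        match = re.search(pattern, log)
        if match:
            evidence.append(match.group(0))
    return Verdict(bool(evidence), tuple(evidence))


# Hypothetical patterns: a DoS symptom and a planted canary token.
patterns = [r"java\.lang\.StackOverflowError", r"CANARY-[0-9a-f]{8}"]
log = 'Running PoVTest\nException in thread "main" java.lang.StackOverflowError\n'
verdict = label_from_log(log, patterns, test_executed=True)
print(verdict)
```

A rule like this is brittle on its own. Its value lies in cross-checking the LLM labels: cases where the two disagree are natural candidates for the manual review discussed above.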
Circularity Check
No significant circularity in derivation or evaluation chain
Full rationale
The paper describes an LLM-guided test generation pipeline (call-path analysis + prompting + execution feedback + LLM assessment) and reports direct counts on an external set of 33 App-Lib pairs. The 152 tests and 84/152 success figure are produced by applying the described procedure to those pairs; success is measured by the LLM assessor using provided context and logs, but this is an explicit component of the method rather than a self-definitional loop or fitted parameter renamed as prediction. No equations, self-citations, or uniqueness theorems are invoked to force the result. The evaluation therefore remains independent of its own outputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: large language models can produce correct, executable security tests when supplied with call paths, code context, and execution feedback.
invented entities (1)
- PoVSmith: no independent evidence