A Multi-Agent Framework for Automated Exploit Generation with Constraint-Guided Comprehension and Reflection
Pith reviewed 2026-05-10 18:58 UTC · model grok-4.3
The pith
Vulnsage uses specialized agents and runtime feedback to turn static vulnerability reports into working exploits.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Vulnsage decomposes automated exploit generation into an orchestrated workflow of specialized agents: the Code Analyzer Agent performs static analysis to identify vulnerabilities and gather context; the Code Generation Agent creates candidate exploits using an LLM; the Validation Agent executes candidates and collects traces; and Reflection Agents use runtime error analysis in iterative loops to improve the exploit or reason that the original alert is a false positive. Experimental results show this process produces 34.64 percent more successful exploits than prior tools such as ExplodeJS and enables discovery of 146 verified zero-day vulnerabilities in real-world code.
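The orchestrated workflow described above can be sketched as a supervisor loop. This is a hypothetical illustration, not Vulnsage's actual code: the agent roles match the paper, but every class, function name, and signature here is invented for exposition.

```python
# Illustrative sketch of the supervisor-orchestrated agent loop described in the
# paper. All names and interfaces are assumptions, not Vulnsage's real API.
from dataclasses import dataclass

@dataclass
class Verdict:
    exploited: bool
    trace: str  # execution trace / runtime error text fed back to reflection

def run_vulnsage(target, analyzer, generator, validator, reflector, max_rounds=5):
    """Supervisor loop: analyze once, then generate/validate/reflect per alert."""
    results = {}
    for alert in analyzer.find_alerts(target):            # Code Analyzer Agent
        context = analyzer.collect_context(target, alert)
        feedback = None
        for _ in range(max_rounds):
            exploit = generator.generate(alert, context, feedback)  # LLM call
            verdict = validator.execute(target, exploit)  # run the candidate
            if verdict.exploited:
                results[alert] = ("confirmed", exploit)
                break
            # Reflection Agents: refine from the runtime trace, or conclude
            # the alert is spurious (modeled here as returning None)
            feedback = reflector.analyze(alert, exploit, verdict.trace)
            if feedback is None:
                results[alert] = ("false_positive", None)
                break
        else:
            results[alert] = ("unresolved", None)
    return results
```

The key design point the paper emphasizes is that `verdict.trace` flows back into the next generation attempt, so each round is conditioned on concrete runtime evidence rather than a fresh blind guess.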
What carries the argument
The iterative self-refinement loop run by the Validation Agent and Reflection Agents, which feeds execution traces and runtime error details back to improve candidate exploits or classify alerts as false positives.
Load-bearing premise
The iterative feedback loop with execution traces and runtime error analysis reliably improves exploit success rates or correctly distinguishes true vulnerabilities from false positives without introducing systematic biases or missing edge cases.
What would settle it
Independent re-testing on the same programs used for comparison shows Vulnsage produces the same number or fewer working exploits than ExplodeJS, or independent verification fails to confirm the claimed zero-day vulnerabilities.
Original abstract
Open-source libraries are widely used in modern software development, introducing significant security vulnerabilities. While static analysis tools can identify potential vulnerabilities at scale, they often generate overwhelming reports with high false positive rates. Automated Exploit Generation (AEG) emerges as a promising solution to confirm vulnerability authenticity by generating an exploit. However, traditional AEG approaches based on fuzzing or symbolic execution face path coverage and constraint-solving problems. Although LLMs show great potential for AEG, how to effectively leverage them to comprehend vulnerabilities and generate corresponding exploits is still an open question. To address these challenges, we propose Vulnsage, a multi-agent framework for AEG. Vulnsage simulates human security researchers' workflows by decomposing the complex AEG process into multiple specialized sub-agents: Code Analyzer Agent, Code Generation Agent, Validation Agent, and a set of Reflection Agents, orchestrated by a central supervisor through iterative cycles. Given a target program, the Code Analyzer Agent performs static analysis to identify potential vulnerabilities and collects relevant information for each one. The Code Generation Agent then utilizes an LLM to generate candidate exploits. The Validation Agent and Reflection Agents form a feedback-driven self-refinement loop that uses execution traces and runtime error analysis to either improve the exploit iteratively or reason about the false positive alert. Experimental evaluation demonstrates that Vulnsage succeeds in generating 34.64% more exploits than state-of-the-art tools such as ExplodeJS. Furthermore, Vulnsage has successfully discovered and verified 146 zero-day vulnerabilities in real-world scenarios, demonstrating its practical effectiveness for assisting security assessment in software supply chains.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Vulnsage, a multi-agent LLM framework for automated exploit generation (AEG) that decomposes the task into specialized agents (Code Analyzer, Code Generation, Validation, and Reflection Agents) orchestrated by a supervisor. The agents perform static analysis, generate candidate exploits, and use iterative feedback from execution traces and runtime errors to refine exploits or flag false positives. The central claims are a 34.64% improvement in successful exploit generation over state-of-the-art tools such as ExplodeJS and the discovery/verification of 146 zero-day vulnerabilities in real-world open-source libraries.
Significance. If the performance and zero-day claims are substantiated with rigorous methodology, the multi-agent reflection loop could meaningfully advance AEG by improving upon the path-coverage and constraint-solving limitations of fuzzing and symbolic execution. The approach has potential practical value for confirming vulnerabilities in software supply chains and reducing false positives from static analysis tools.
Major comments (2)
- [Abstract and Experimental Evaluation] Abstract and Experimental Evaluation section: The headline claims of 34.64% more exploits than ExplodeJS and 146 verified zero-day vulnerabilities are presented without any description of the experimental methodology, target programs/datasets, baseline implementations, success criteria for exploit generation, statistical tests, or controls for LLM non-determinism. These omissions make the central empirical claims impossible to evaluate or reproduce.
- [Framework Description (Validation and Reflection Agents)] Validation Agent and Reflection Agents description: The feedback loop is said to use 'execution traces and runtime error analysis' to improve exploits or reason about false positives, but no explicit decision procedure is given for labeling success (e.g., whether any non-zero exit code, specific error message, or memory corruption indicator counts as confirmation). This leaves open the possibility of inflated success rates or self-confirmation bias.
Minor comments (1)
- [Abstract] The tool name 'ExplodeJS' is referenced without citation or description; a reference or brief explanation should be added for readers unfamiliar with it.
Simulated Author's Rebuttal
We thank the referee for their thorough review and valuable comments, which have helped us identify areas for improvement in our manuscript. Below, we provide point-by-point responses to the major comments and indicate the revisions we intend to make.
Point-by-point responses
Referee: [Abstract and Experimental Evaluation] Abstract and Experimental Evaluation section: The headline claims of 34.64% more exploits than ExplodeJS and 146 verified zero-day vulnerabilities are presented without any description of the experimental methodology, target programs/datasets, baseline implementations, success criteria for exploit generation, statistical tests, or controls for LLM non-determinism. These omissions make the central empirical claims impossible to evaluate or reproduce.
Authors: We acknowledge that the abstract is necessarily concise and omits full methodological details. The Experimental Evaluation section describes the use of real-world open-source libraries as targets, ExplodeJS as a baseline, and success as verified exploit generation. To improve reproducibility and address the referee's concern, we will revise the Experimental Evaluation section to explicitly detail: the full list of target programs and datasets, baseline implementations and configurations, precise success criteria for exploit generation, statistical tests supporting the 34.64% improvement, and controls for LLM non-determinism (e.g., repeated runs with varied seeds and temperature settings). We will also incorporate a concise methodology overview into the abstract where space allows. revision: yes
Referee: [Framework Description (Validation and Reflection Agents)] Validation Agent and Reflection Agents description: The feedback loop is said to use 'execution traces and runtime error analysis' to improve exploits or reason about false positives, but no explicit decision procedure is given for labeling success (e.g., whether any non-zero exit code, specific error message, or memory corruption indicator counts as confirmation). This leaves open the possibility of inflated success rates or self-confirmation bias.
Authors: We agree that an explicit decision procedure strengthens the description. The current framework relies on the LLM-powered Reflection Agents to interpret execution traces and runtime errors for iterative refinement or false-positive identification. In the revision, we will add a dedicated subsection under the Validation and Reflection Agents that formalizes the success-labeling criteria, including concrete indicators such as memory corruption signals, specific error patterns associated with exploitation, and combinations of non-zero exit codes with other runtime evidence. We will also describe safeguards against self-confirmation bias, such as requiring corroboration from multiple execution environments or external validation tools where feasible. revision: yes
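A success-labeling procedure of the kind the authors promise could look like the following sketch. The indicator patterns and the multi-environment corroboration rule are invented placeholders for illustration, not criteria taken from the paper:

```python
# Hypothetical success-labeling decision procedure. The patterns and the
# corroboration threshold are illustrative assumptions, not Vulnsage's rules.
import re

EXPLOIT_PATTERNS = [
    r"segmentation fault",   # memory-corruption signal
    r"AddressSanitizer",     # sanitizer-confirmed corruption
]

def label_run(exit_code: int, stderr: str) -> str:
    """Classify one execution as 'exploited', 'crashed', or 'benign'.

    A non-zero exit code alone is NOT counted as success; it must be
    corroborated by a specific runtime indicator, one possible guard
    against the self-confirmation bias the referee raises.
    """
    indicator = any(re.search(p, stderr, re.IGNORECASE) for p in EXPLOIT_PATTERNS)
    if exit_code != 0 and indicator:
        return "exploited"
    if exit_code != 0:
        return "crashed"  # suspicious but unconfirmed: back to reflection
    return "benign"

def confirmed(runs: list) -> bool:
    """Require agreement across at least two independent environments."""
    return len(runs) >= 2 and all(label_run(c, s) == "exploited" for c, s in runs)
```

Making such a rule explicit would let readers audit exactly which runtime evidence counts toward the 34.64% figure and the zero-day confirmations.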
Circularity Check
No circularity: empirical framework with external benchmarks
Full rationale
The paper describes a multi-agent AEG framework (Code Analyzer, Code Generation, Validation, Reflection Agents) and reports empirical results: 34.64% more exploits than ExplodeJS plus 146 zero-day discoveries. No equations, parameters, derivations, or self-citations appear in the provided text. Success metrics rely on execution traces and runtime errors rather than any self-referential definition or fitted input renamed as prediction. The central claims are falsifiable via external reproduction and do not reduce to the framework's own inputs by construction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] 2025. QLPro: Automated Code Vulnerability Discovery via LLM and Static Code Analysis Integration. CoRR abs/2506.23644 (2025). arXiv:2506.23644. doi: 10.48550/ARXIV.2506.23644. Withdrawn.
- [2–3] Thanassis Avgerinos, Sang Kil Cha, Brent Lim Tze Hao, and David Brumley. 2011. AEG: Automatic Exploit Generation. In Proceedings of the Network and Distributed System Security Symposium, NDSS 2011, San Diego, California, USA, February 6-9, 2011. The Internet Society. https://www.ndss-symposium.org/ndss2011/aeg-automatic-exploit-generation
- [4] Roberto Baldoni, Emilio Coppa, Daniele Cono D'Elia, Camil Demetrescu, and Irene Finocchi. 2018. A Survey of Symbolic Execution Techniques. ACM Comput. Surv. 51, 3 (2018), 50:1–50:39. doi: 10.1145/3182657
- [5] Masudul Hasan Masud Bhuiyan, Adithya Srinivas Parthasarathy, Nikos Vasilakis, Michael Pradel, and Cristian-Alexandru Staicu. 2023. SecBench.js: An Executable Security Benchmark Suite for Server-Side JavaScript. In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023. IEEE, 1059–1070. doi: 10.110...
- [6] Xiaohe Bo, Zeyu Zhang, Quanyu Dai, Xueyang Feng, Lei Wang, Rui Li, Xu Chen, and Ji-Rong Wen. 2024. Reflective Multi-Agent Collaboration based on Large Language Models. In Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10-15, 2024, Amir Gl...
- [7] Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, and Abhik Roychoudhury. 2017. Directed Greybox Fuzzing. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017, Bhavani Thuraisingham, David Evans, Tal Malkin, and Dongyan Xu (Eds.). ACM, 2329–2344. doi: 10.1145/313...
- [8] Marcel Böhme, Van-Thuan Pham, and Abhik Roychoudhury. 2019. Coverage-Based Greybox Fuzzing as Markov Chain. IEEE Trans. Software Eng. 45, 5, 489–506. doi: 10.1109/TSE.2017.2785841
- [9] Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. 2008. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. In 8th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2008, December 8-10, 2008, San Diego, California, USA, Proceedings, Richard Draves and Robbert van Renesse (Eds.). USENIX Assoc...
- [10] Darion Cassel, Nuno Sabino, Min-Chien Hsu, Ruben Martins, and Limin Jia. 2025. NodeMedic-FINE: Automatic Detection and Exploit Synthesis for Node.js Vulnerabilities. In 32nd Annual Network and Distributed System Security Symposium, NDSS 2025, San Diego, California, USA, February 24-28, 2025. The Internet Society. https://www.ndss-symposium.org/ndss-pap...
- [11] Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert, and David Brumley. 2012. Unleashing Mayhem on Binary Code. In IEEE Symposium on Security and Privacy, SP 2012, 21-23 May 2012, San Francisco, California, USA. IEEE Computer Society, 380–394. doi: 10.1109/SP.2012.31
- [12] Ricardo Corin and Felipe Andrés Manzano. 2012. Taint Analysis of Security Code in the KLEE Symbolic Execution Engine. In Information and Communications Security - 14th International Conference, ICICS 2012, Hong Kong, China, October 29-31, 2012. Proceedings (Lecture Notes in Computer Science, Vol. 7618), Tat Wing Chim and Tsz Hon Yuen (Eds.). Springer, 264...
- [13] Patrick Cousot and Radhia Cousot. 1977. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In Conference Record of the Fourth ACM Symposium on Principles of Programming Languages, Los Angeles, California, USA, January 1977, Robert M. Graham, Michael A. Harrison, and Ravi Sethi (Ed...
- [14] Dorothy E. Denning. 1976. A Lattice Model of Secure Information Flow. Commun. ACM 19, 5 (1976), 236–243. doi: 10.1145/360051.360056
- [15] Richard Fang, Rohan Bindu, Akul Gupta, and Daniel Kang. 2024. LLM Agents can Autonomously Exploit One-day Vulnerabilities. CoRR abs/2404.08144 (2024). arXiv:2404.08144. doi: 10.48550/ARXIV.2404.08144
- [16] Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, and Daniel Kang. 2024. Teams of LLM Agents can Exploit Zero-Day Vulnerabilities. CoRR abs/2406.01637 (2024). arXiv:2406.01637. doi: 10.48550/ARXIV.2406.01637
- [17] Kanishk Gandhi, Ayush Chakravarthy, Anikait Singh, Nathan Lile, and Noah D. Goodman. 2025. Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs. CoRR abs/2503.01307 (2025). arXiv:2503.01307. doi: 10.48550/ARXIV.2503.01307
- [18] Dawei Gao, Zitao Li, Weirui Kuang, Xuchen Pan, Daoyuan Chen, Zhijian Ma, Bingchen Qian, Liuyi Yao, Lin Zhu, Chen Cheng, Hongzhu Shi, Yaliang Li, Bolin Ding, and Jingren Zhou. 2024. AgentScope: A Flexible yet Robust Multi-Agent Platform. CoRR abs/2402.14034 (2024). arXiv:2402.14034. doi: 10.48550/ARXIV.2402.14034
- [19] GitHub Security Lab. 2021. CodeQL. GitHub. https://codeql.github.com/docs/
- [20] Katerina Goseva-Popstojanova and Andrei Perhinschi. 2015. On the capability of static code analysis to detect security vulnerabilities. Inf. Softw. Technol. 68 (2015), 18–33. doi: 10.1016/J.INFSOF.2015.08.002
- [21–22] Junqing He, Kunhao Pan, Xiaoqun Dong, Zhuoyang Song, LiuYiBo LiuYiBo, Qianguosun Qianguosun, Yuxin Liang, Hao Wang, Enming Zhang, and Jiaxing Zhang. 2024. Never Lost in the Middle: Mastering Long-Context Question Answering with Position-Agnostic Decompositional Training. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Comput...
- [23] Peyman Hosseini, Ignacio Castro, Iacopo Ghinassi, and Matthew Purver. 2025. Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly. In Proceedings of the 31st International Conference on Computational Linguistics, COLING 2025, Abu Dhabi, UAE, January 19-24, 2025, Owen Rambow, Leo ...
- [24] David Jin, Qian Fu, and Yuekang Li. 2025. Good News for Script Kiddies? Evaluating Large Language Models for Automated Exploit Generation. In 2025 IEEE Security and Privacy, SP 2025 - Workshops, San Francisco, CA, USA, May 15, 2025, Marina Blanton, William Enck, and Cristina Nita-Rotaru (Eds.). IEEE, 278–282. doi: 10.1109/SPW67851.2025.00039
- [25] Mingqing Kang, Yichao Xu, Song Li, Rigel Gjomemo, Jianwei Hou, V. N. Venkatakrishnan, and Yinzhi Cao. 2023. Scaling JavaScript Abstract Interpretation to Detect and Exploit Node.js Taint-style Vulnerability. In 44th IEEE Symposium on Security and Privacy, SP 2023, San Francisco, CA, USA, May 21-25, 2023. IEEE, 1059–1076. doi: 10.1109/SP46215.2023.10179352
- [26] Rody Kersten, Kasper Søe Luckow, and Corina S. Pasareanu. 2017. POSTER: AFL-based Fuzzing for Java with Kelinci. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017, Bhavani Thuraisingham, David Evans, Tal Malkin, and Dongyan Xu (Eds.). ACM, 2511–2513. doi: 10.1...
- [27] James C. King. 1976. Symbolic Execution and Program Testing. Commun. ACM 19, 7 (1976), 385–394. doi: 10.1145/360248.360252
- [28] Maxwell Koo. 2024. Uncovering Vulnerabilities In Open Source Libraries: A Technical Case Study. https://www.mayhem.security/blog/uncovering-vulnerabilities-in-open-source-libraries
- [30] Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. Lost in the Middle: How Language Models Use Long Contexts. Trans. Assoc. Comput. Linguistics 12 (2024), 157–173. doi: 10.1162/TACL_A_00638
- [31–32] Zijun Liu, Zhennan Wan, Peng Li, Ming Yan, Ji Zhang, Fei Huang, and Yang Liu. 2025. Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration. CoRR abs/2505.21471 (2025). arXiv:2505.21471. doi: 10.48550/ARXIV.2505.21471
- [33–34] Valentin J. M. Manès, HyungSeok Han, Choongwoo Han, Sang Kil Cha, Manuel Egele, Edward J. Schwartz, and Maverick Woo. 2021. The Art, Science, and Engineering of Fuzzing: A Survey. IEEE Trans. Software Eng. 47, 11 (2021), 2312–. doi: 10.1109/TSE.2019.2946563
- [35] Filipe Marques, Mafalda Ferreira, André Nascimento, Miguel E. Coimbra, Nuno Santos, Limin Jia, and José Fragoso Santos. 2025. Automated Exploit Generation for Node.js Packages. Proc. ACM Program. Lang. 9, PLDI (2025), 1341–1366. doi: 10.1145/3729304
- [36] Antonio Germán Márquez, Ángel Jesús Varela-Vaca, María Teresa Gómez-López, José A. Galindo, and David Benavides. 2024. Vulnerability impact analysis in software project dependencies based on Satisfiability Modulo Theories (SMT). Comput. Secur. 139 (2024), 103669. doi: 10.1016/J.COSE.2023.103669
- [37] Barton P. Miller, Lars Fredriksen, and Bryan So. 1990. An Empirical Study of the Reliability of UNIX Utilities. Commun. ACM 33, 12 (1990), 32–44. doi: 10.1145/96267.96279
- [38] James Newsome and Dawn Xiaodong Song. 2005. Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software. In Proceedings of the Network and Distributed System Security Symposium, NDSS 2005, San Diego, California, USA. The Internet Society. https://www.ndss-symposium.org/ndss2005/dynamic-taint-analy...
- [39] Vikram Nitin, Baishakhi Ray, and Roshanak Zilouchian Moghaddam. 2025. FaultLine: Automated Proof-of-Vulnerability Generation Using LLM Agents. CoRR abs/2507.15241 (2025). arXiv:2507.15241. doi: 10.48550/ARXIV.2507.15241
- [40] Ana Nunez, Nafis Tanveer Islam, Sumit Kumar Jha, and Peyman Najafirad. 2024. AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing. CoRR abs/2409.10737 (2024). arXiv:2409.10737. doi: 10.48550/ARXIV.2409.10737
- [41] Wanzong Peng, Lin Ye, Xuetao Du, Hongli Zhang, Dongyang Zhan, Yunting Zhang, Yicheng Guo, and Chen Zhang. 2025. PwnGPT: Automatic Exploit Generation Based on Large Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2025, Vienna, Austria, July 27 - August 1, 2025, Wanxia...
- [42] Zihan Qiu, Zeyu Huang, Bo Zheng, Kaiyue Wen, Zekun Wang, Rui Men, Ivan Titov, Dayiheng Liu, Jingren Zhou, and Junyang Lin. 2025. Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL...
- [43] Francisco Ribeiro. 2023. Large Language Models for Automated Program Repair. In Companion Proceedings of the 2023 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity, SPLASH 2023, Cascais, Portugal, October 22-27, 2023, Vasco Thudichum Vasconcelos (Ed.). ACM, 7–9. doi: 10.1145/3618305.3623587
- [44–45] Qingkai Shi, Xiao Xiao, Rongxin Wu, Jinguo Zhou, Gang Fan, and Charles Zhang. 2018. Pinpoint: fast and precise sparse value flow analysis for million lines of code. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (Philadelphia, PA, USA) (PLDI 2018). Association for Computing Machinery, New York, NY, USA, 693–706. doi: 10.1145/3192366.3192418
- [46] Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Krügel, and Giovanni Vigna. 2016. SOK: (State of) The Art of War: Offensive Techniques in Binary Analysis. In IEEE Symposium on Security and Privacy, SP 2016, San Jose, CA, USA, May 22-26, 2016. IEEE Comput...
- [47] Deniz Simsek, Aryaz Eghbali, and Michael Pradel. 2025. PoCGen: Generating Proof-of-Concept Exploits for Vulnerabilities in Npm Packages. CoRR abs/2506.04962 (2025). arXiv:2506.04962. doi: 10.48550/ARXIV.2506.04962
- [48] Nick Stephens, John Grosen, Christopher Salls, Andrew Dutcher, Ruoyu Wang, Jacopo Corbetta, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna. 2016. Driller: Augmenting Fuzzing Through Selective Symbolic Execution. In 23rd Annual Network and Distributed System Security Symposium, NDSS 2016, San Diego, California, USA, February 21-24, 2016. T...
- [51] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, Novem...
- [52] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net. https://openreview.net/forum?id=WE_vluYUL-X
- [53] Michał Zalewski. 2014. American fuzzy lop. http://lcamtuf.coredump.cx/afl/
- [54] Jun Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, and Lingpeng Kong. 2023. CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling. In International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA (Proceedings of Machine Learning Research, Vol. 202), Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara E...
- [55] Yuntong Zhang, Jiawei Wang, Dominic Berzin, Martin Mirchev, Dongge Liu, Abhishek Arya, Oliver Chang, and Abhik Roychoudhury. 2024. Fixing Security Vulnerabilities with AI in OSS-Fuzz. CoRR abs/2411.03346 (2024). arXiv:2411.03346. doi: 10.48550/ARXIV.2411.03346
- [56] Zexin Zhong, Jiangchao Liu, Diyu Wu, Peng Di, Yulei Sui, Alex X. Liu, and John C. S. Lui. 2023. Scalable Compositional Static Taint Analysis for Sensitive Data Tracing on Industrial Micro-Services. In 45th IEEE/ACM International Conference on Software Engineering: Software Engineering in Practice...
- [57] Zhuotong Zhou, Yongzhuo Yang, Susheng Wu, Yiheng Huang, Bihuan Chen, and Xin Peng. 2024. Magneto: A Step-Wise Approach to Exploit Vulnerabilities in Dependent Libraries via LLM-Empowered Directed Fuzzing. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, ASE 2024, Sacramento, CA, USA, October 27 - November 1, 2...
Discussion (0)