Three Heads Are Better Than One: A Multi-perspective Reasoning Framework for Enhanced Vulnerability Detection

Bo Lin; Jie Yu; Jing Wang; Jun Ma; Shangwen Wang; Xiaoguang Mao; Xiaoling Li; Xin Peng

arxiv: 2605.18153 · v1 · pith:VFDHV4XGnew · submitted 2026-05-18 · 💻 cs.SE

Three Heads Are Better Than One: A Multi-perspective Reasoning Framework for Enhanced Vulnerability Detection

Xin Peng , Bo Lin , Jing Wang , Xiaoling Li , Jun Ma , Jie Yu , Xiaoguang Mao , Shangwen Wang This is my paper

Pith reviewed 2026-05-20 09:13 UTC · model grok-4.3

classification 💻 cs.SE

keywords vulnerability detectionlarge language modelsmulti-agent systemssoftware securitycode analysisreasoning frameworksconflict resolutiondebate mechanisms

0 comments

The pith

Three LLM agents using distinct reasoning modes detect code vulnerabilities more accurately by debating disagreements until resolution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework in which three large language model agents, each following a separate reasoning style, first review source code independently for potential security issues. These agents then engage in a structured process of challenging one another's conclusions and revising their views through multiple rounds of rebuttal. The goal is to produce a final judgment that accounts for perspectives a single agent would overlook. A sympathetic reader would care because many real-world vulnerabilities arise from complex interactions that isolated analyses tend to miss, and successful collaboration could reduce undetected flaws in deployed software.

Core claim

The central claim is that a multi-perspective reasoning framework built around three specialized LLM agents, combined with an iterative rebuttal-revision debate mechanism, produces superior vulnerability detection by surfacing and correcting individual errors that arise when agents disagree, leading to more reliable identification of security flaws in source code.

What carries the argument

Three LLM agents each embodying a distinct reasoning mode together with the iterative rebuttal-revision debate mechanism that drives conflict resolution and collaborative judgment.

If this is right

Vulnerability detection becomes more robust when multiple complementary reasoning styles are forced to reconcile differences rather than relying on any one style alone.
The debate mechanism provides a built-in error-correction step that improves results on cases where individual agents reach conflicting conclusions.
The same structure can be applied to other code-analysis tasks that benefit from reconciling divergent initial judgments.
Automated security tools gain reliability without requiring changes to the underlying language models themselves.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Development teams could embed similar multi-agent debate steps inside continuous integration pipelines to flag security risks before code is merged.
The approach might generalize to related problems such as detecting logical bugs or performance issues where single analyses often diverge.
Further gains could come from testing whether adding a fourth reasoning mode or occasional human oversight strengthens the error-correction effect.

Load-bearing premise

The three reasoning modes are sufficiently different that their debate process corrects errors instead of simply reinforcing shared mistakes or model biases.

What would settle it

Run the framework on a collection of code snippets containing documented vulnerabilities that single-reasoning methods fail to flag; if the multi-agent system also misses them or resolves its internal debates in favor of incorrect outcomes, the central claim would be refuted.

Figures

Figures reproduced from arXiv: 2605.18153 by Bo Lin, Jie Yu, Jing Wang, Jun Ma, Shangwen Wang, Xiaoguang Mao, Xiaoling Li, Xin Peng.

**Figure 1.** Figure 1: The overlap of vulnerabilities correctly identified by SAVul (Deductive), VulRAG (Inductive), and VulnSage (Abductive) on the PairVul dataset. While recent LLM-based methods have shown considerable promise in vulnerability detection, their effectiveness is often constrained by an implicit reliance on a single reasoning paradigm. To investigate this limitation, we conducted an empirical study on the Pair… view at source ↗

**Figure 2.** Figure 2: Three motivating cases from the Linux kernel, selected from the PairVul dataset, demonstrating the [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: The workflow of ReasonVul. This two-stage architecture is a direct response to our findings in Section 3. Relying on a single reasoning paradigm results in critical blind spots, while simplistic aggregation methods like majority voting are insufficient for resolving nuanced disagreements. By contrast, ReasonVul’s debate mechanism facilitates a deeper integration of perspectives, leading to more accurate an… view at source ↗

**Figure 4.** Figure 4: The impact of debate rounds Introducing debate substantially improves performance: with just 1 round, PairAcc increases to 26.46%, Accuracy to 59.38%, and F1- score to 61.54%. The best results are achieved at 2 rounds, where PairAcc reaches 30.42%, Accuracy 62.50%, and F1-score 63.41%, demonstrating that structured debate effectively promotes cognitive synergy through iterative refinement. Beyond 2 roun… view at source ↗

**Figure 5.** Figure 5: Case study of the ReasonVul for CVE-2021-41864. structured debate, enables ReasonVul to detect subtle vulnerabilities that might be overlooked by any single reasoning method or by simple aggregation approaches such as majority voting. The condensed transcript of the agent interactions is depicted in [PITH_FULL_IMAGE:figures/full_fig_p018_5.png] view at source ↗

**Figure 6.** Figure 6: The overlaps of the vulnerabilities detected by [PITH_FULL_IMAGE:figures/full_fig_p019_6.png] view at source ↗

read the original abstract

Automated vulnerability detection is crucial for enhancing software security by identifying potential flaws that attackers could exploit, thereby reducing the reliance on labor-intensive manual code audits. Recent advancements have shifted towards leveraging large language models (LLMs) for vulnerability detection, with techniques like Vul-RAG and VulnSage demonstrating progress through structured prompting and external knowledge integration. However, these approaches typically rely on a single reasoning paradigm, limiting their ability to address the complex and diverse nature of real-world vulnerabilities. To overcome these limitations, we propose ReasonVul, a novel multi-perspective reasoning framework that harnesses cognitive synergy among three specialized LLM agents, each embodying a distinct reasoning mode. The framework begins with independent analyses of the source code, followed by a structured debate mechanism to resolve conflicts through iterative rebuttal and revision, ultimately converging on a collaborative judgment. Evaluated on the PrimeVul dataset, ReasonVul achieves a PairAcc of 40.00% and an F1-score of 72.52%, surpassing the best baseline by 81.24% in PairAcc. Further tests on the JITVUL dataset confirm its generalizability, with a PairAcc of 28.67%. Additionally, we analyzed 542 conflict cases and found that 389 were correctly resolved, highlighting the framework's ability to uncover hidden vulnerabilities through the error-correction mechanism driven by the debate. This work emphasizes the importance of multi-perspective reasoning and collaborative validation in achieving robust and comprehensive vulnerability detection in real-world software systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Three-agent LLM debate improves vulnerability detection metrics over baselines, but needs ablations to confirm the mechanism works as claimed.

read the letter

The main thing to know is that this three-agent debate framework for vulnerability detection reports solid gains over prior single-paradigm LLM methods on the datasets they tested. The new part is the combination of three specialized agents, each with a distinct reasoning mode, followed by structured debate with rebuttal and revision to handle conflicts. They show this on PrimeVul with a big jump in PairAcc to 40% and F1 of 72.52%, plus confirmation on JITVUL. Breaking down the 542 conflicts and resolving 389 correctly is a nice touch that gives some insight into the process. It does well at presenting a practical framework that moves beyond single prompting, and the numbers suggest the collaborative approach adds value in catching more issues. The soft spots are around the experimental controls. The abstract gives no ablations that isolate the debate step from just using more tokens or different prompting overall. Without prompt details or comparisons to majority voting, it's tough to confirm that the improvement comes from complementary perspectives rather than shared model biases being averaged out or extra inference steps. The central claim about error correction through debate rests on those unshown details. This paper is for folks in software security and LLM applications to code analysis. A reader working on automated vulnerability tools would get practical ideas from the setup and results. It has enough empirical grounding to deserve a serious referee, who could push for the missing ablations and statistical checks. I would recommend putting it through peer review rather than desk rejecting it.

Referee Report

2 major / 1 minor

Summary. The paper proposes ReasonVul, a multi-perspective reasoning framework for enhanced vulnerability detection using three specialized LLM agents that perform independent analyses and engage in an iterative debate with rebuttal and revision to resolve conflicts. It reports achieving a PairAcc of 40.00% and F1-score of 72.52% on the PrimeVul dataset, surpassing the best baseline by 81.24%, with generalizability shown on JITVUL at 28.67% PairAcc and correct resolution of 389 out of 542 conflict cases.

Significance. If the experimental claims hold after verification, this work would demonstrate the value of structured multi-agent debate for improving LLM reliability on vulnerability detection tasks, potentially advancing automated security analysis beyond single-paradigm prompting approaches.

major comments (2)

[Abstract] Abstract: the central performance claims (40.00% PairAcc and 81.24% relative gain on PrimeVul) are load-bearing, yet the manuscript provides no description of baseline implementations, exact data splits, or statistical significance tests, preventing assessment of whether the reported lift is robust.
[Evaluation] Evaluation section: the claim that the debate mechanism correctly resolves 389/542 conflicts and drives error correction rests on the assumption of complementary reasoning modes, but no ablation (e.g., replacing iterative rebuttal-revision with majority vote or single extended chain) or control for total LLM calls/tokens is reported, leaving open alternative explanations for the observed gains.

minor comments (1)

[Abstract] Abstract: define PairAcc explicitly and clarify how 'correct resolution' of conflicts is determined against ground truth.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and commit to revisions that strengthen the experimental reporting and analysis.

read point-by-point responses

Referee: [Abstract] Abstract: the central performance claims (40.00% PairAcc and 81.24% relative gain on PrimeVul) are load-bearing, yet the manuscript provides no description of baseline implementations, exact data splits, or statistical significance tests, preventing assessment of whether the reported lift is robust.

Authors: We agree that the abstract and main text would benefit from greater transparency on these points. In the revised manuscript we will add concise descriptions of the baseline implementations, the precise train/test splits on PrimeVul and JITVUL, and the results of statistical significance tests (e.g., McNemar’s test) for the reported PairAcc and F1 improvements. revision: yes
Referee: [Evaluation] Evaluation section: the claim that the debate mechanism correctly resolves 389/542 conflicts and drives error correction rests on the assumption of complementary reasoning modes, but no ablation (e.g., replacing iterative rebuttal-revision with majority vote or single extended chain) or control for total LLM calls/tokens is reported, leaving open alternative explanations for the observed gains.

Authors: This concern is well-founded. We will include new ablation studies that replace the iterative rebuttal-revision process with (i) a majority-vote aggregator and (ii) a single extended chain-of-thought prompt, while explicitly controlling for the total number of LLM calls and tokens. These experiments will be reported alongside the existing 389/542 conflict-resolution analysis to isolate the contribution of the structured debate. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical framework evaluation on held-out data

full rationale

The paper introduces ReasonVul as a multi-agent LLM framework with independent analyses followed by rebuttal-revision debate, then reports measured performance (PairAcc 40.00%, F1 72.52% on PrimeVul; 28.67% on JITVUL) and counts of resolved conflicts (389/542) directly from experiments on external datasets. No equations, fitted parameters, or first-principles derivations exist that could reduce to self-defined inputs. Claims rest on observed outcomes rather than quantities constructed by definition or self-citation chains. The work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the empirical performance of LLM agents whose internal reasoning capabilities are treated as black-box oracles; no new mathematical axioms or invented physical entities are introduced.

axioms (2)

domain assumption LLM agents can embody distinct and complementary reasoning modes when given specialized prompts
Invoked in the description of the three specialized agents and the expectation that their independent analyses will produce useful diversity.
domain assumption Structured debate via iterative rebuttal and revision will converge on more accurate judgments than independent analysis
Central to the framework's error-correction mechanism described in the abstract.

pith-pipeline@v0.9.0 · 5819 in / 1428 out tokens · 36403 ms · 2026-05-20T09:13:12.302000+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

three specialized LLM agents, each embodying a distinct reasoning mode... deductive, inductive, and abductive reasoning... structured debate mechanism to resolve conflicts through iterative rebuttal and revision
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

PairAcc of 40.00% and an F1-score of 72.52%... 389 out of 542 conflict cases

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

78 extracted references · 78 canonical work pages · 2 internal anchors

[1]

Andrei Arusoaie, Stefan Ciobâca, Vlad Craciun, Dragos Gavrilut, and Dorel Lucanu. 2017. A Comparison of Open- Source Static Analysis Tools for Vulnerability Detection in C/C++ Code. In2017 19th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC). 161–168. doi:10.1109/SYNASC.2017.00035

work page doi:10.1109/synasc.2017.00035 2017
[2]

Canan Batur Şahin and Laith Abualigah. 2021. A novel deep learning-based feature selection model for improving the static analysis of vulnerability detection.Neural Computing and Applications33, 20 (2021), 14049–14067

work page 2021
[3]

Sicong Cao, Xiaobing Sun, Lili Bo, Ying Wei, and Bin Li. 2021. Bgnn4vd: Constructing bidirectional graph neural- network for vulnerability detection.Information and Software Technology136 (2021), 106576. , Vol. 1, No. 1, Article . Publication date: May 2026. ReasonVul: A Multi-perspective Reasoning Framework for Enhanced Vulnerability Detection 21

work page 2021
[4]

Sicong Cao, Xiaobing Sun, Lili Bo, Rongxin Wu, Bin Li, and Chuanqi Tao. 2022. MVD: memory-related vulnerability detection based on flow-sensitive graph neural networks. InProceedings of the 44th International Conference on Software Engineering(Pittsburgh, Pennsylvania)(ICSE ’22). Association for Computing Machinery, New York, NY, USA, 1456–1468. doi:10.11...

work page doi:10.1145/3510003.3510219 2022
[5]

Sicong Cao, Xiaobing Sun, Xiaoxue Wu, David Lo, Lili Bo, Bin Li, and Wei Liu. 2024. Coca: Improving and explaining graph neural network-based vulnerability detection systems. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering. 1–13

work page 2024
[6]

Ira Ceka, Feitong Qiao, Anik Dey, Aastha Valecha, Gail Kaiser, and Baishakhi Ray. 2024. Can llm prompting serve as a proxy for static analysis in vulnerability detection.arXiv preprint arXiv:2412.12039(2024)

work page arXiv 2024
[7]

Xiao Cheng, Haoyu Wang, Jiayi Hua, Guoai Xu, and Yulei Sui. 2021. DeepWukong: Statically Detecting Software Vulnerabilities Using Deep Graph Neural Network.ACM Trans. Softw. Eng. Methodol.30, 3, Article 38 (April 2021), 33 pages. doi:10.1145/3436877

work page doi:10.1145/3436877 2021
[8]

Seyed Shayan Daneshvar, Yu Nong, Xu Yang, Shaowei Wang, and Haipeng Cai. 2025. VulScribeR: Exploring RAG-based Vulnerability Augmentation with LLMs.ACM Transactions on Software Engineering and Methodology(2025)

work page 2025
[9]

Yangruibo Ding, Yanjun Fu, Omniyyah Ibrahim, Chawin Sitawarin, Xinyun Chen, Basel Alomair, David Wagner, Baishakhi Ray, and Yizheng Chen. 2024. Vulnerability Detection with Code Language Models: How Far Are We?. In 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE Computer Society, 469–481

work page 2024
[10]

Xueying Du, Geng Zheng, Kaixin Wang, Yi Zou, Yujia Wang, Wentai Deng, Jiayi Feng, Mingwei Liu, Bihuan Chen, Xin Peng, et al. 2024. Vul-rag: Enhancing llm-based vulnerability detection via knowledge-level rag.arXiv preprint arXiv:2406.11147(2024)

work page arXiv 2024
[11]

Andrew Estornell and Yang Liu. 2024. Multi-LLM Debate: Framework, Principals, and Interventions. 37 (2024), 28938–28964. doi:10.52202/079017-0911

work page doi:10.52202/079017-0911 2024
[12]

Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, et al . 2020. Codebert: A pre-trained model for programming and natural languages.arXiv preprint arXiv:2002.08155(2020)

work page internal anchor Pith review Pith/arXiv arXiv 2020
[13]

Michael Fu and Chakkrit Tantithamthavorn. 2022. Linevul: A transformer-based line-level vulnerability prediction. In Proceedings of the 19th International Conference on Mining Software Repositories. 608–620

work page 2022
[14]

Michael Fu, Chakkrit Kla Tantithamthavorn, Van Nguyen, and Trung Le. 2023. Chatgpt for vulnerability detection, classification, and repair: How far are we?. In2023 30th Asia-Pacific Software Engineering Conference (APSEC). IEEE, 632–636

work page 2023
[15]

Zeyu Gao, Hao Wang, Yuchen Zhou, Wenyu Zhu, and Chao Zhang. 2023. How far have we gone in vulnerability detection using large language models.arXiv preprint arXiv:2311.12420(2023)

work page arXiv 2023
[16]

GNU. 2025. Gnu cflow: analyzing a collection of c source files, charting control flow within the program. (2025)

work page 2025
[17]

Nima Shiri Harzevili, Alvine Boaye Belle, Junjie Wang, Song Wang, Zhen Ming, Nachiappan Nagappan, et al. 2023. A survey on automated software vulnerability detection using machine learning and deep learning.arXiv preprint arXiv:2306.11673(2023)

work page arXiv 2023
[18]

Junda He, Christoph Treude, and David Lo. 2025. LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision, and the Road Ahead.ACM Trans. Softw. Eng. Methodol.34, 5, Article 124 (May 2025), 30 pages. doi:10.1145/3712003

work page doi:10.1145/3712003 2025
[19]

Kaiyu He, Mian Zhang, Shuo Yan, Peilin Wu, and Zhiyu Zoey Chen. 2024. Idea: Enhancing the rule learning ability of large language model agent through induction, deduction, and abduction.arXiv preprint arXiv:2408.10455(2024)

work page arXiv 2024
[20]

Sihao Hu, Tiansheng Huang, Fatih İlhan, Selim Furkan Tekin, and Ling Liu. 2023. Large language model-powered smart contract vulnerability detection: New perspectives. In2023 5th IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA). IEEE, 297–306

work page 2023
[21]

Larry Huynh, Yinghao Zhang, Djimon Jayasundera, Woojin Jeon, Hyoungshick Kim, Tingting Bi, and Jin B. Hong

work page
[22]

Faster Hash-based Multi-valued Validated Asynchronous Byzantine Agreement

Detecting Code Vulnerabilities using LLMs. In2025 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). 401–414. doi:10.1109/DSN64029.2025.00047

work page doi:10.1109/dsn64029.2025.00047 2025
[23]

Ridhi Jain, Nicole Gervasoni, Mthandazo Ndhlovu, and Sanjay Rawat. 2023. A code centric evaluation of c/c++ vulnerability datasets for deep learning based vulnerability detection techniques. InProceedings of the 16th Innovations in Software Engineering Conference. 1–10

work page 2023
[24]

Yuan Jiang, Yujian Zhang, Xiaohong Su, Christoph Treude, and Tiantian Wang. 2024. Stagedvulbert: Multi-granular vulnerability detection with a novel pre-trained code model.IEEE Transactions on Software Engineering(2024)

work page 2024
[25]

Jiaqi Li, Mengmeng Wang, Zilong Zheng, and Muhan Zhang. 2023. Loogle: Can long-context language models understand long contexts?arXiv preprint arXiv:2311.04939(2023)

work page arXiv 2023
[26]

Yue Li, Xiao Li, Hao Wu, Minghui Xu, Yue Zhang, Xiuzhen Cheng, Fengyuan Xu, and Sheng Zhong. 2025. Everything You Wanted to Know About LLM-based Vulnerability Detection But Were Afraid to Ask.arXiv preprint arXiv:2504.13474 (2025). , Vol. 1, No. 1, Article . Publication date: May 2026. 22 X. Peng, B. Lin, j. Wang, X. Li, J. Ma, J. Yu, X. Mao, S. Wang

work page arXiv 2025
[27]

Ziyang Li, Saikat Dutta, and Mayur Naik. 2024. IRIS: LLM-assisted static analysis for detecting security vulnerabilities. arXiv preprint arXiv:2405.17238(2024)

work page arXiv 2024
[28]

Zhen Li, Deqing Zou, Shouhuai Xu, Xinyu Ou, Hai Jin, Sujuan Wang, Zhijun Deng, and Yuyi Zhong. 2018. Vuldeepecker: A deep learning-based system for vulnerability detection.arXiv preprint arXiv:1801.01681(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[29]

RongTao Liao, XueHu Yan, and KaiLong Zhu. 2025. KLRAG: Deep Learning Library Vulnerability Detection via Knowledge-Level RAG. InInternational Conference on Intelligent Computing. Springer, 443–456

work page 2025
[30]

Bo Lin, Shangwen Wang, Yihao Qin, Liqian Chen, and Xiaoguang Mao. 2025. Give LLMs a Security Course: Securing Retrieval-Augmented Code Generation via Knowledge Injection. Association for Computing Machinery, New York, NY, USA. doi:10.1145/3719027.3765049

work page doi:10.1145/3719027.3765049 2025
[31]

Guanjun Lin, Sheng Wen, Qing-Long Han, Jun Zhang, and Yang Xiang. 2020. Software Vulnerability Detection Using Deep Neural Networks: A Survey.Proc. IEEE108, 10 (2020), 1825–1848. doi:10.1109/JPROC.2020.2993293

work page doi:10.1109/jproc.2020.2993293 2020
[32]

Stephan Lipp, Sebastian Banescu, and Alexander Pretschner. 2022. An empirical study on the effectiveness of static C code analyzers for vulnerability detection. InProceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis(Virtual, South Korea)(ISSTA 2022). Association for Computing Machinery, New York, NY, USA, 544–555. d...

work page doi:10.1145/3533767.3534380 2022
[33]

Yuhan Liu, Yuxuan Liu, Xiaoqing Zhang, Xiuying Chen, and Rui Yan. 2025. The Truth Becomes Clearer Through Debate! Multi-Agent Systems with Large Language Models Unmask Fake News. InProceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval(Padua, Italy)(SIGIR ’25). Association for Computing Machinery, N...

work page doi:10.1145/3726302.3730092 2025
[34]

Yisroel Mirsky, George Macon, Michael Brown, Carter Yagemann, Matthew Pruett, Evan Downing, Sukarno Mertoguno, and Wenke Lee. 2023. VulChecker: Graph-based Vulnerability Localization in Source Code. In32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, 6557–6574

work page 2023
[35]

Van Nguyen, Trung Le, Chakkrit Tantithamthavorn, John Grundy, and Dinh Phung. 2024. Deep domain adaptation with max-margin principle for cross-project imbalanced software vulnerability detection.ACM Transactions on Software Engineering and Methodology33, 6 (2024), 1–34

work page 2024
[36]

Yu Nong, Mohammed Aldeen, Long Cheng, Hongxin Hu, Feng Chen, and Haipeng Cai. 2024. Chain-of-thought prompting of large language models for discovering and fixing software vulnerabilities.arXiv preprint arXiv:2402.17230 (2024)

work page arXiv 2024
[37]

Chitu Okoli. 2023. Inductive, abductive and deductive theorising.International Journal of Management Concepts and Philosophy16, 3 (2023), 302–316. doi:10.1504/IJMCP.2023.131769

work page doi:10.1504/ijmcp.2023.131769 2023
[38]

Xin Peng, Jieren Cheng, Xiangyan Tang, Jingxin Liu, and Jiahua Wu. 2023. Dual contrastive learning network for graph clustering.IEEE Transactions on Neural Networks and Learning Systems35, 8 (2023), 10846–10856

work page 2023
[39]

Xin Peng, Jieren Cheng, Xiangyan Tang, Bin Zhang, and Wenxuan Tu. 2024. Multi-view graph imputation network. Information Fusion102 (2024), 102024

work page 2024
[40]

Xin Peng, Shangwen Wang, Yihao Qin, Bo Lin, Liqian Chen, Jieren Cheng, and Xiaoguang Mao. 2025. Keep It Simple: Self-Adaptive Code Graph Simplification for Accurate Vulnerability Detection.IEEE Transactions on Software Engineering51, 10 (2025), 2744–2763. doi:10.1109/TSE.2025.3593515

work page doi:10.1109/tse.2025.3593515 2025
[41]

Md Mahbubur Rahman, Ira Ceka, Chengzhi Mao, Saikat Chakraborty, Baishakhi Ray, and Wei Le. 2024. Towards Causal Deep Learning for Vulnerability Detection. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering. Association for Computing Machinery. doi:10.1145/3597503.3639170

work page doi:10.1145/3597503.3639170 2024
[42]

Shahriyar Zaman Ridoy, Md Shazzad Hossain Shaon, Alfredo Cuzzocrea, and Mst Shapna Akter. 2024. EnStack: An Ensemble Stacking Framework of Large Language Models for Enhanced Vulnerability Detection in Source Code. In 2024 IEEE International Conference on Big Data (BigData). IEEE, 6356–6364

work page 2024
[43]

Niklas Risse, Jing Liu, and Marcel Böhme. 2025. Top score on the wrong exam: On benchmarking in machine learning for vulnerability detection.Proceedings of the ACM on Software Engineering2, ISSTA (2025), 388–410

work page 2025
[44]

Robert C Seacord. 2008. The CERT C secure coding standard.Pearson Education(2008)

work page 2008
[45]

Miaomiao Shao and Yuxin Ding. 2024. FVD-DPM: Fine-grained Vulnerability Detection via Conditional Diffusion Probabilistic Models. In33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, 7375–7392

work page 2024
[46]

Yu Sheng, Wanting Wen, Linjing Li, and Daniel Zeng. 2025. Evaluating Generalization Capability of Language Models across Abductive, Deductive and Inductive Logical Reasoning. InProceedings of the 31st International Conference on Computational Linguistics. 4945–4957

work page 2025
[47]

Ze Sheng, Zhicheng Chen, Shuning Gu, Heqing Huang, Guofei Gu, and Jeff Huang. 2025. LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights.arXiv preprint arXiv:2502.07049(2025)

work page arXiv 2025
[48]

Samiha Shimmi, Ashiqur Rahman, Mohan Gadde, Hamed Okhravi, and Mona Rahimi. 2024. VulSim: Leveraging Similarity of Multi-Dimensional Neighbor Embeddings for Vulnerability Detection. In33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, 1777–1794. , Vol. 1, No. 1, Article . Publication date: May 2026. ReasonVul: A Multi-perspective Rea...

work page 2024
[49]

Benjamin Steenhoek, Hongyang Gao, and Wei Le. 2024. Dataflow Analysis-Inspired Deep Learning for Efficient Vul- nerability Detection. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering. Association for Computing Machinery. doi:10.1145/3597503.3623345

work page doi:10.1145/3597503.3623345 2024
[50]

Benjamin Steenhoek, Md Mahbubur Rahman, Richard Jiles, and Wei Le. 2023. An Empirical Study of Deep Learning Models for Vulnerability Detection. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). 2237–2248. doi:10.1109/ICSE48619.2023.00188

work page doi:10.1109/icse48619.2023.00188 2023
[51]

Benjamin Steenhoek, Md Mahbubur Rahman, Monoshi Kumar Roy, Mirza Sanjida Alam, Hengbo Tong, Swarna Das, Earl T Barr, and Wei Le. 2024. To err is machine: Vulnerability detection challenges llm reasoning.arXiv preprint arXiv:2403.17218(2024)

work page arXiv 2024
[52]

Saba Sturua, Isabelle Mohr, Mohammad Kalim Akram, Michael Günther, Bo Wang, Markus Krimmel, Feng Wang, Georgios Mastrapas, Andreas Koukounas, Nan Wang, et al. 2024. jina-embeddings-v3: Multilingual embeddings with task lora.arXiv preprint arXiv:2409.10173(2024)

work page arXiv 2024
[53]

Shangwen Wang, Bo Lin, Liqian Chen, and Xiaoguang Mao. 2025. Divide-and-Conquer: Automating Code Revisions via Localization-and-Revision. 34, 3 (2025). doi:10.1145/3697013

work page doi:10.1145/3697013 2025
[54]

Xinchen Wang, Ruida Hu, Cuiyun Gao, Xin-Cheng Wen, Yujia Chen, and Qing Liao. 2024. Reposvul: A repository-level high-quality vulnerability dataset. InProceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings. 472–483

work page 2024
[55]

Zhiyuan Wei, Jing Sun, Yuqiang Sun, Ye Liu, Daoyuan Wu, Zijian Zhang, Xianhao Zhang, Meng Li, Yang Liu, Chunmiao Li, et al . 2025. Advanced Smart Contract Vulnerability Detection via LLM-Powered Multi-Agent Systems.IEEE Transactions on Software Engineering(2025)

work page 2025
[56]

Xin-Cheng Wen, Yupan Chen, Cuiyun Gao, Hongyu Zhang, Jie M Zhang, and Qing Liao. 2023. Vulnerability detection with graph simplification and enhanced graph representation learning. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2275–2286

work page 2023
[57]

Xin-Cheng Wen, Xinchen Wang, Yujia Chen, Ruida Hu, David Lo, and Cuiyun Gao. 2024. Vuleval: Towards repository- level evaluation of software vulnerability detection.arXiv preprint arXiv:2404.15596(2024)

work page arXiv 2024
[58]

Xin-Cheng Wen, Jiaxin Ye, Cuiyun Gao, Lianwei Wu, and Qing Liao. 2024. EvalSVA: Multi-Agent Evaluators for Next-Gen Software Vulnerability Assessment.arXiv preprint arXiv:2501.14737(2024)

work page arXiv 2024
[59]

Ratnadira Widyasari, Martin Weyssow, Ivana Clairine Irsan, Han Wei Ang, Frank Liauw, Eng Lieh Ouh, Lwin Khin Shar, Hong Jin Kang, and David Lo. 2025. Let the Trial Begin: A Mock-Court Approach to Vulnerability Detection using LLM-Based Agents.arXiv preprint arXiv:2505.10961(2025)

work page arXiv 2025
[60]

Yueming Wu, Deqing Zou, Shihan Dou, Wei Yang, Duo Xu, and Hai Jin. 2022. Vulcnn: An image-inspired scalable vulnerability detection system. InProceedings of the 44th International Conference on Software Engineering. 2365–2376

work page 2022
[61]

Yuying Xia, Haijian Shao, and Xing Deng. 2024. Vulcobert: A codebert-based system for source code vulnerability detection. InProceedings of the 2024 international conference on generative artificial intelligence and information security. 249–252

work page 2024
[62]

Xu Yang, Shaowei Wang, Yi Li, and Shaohua Wang. 2023. Does data sampling improve deep learning-based vulnerability detection? yeas! and nays!. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2287–2298

work page 2023
[63]

Xu Yang, Shaowei Wang, Jiayuan Zhou, and Wenhan Zhu. 2025. One-for-All Does Not Work! Enhancing Vulnerability Detection by Mixture-of-Experts (MoE).Proceedings of the ACM on Software Engineering2, FSE (2025), 446–464

work page 2025
[64]

Alperen Yildiz, Sin G Teo, Yiling Lou, Yebo Feng, Chong Wang, and Dinil M Divakaran. 2025. Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code Repositories.arXiv preprint arXiv:2503.03586(2025)

work page arXiv 2025
[65]

Xin Yin, Chao Ni, and Shaohua Wang. 2024. Multitask-based evaluation of open-source llm on software vulnerability. IEEE Transactions on Software Engineering(2024)

work page 2024
[66]

Jeffy Yu. 2024. Retrieval Augmented Generation Integrated Large Language Models in Smart Contract Vulnerability Detection.arXiv preprint arXiv:2407.14838(2024)

work page arXiv 2024
[67]

Chenyuan Zhang, Hao Liu, Jiutian Zeng, Kejing Yang, Yuhong Li, and Hui Li. 2024. Prompt-enhanced software vulnerability detection using chatgpt. InProceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings. 276–277

work page 2024
[68]

Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Xudong Liu, Chunming Hu, and Yang Liu. 2023. Detecting condition-related bugs with control flow graph neural network. InProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1370–1382

work page 2023
[69]

Tiehua Zhang, Rui Xu, Jianping Zhang, Yuze Liu, Xin Chen, Jun Yin, and Xi Zheng. 2024. DSHGT: Dual-Supervisors Heterogeneous Graph Transformer—A Pioneer Study of Using Heterogeneous Graph Learning for Detecting Software Vulnerabilities.ACM Transactions on Software Engineering and Methodology33, 8 (2024), 1–31

work page 2024
[70]

Yu Zhao, Lina Gong, Zhiqiu Huang, Yongwei Wang, Mingqiang Wei, and Fei Wu. 2024. Coding-ptms: How to find optimal code pre-trained models for code embedding in vulnerability detection?. InProceedings of the 39th IEEE/ACM , Vol. 1, No. 1, Article . Publication date: May 2026. 24 X. Peng, B. Lin, j. Wang, X. Li, J. Ma, J. Yu, X. Mao, S. Wang International C...

work page 2024
[71]

Xin Zhou, Sicong Cao, Xiaobing Sun, and David Lo. 2025. Large language model for vulnerability detection and repair: Literature review and the road ahead.ACM Transactions on Software Engineering and Methodology34, 5 (2025), 1–31

work page 2025
[72]

Xin Zhou, Ting Zhang, and David Lo. 2024. Large language model for vulnerability detection: Emerging results and future directions. InProceedings of the 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results. 47–51

work page 2024
[73]

Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. 2019. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks.Advances in neural information processing systems32 (2019)

work page 2019
[74]

Kangchen Zhu, Zhiliang Tian, Shangwen Wang, Weiguo Chen, Zixuan Dong, Mingyue Leng, and Xiaoguang Mao

work page
[75]

ACM Softw

MiSum: Multi-modality Heterogeneous Code Graph Learning for Multi-intent Binary Code Summarization.Proc. ACM Softw. Eng.2, FSE (2025). doi:10.1145/3715780

work page doi:10.1145/3715780 2025
[76]

Yuan Zhuang, Zhenguang Liu, Peng Qian, Qi Liu, Xiang Wang, and Qinming He. 2021. Smart contract vulnerability detection using graph neural networks. InProceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence(Yokohama, Yokohama, Japan)(IJCAI’20). Article 454, 8 pages

work page 2021
[77]

Arastoo Zibaeirad and Marco Vieira. 2024. Vulnllmeval: A framework for evaluating large language models in software vulnerability detection and patching.arXiv preprint arXiv:2409.10756(2024)

work page arXiv 2024
[78]

Arastoo Zibaeirad and Marco Vieira. 2025. Reasoning with llms for zero-shot vulnerability detection.arXiv preprint arXiv:2503.17885(2025). , Vol. 1, No. 1, Article . Publication date: May 2026

work page arXiv 2025

[1] [1]

Andrei Arusoaie, Stefan Ciobâca, Vlad Craciun, Dragos Gavrilut, and Dorel Lucanu. 2017. A Comparison of Open- Source Static Analysis Tools for Vulnerability Detection in C/C++ Code. In2017 19th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC). 161–168. doi:10.1109/SYNASC.2017.00035

work page doi:10.1109/synasc.2017.00035 2017

[2] [2]

Canan Batur Şahin and Laith Abualigah. 2021. A novel deep learning-based feature selection model for improving the static analysis of vulnerability detection.Neural Computing and Applications33, 20 (2021), 14049–14067

work page 2021

[3] [3]

Sicong Cao, Xiaobing Sun, Lili Bo, Ying Wei, and Bin Li. 2021. Bgnn4vd: Constructing bidirectional graph neural- network for vulnerability detection.Information and Software Technology136 (2021), 106576. , Vol. 1, No. 1, Article . Publication date: May 2026. ReasonVul: A Multi-perspective Reasoning Framework for Enhanced Vulnerability Detection 21

work page 2021

[4] [4]

Sicong Cao, Xiaobing Sun, Lili Bo, Rongxin Wu, Bin Li, and Chuanqi Tao. 2022. MVD: memory-related vulnerability detection based on flow-sensitive graph neural networks. InProceedings of the 44th International Conference on Software Engineering(Pittsburgh, Pennsylvania)(ICSE ’22). Association for Computing Machinery, New York, NY, USA, 1456–1468. doi:10.11...

work page doi:10.1145/3510003.3510219 2022

[5] [5]

Sicong Cao, Xiaobing Sun, Xiaoxue Wu, David Lo, Lili Bo, Bin Li, and Wei Liu. 2024. Coca: Improving and explaining graph neural network-based vulnerability detection systems. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering. 1–13

work page 2024

[6] [6]

Ira Ceka, Feitong Qiao, Anik Dey, Aastha Valecha, Gail Kaiser, and Baishakhi Ray. 2024. Can llm prompting serve as a proxy for static analysis in vulnerability detection.arXiv preprint arXiv:2412.12039(2024)

work page arXiv 2024

[7] [7]

Xiao Cheng, Haoyu Wang, Jiayi Hua, Guoai Xu, and Yulei Sui. 2021. DeepWukong: Statically Detecting Software Vulnerabilities Using Deep Graph Neural Network.ACM Trans. Softw. Eng. Methodol.30, 3, Article 38 (April 2021), 33 pages. doi:10.1145/3436877

work page doi:10.1145/3436877 2021

[8] [8]

Seyed Shayan Daneshvar, Yu Nong, Xu Yang, Shaowei Wang, and Haipeng Cai. 2025. VulScribeR: Exploring RAG-based Vulnerability Augmentation with LLMs.ACM Transactions on Software Engineering and Methodology(2025)

work page 2025

[9] [9]

Yangruibo Ding, Yanjun Fu, Omniyyah Ibrahim, Chawin Sitawarin, Xinyun Chen, Basel Alomair, David Wagner, Baishakhi Ray, and Yizheng Chen. 2024. Vulnerability Detection with Code Language Models: How Far Are We?. In 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE Computer Society, 469–481

work page 2024

[10] [10]

Xueying Du, Geng Zheng, Kaixin Wang, Yi Zou, Yujia Wang, Wentai Deng, Jiayi Feng, Mingwei Liu, Bihuan Chen, Xin Peng, et al. 2024. Vul-rag: Enhancing llm-based vulnerability detection via knowledge-level rag.arXiv preprint arXiv:2406.11147(2024)

work page arXiv 2024

[11] [11]

Andrew Estornell and Yang Liu. 2024. Multi-LLM Debate: Framework, Principals, and Interventions. 37 (2024), 28938–28964. doi:10.52202/079017-0911

work page doi:10.52202/079017-0911 2024

[12] [12]

Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, et al . 2020. Codebert: A pre-trained model for programming and natural languages.arXiv preprint arXiv:2002.08155(2020)

work page internal anchor Pith review Pith/arXiv arXiv 2020

[13] [13]

Michael Fu and Chakkrit Tantithamthavorn. 2022. Linevul: A transformer-based line-level vulnerability prediction. In Proceedings of the 19th International Conference on Mining Software Repositories. 608–620

work page 2022

[14] [14]

Michael Fu, Chakkrit Kla Tantithamthavorn, Van Nguyen, and Trung Le. 2023. Chatgpt for vulnerability detection, classification, and repair: How far are we?. In2023 30th Asia-Pacific Software Engineering Conference (APSEC). IEEE, 632–636

work page 2023

[15] [15]

Zeyu Gao, Hao Wang, Yuchen Zhou, Wenyu Zhu, and Chao Zhang. 2023. How far have we gone in vulnerability detection using large language models.arXiv preprint arXiv:2311.12420(2023)

work page arXiv 2023

[16] [16]

GNU. 2025. Gnu cflow: analyzing a collection of c source files, charting control flow within the program. (2025)

work page 2025

[17] [17]

Nima Shiri Harzevili, Alvine Boaye Belle, Junjie Wang, Song Wang, Zhen Ming, Nachiappan Nagappan, et al. 2023. A survey on automated software vulnerability detection using machine learning and deep learning.arXiv preprint arXiv:2306.11673(2023)

work page arXiv 2023

[18] [18]

Junda He, Christoph Treude, and David Lo. 2025. LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision, and the Road Ahead.ACM Trans. Softw. Eng. Methodol.34, 5, Article 124 (May 2025), 30 pages. doi:10.1145/3712003

work page doi:10.1145/3712003 2025

[19] [19]

Kaiyu He, Mian Zhang, Shuo Yan, Peilin Wu, and Zhiyu Zoey Chen. 2024. Idea: Enhancing the rule learning ability of large language model agent through induction, deduction, and abduction.arXiv preprint arXiv:2408.10455(2024)

work page arXiv 2024

[20] [20]

Sihao Hu, Tiansheng Huang, Fatih İlhan, Selim Furkan Tekin, and Ling Liu. 2023. Large language model-powered smart contract vulnerability detection: New perspectives. In2023 5th IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA). IEEE, 297–306

work page 2023

[21] [21]

Larry Huynh, Yinghao Zhang, Djimon Jayasundera, Woojin Jeon, Hyoungshick Kim, Tingting Bi, and Jin B. Hong

work page

[22] [22]

Faster Hash-based Multi-valued Validated Asynchronous Byzantine Agreement

Detecting Code Vulnerabilities using LLMs. In2025 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). 401–414. doi:10.1109/DSN64029.2025.00047

work page doi:10.1109/dsn64029.2025.00047 2025

[23] [23]

Ridhi Jain, Nicole Gervasoni, Mthandazo Ndhlovu, and Sanjay Rawat. 2023. A code centric evaluation of c/c++ vulnerability datasets for deep learning based vulnerability detection techniques. InProceedings of the 16th Innovations in Software Engineering Conference. 1–10

work page 2023

[24] [24]

Yuan Jiang, Yujian Zhang, Xiaohong Su, Christoph Treude, and Tiantian Wang. 2024. Stagedvulbert: Multi-granular vulnerability detection with a novel pre-trained code model.IEEE Transactions on Software Engineering(2024)

work page 2024

[25] [25]

Jiaqi Li, Mengmeng Wang, Zilong Zheng, and Muhan Zhang. 2023. Loogle: Can long-context language models understand long contexts?arXiv preprint arXiv:2311.04939(2023)

work page arXiv 2023

[26] [26]

Yue Li, Xiao Li, Hao Wu, Minghui Xu, Yue Zhang, Xiuzhen Cheng, Fengyuan Xu, and Sheng Zhong. 2025. Everything You Wanted to Know About LLM-based Vulnerability Detection But Were Afraid to Ask.arXiv preprint arXiv:2504.13474 (2025). , Vol. 1, No. 1, Article . Publication date: May 2026. 22 X. Peng, B. Lin, j. Wang, X. Li, J. Ma, J. Yu, X. Mao, S. Wang

work page arXiv 2025

[27] [27]

Ziyang Li, Saikat Dutta, and Mayur Naik. 2024. IRIS: LLM-assisted static analysis for detecting security vulnerabilities. arXiv preprint arXiv:2405.17238(2024)

work page arXiv 2024

[28] [28]

Zhen Li, Deqing Zou, Shouhuai Xu, Xinyu Ou, Hai Jin, Sujuan Wang, Zhijun Deng, and Yuyi Zhong. 2018. Vuldeepecker: A deep learning-based system for vulnerability detection.arXiv preprint arXiv:1801.01681(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[29] [29]

RongTao Liao, XueHu Yan, and KaiLong Zhu. 2025. KLRAG: Deep Learning Library Vulnerability Detection via Knowledge-Level RAG. InInternational Conference on Intelligent Computing. Springer, 443–456

work page 2025

[30] [30]

Bo Lin, Shangwen Wang, Yihao Qin, Liqian Chen, and Xiaoguang Mao. 2025. Give LLMs a Security Course: Securing Retrieval-Augmented Code Generation via Knowledge Injection. Association for Computing Machinery, New York, NY, USA. doi:10.1145/3719027.3765049

work page doi:10.1145/3719027.3765049 2025

[31] [31]

Guanjun Lin, Sheng Wen, Qing-Long Han, Jun Zhang, and Yang Xiang. 2020. Software Vulnerability Detection Using Deep Neural Networks: A Survey.Proc. IEEE108, 10 (2020), 1825–1848. doi:10.1109/JPROC.2020.2993293

work page doi:10.1109/jproc.2020.2993293 2020

[32] [32]

Stephan Lipp, Sebastian Banescu, and Alexander Pretschner. 2022. An empirical study on the effectiveness of static C code analyzers for vulnerability detection. InProceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis(Virtual, South Korea)(ISSTA 2022). Association for Computing Machinery, New York, NY, USA, 544–555. d...

work page doi:10.1145/3533767.3534380 2022

[33] [33]

Yuhan Liu, Yuxuan Liu, Xiaoqing Zhang, Xiuying Chen, and Rui Yan. 2025. The Truth Becomes Clearer Through Debate! Multi-Agent Systems with Large Language Models Unmask Fake News. InProceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval(Padua, Italy)(SIGIR ’25). Association for Computing Machinery, N...

work page doi:10.1145/3726302.3730092 2025

[34] [34]

Yisroel Mirsky, George Macon, Michael Brown, Carter Yagemann, Matthew Pruett, Evan Downing, Sukarno Mertoguno, and Wenke Lee. 2023. VulChecker: Graph-based Vulnerability Localization in Source Code. In32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, 6557–6574

work page 2023

[35] [35]

Van Nguyen, Trung Le, Chakkrit Tantithamthavorn, John Grundy, and Dinh Phung. 2024. Deep domain adaptation with max-margin principle for cross-project imbalanced software vulnerability detection.ACM Transactions on Software Engineering and Methodology33, 6 (2024), 1–34

work page 2024

[36] [36]

Yu Nong, Mohammed Aldeen, Long Cheng, Hongxin Hu, Feng Chen, and Haipeng Cai. 2024. Chain-of-thought prompting of large language models for discovering and fixing software vulnerabilities.arXiv preprint arXiv:2402.17230 (2024)

work page arXiv 2024

[37] [37]

Chitu Okoli. 2023. Inductive, abductive and deductive theorising.International Journal of Management Concepts and Philosophy16, 3 (2023), 302–316. doi:10.1504/IJMCP.2023.131769

work page doi:10.1504/ijmcp.2023.131769 2023

[38] [38]

Xin Peng, Jieren Cheng, Xiangyan Tang, Jingxin Liu, and Jiahua Wu. 2023. Dual contrastive learning network for graph clustering.IEEE Transactions on Neural Networks and Learning Systems35, 8 (2023), 10846–10856

work page 2023

[39] [39]

Xin Peng, Jieren Cheng, Xiangyan Tang, Bin Zhang, and Wenxuan Tu. 2024. Multi-view graph imputation network. Information Fusion102 (2024), 102024

work page 2024

[40] [40]

Xin Peng, Shangwen Wang, Yihao Qin, Bo Lin, Liqian Chen, Jieren Cheng, and Xiaoguang Mao. 2025. Keep It Simple: Self-Adaptive Code Graph Simplification for Accurate Vulnerability Detection.IEEE Transactions on Software Engineering51, 10 (2025), 2744–2763. doi:10.1109/TSE.2025.3593515

work page doi:10.1109/tse.2025.3593515 2025

[41] [41]

Md Mahbubur Rahman, Ira Ceka, Chengzhi Mao, Saikat Chakraborty, Baishakhi Ray, and Wei Le. 2024. Towards Causal Deep Learning for Vulnerability Detection. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering. Association for Computing Machinery. doi:10.1145/3597503.3639170

work page doi:10.1145/3597503.3639170 2024

[42] [42]

Shahriyar Zaman Ridoy, Md Shazzad Hossain Shaon, Alfredo Cuzzocrea, and Mst Shapna Akter. 2024. EnStack: An Ensemble Stacking Framework of Large Language Models for Enhanced Vulnerability Detection in Source Code. In 2024 IEEE International Conference on Big Data (BigData). IEEE, 6356–6364

work page 2024

[43] [43]

Niklas Risse, Jing Liu, and Marcel Böhme. 2025. Top score on the wrong exam: On benchmarking in machine learning for vulnerability detection.Proceedings of the ACM on Software Engineering2, ISSTA (2025), 388–410

work page 2025

[44] [44]

Robert C Seacord. 2008. The CERT C secure coding standard.Pearson Education(2008)

work page 2008

[45] [45]

Miaomiao Shao and Yuxin Ding. 2024. FVD-DPM: Fine-grained Vulnerability Detection via Conditional Diffusion Probabilistic Models. In33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, 7375–7392

work page 2024

[46] [46]

Yu Sheng, Wanting Wen, Linjing Li, and Daniel Zeng. 2025. Evaluating Generalization Capability of Language Models across Abductive, Deductive and Inductive Logical Reasoning. InProceedings of the 31st International Conference on Computational Linguistics. 4945–4957

work page 2025

[47] [47]

Ze Sheng, Zhicheng Chen, Shuning Gu, Heqing Huang, Guofei Gu, and Jeff Huang. 2025. LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights.arXiv preprint arXiv:2502.07049(2025)

work page arXiv 2025

[48] [48]

Samiha Shimmi, Ashiqur Rahman, Mohan Gadde, Hamed Okhravi, and Mona Rahimi. 2024. VulSim: Leveraging Similarity of Multi-Dimensional Neighbor Embeddings for Vulnerability Detection. In33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, 1777–1794. , Vol. 1, No. 1, Article . Publication date: May 2026. ReasonVul: A Multi-perspective Rea...

work page 2024

[49] [49]

Benjamin Steenhoek, Hongyang Gao, and Wei Le. 2024. Dataflow Analysis-Inspired Deep Learning for Efficient Vul- nerability Detection. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering. Association for Computing Machinery. doi:10.1145/3597503.3623345

work page doi:10.1145/3597503.3623345 2024

[50] [50]

Benjamin Steenhoek, Md Mahbubur Rahman, Richard Jiles, and Wei Le. 2023. An Empirical Study of Deep Learning Models for Vulnerability Detection. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). 2237–2248. doi:10.1109/ICSE48619.2023.00188

work page doi:10.1109/icse48619.2023.00188 2023

[51] [51]

Benjamin Steenhoek, Md Mahbubur Rahman, Monoshi Kumar Roy, Mirza Sanjida Alam, Hengbo Tong, Swarna Das, Earl T Barr, and Wei Le. 2024. To err is machine: Vulnerability detection challenges llm reasoning.arXiv preprint arXiv:2403.17218(2024)

work page arXiv 2024

[52] [52]

Saba Sturua, Isabelle Mohr, Mohammad Kalim Akram, Michael Günther, Bo Wang, Markus Krimmel, Feng Wang, Georgios Mastrapas, Andreas Koukounas, Nan Wang, et al. 2024. jina-embeddings-v3: Multilingual embeddings with task lora.arXiv preprint arXiv:2409.10173(2024)

work page arXiv 2024

[53] [53]

Shangwen Wang, Bo Lin, Liqian Chen, and Xiaoguang Mao. 2025. Divide-and-Conquer: Automating Code Revisions via Localization-and-Revision. 34, 3 (2025). doi:10.1145/3697013

work page doi:10.1145/3697013 2025

[54] [54]

Xinchen Wang, Ruida Hu, Cuiyun Gao, Xin-Cheng Wen, Yujia Chen, and Qing Liao. 2024. Reposvul: A repository-level high-quality vulnerability dataset. InProceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings. 472–483

work page 2024

[55] [55]

Zhiyuan Wei, Jing Sun, Yuqiang Sun, Ye Liu, Daoyuan Wu, Zijian Zhang, Xianhao Zhang, Meng Li, Yang Liu, Chunmiao Li, et al . 2025. Advanced Smart Contract Vulnerability Detection via LLM-Powered Multi-Agent Systems.IEEE Transactions on Software Engineering(2025)

work page 2025

[56] [56]

Xin-Cheng Wen, Yupan Chen, Cuiyun Gao, Hongyu Zhang, Jie M Zhang, and Qing Liao. 2023. Vulnerability detection with graph simplification and enhanced graph representation learning. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2275–2286

work page 2023

[57] [57]

Xin-Cheng Wen, Xinchen Wang, Yujia Chen, Ruida Hu, David Lo, and Cuiyun Gao. 2024. Vuleval: Towards repository- level evaluation of software vulnerability detection.arXiv preprint arXiv:2404.15596(2024)

work page arXiv 2024

[58] [58]

Xin-Cheng Wen, Jiaxin Ye, Cuiyun Gao, Lianwei Wu, and Qing Liao. 2024. EvalSVA: Multi-Agent Evaluators for Next-Gen Software Vulnerability Assessment.arXiv preprint arXiv:2501.14737(2024)

work page arXiv 2024

[59] [59]

Ratnadira Widyasari, Martin Weyssow, Ivana Clairine Irsan, Han Wei Ang, Frank Liauw, Eng Lieh Ouh, Lwin Khin Shar, Hong Jin Kang, and David Lo. 2025. Let the Trial Begin: A Mock-Court Approach to Vulnerability Detection using LLM-Based Agents.arXiv preprint arXiv:2505.10961(2025)

work page arXiv 2025

[60] [60]

Yueming Wu, Deqing Zou, Shihan Dou, Wei Yang, Duo Xu, and Hai Jin. 2022. Vulcnn: An image-inspired scalable vulnerability detection system. InProceedings of the 44th International Conference on Software Engineering. 2365–2376

work page 2022

[61] [61]

Yuying Xia, Haijian Shao, and Xing Deng. 2024. Vulcobert: A codebert-based system for source code vulnerability detection. InProceedings of the 2024 international conference on generative artificial intelligence and information security. 249–252

work page 2024

[62] [62]

Xu Yang, Shaowei Wang, Yi Li, and Shaohua Wang. 2023. Does data sampling improve deep learning-based vulnerability detection? yeas! and nays!. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2287–2298

work page 2023

[63] [63]

Xu Yang, Shaowei Wang, Jiayuan Zhou, and Wenhan Zhu. 2025. One-for-All Does Not Work! Enhancing Vulnerability Detection by Mixture-of-Experts (MoE).Proceedings of the ACM on Software Engineering2, FSE (2025), 446–464

work page 2025

[64] [64]

Alperen Yildiz, Sin G Teo, Yiling Lou, Yebo Feng, Chong Wang, and Dinil M Divakaran. 2025. Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code Repositories.arXiv preprint arXiv:2503.03586(2025)

work page arXiv 2025

[65] [65]

Xin Yin, Chao Ni, and Shaohua Wang. 2024. Multitask-based evaluation of open-source llm on software vulnerability. IEEE Transactions on Software Engineering(2024)

work page 2024

[66] [66]

Jeffy Yu. 2024. Retrieval Augmented Generation Integrated Large Language Models in Smart Contract Vulnerability Detection.arXiv preprint arXiv:2407.14838(2024)

work page arXiv 2024

[67] [67]

Chenyuan Zhang, Hao Liu, Jiutian Zeng, Kejing Yang, Yuhong Li, and Hui Li. 2024. Prompt-enhanced software vulnerability detection using chatgpt. InProceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings. 276–277

work page 2024

[68] [68]

Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Xudong Liu, Chunming Hu, and Yang Liu. 2023. Detecting condition-related bugs with control flow graph neural network. InProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1370–1382

work page 2023

[69] [69]

Tiehua Zhang, Rui Xu, Jianping Zhang, Yuze Liu, Xin Chen, Jun Yin, and Xi Zheng. 2024. DSHGT: Dual-Supervisors Heterogeneous Graph Transformer—A Pioneer Study of Using Heterogeneous Graph Learning for Detecting Software Vulnerabilities.ACM Transactions on Software Engineering and Methodology33, 8 (2024), 1–31

work page 2024

[70] [70]

Yu Zhao, Lina Gong, Zhiqiu Huang, Yongwei Wang, Mingqiang Wei, and Fei Wu. 2024. Coding-ptms: How to find optimal code pre-trained models for code embedding in vulnerability detection?. InProceedings of the 39th IEEE/ACM , Vol. 1, No. 1, Article . Publication date: May 2026. 24 X. Peng, B. Lin, j. Wang, X. Li, J. Ma, J. Yu, X. Mao, S. Wang International C...

work page 2024

[71] [71]

Xin Zhou, Sicong Cao, Xiaobing Sun, and David Lo. 2025. Large language model for vulnerability detection and repair: Literature review and the road ahead.ACM Transactions on Software Engineering and Methodology34, 5 (2025), 1–31

work page 2025

[72] [72]

Xin Zhou, Ting Zhang, and David Lo. 2024. Large language model for vulnerability detection: Emerging results and future directions. InProceedings of the 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results. 47–51

work page 2024

[73] [73]

Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. 2019. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks.Advances in neural information processing systems32 (2019)

work page 2019

[74] [74]

Kangchen Zhu, Zhiliang Tian, Shangwen Wang, Weiguo Chen, Zixuan Dong, Mingyue Leng, and Xiaoguang Mao

work page

[75] [75]

ACM Softw

MiSum: Multi-modality Heterogeneous Code Graph Learning for Multi-intent Binary Code Summarization.Proc. ACM Softw. Eng.2, FSE (2025). doi:10.1145/3715780

work page doi:10.1145/3715780 2025

[76] [76]

Yuan Zhuang, Zhenguang Liu, Peng Qian, Qi Liu, Xiang Wang, and Qinming He. 2021. Smart contract vulnerability detection using graph neural networks. InProceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence(Yokohama, Yokohama, Japan)(IJCAI’20). Article 454, 8 pages

work page 2021

[77] [77]

Arastoo Zibaeirad and Marco Vieira. 2024. Vulnllmeval: A framework for evaluating large language models in software vulnerability detection and patching.arXiv preprint arXiv:2409.10756(2024)

work page arXiv 2024

[78] [78]

Arastoo Zibaeirad and Marco Vieira. 2025. Reasoning with llms for zero-shot vulnerability detection.arXiv preprint arXiv:2503.17885(2025). , Vol. 1, No. 1, Article . Publication date: May 2026

work page arXiv 2025