Direction for Detection: A Survey of Automated Vulnerability Detection and all of its Pain Points

Chris Hicks; Dan Ristea; Ezzeldin Shereen; Madeleine Dwyer; Sanyam Vyas; Shae McFadden; Vasilios Mavroudis

arxiv: 2412.11194 · v2 · submitted 2024-12-15 · 💻 cs.SE · cs.AI

Pith reviewed 2026-05-23 06:48 UTC · model grok-4.3

keywords: automated vulnerability detection · machine learning · survey · pain points · feedback loops · datasets · binary classification · C/C++

The pith

Twelve pain points in ML-based vulnerability detection form self-reinforcing loops that confine research to binary classification of C/C++ functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey of 87 works on machine learning for automated vulnerability detection (ML4AVD) identifies twelve pain points across the entire pipeline, from data to evaluation. These pain points are not isolated but connected through feedback loops involving datasets, problem formulations, baselines, and metrics. The loops keep the field focused on a narrow task, binary classification of C/C++ code at the function level, rather than broader goals such as predicting vulnerability types or supporting more languages. The authors provide concrete recommendations to break each loop and use AIxCC, a recent high-profile effort, as a case study to assess how well it aligns with them. If the analysis holds, following the recommendations would allow the field to address more realistic and useful detection problems.

Core claim

The paper systematizes 87 influential works by problem formulation, input and detection granularity, target languages, metrics, datasets, and approach. From this corpus it identifies twelve pain points that span the ML4AVD pipeline and argues that they are self-reinforcing and causally inter-meshed: feedback loops between datasets, formulations, baselines, and metrics explain the persistent concentration on binary classification of C/C++ vulnerabilities at the function level. Each pain point is paired with recommendations to break the loops, and AIxCC serves as a case study.

What carries the argument

The twelve pain points spanning the ML4AVD pipeline and their causal inter-meshing through feedback loops between datasets, formulations, baselines, and metrics.

If this is right

The field optimizes for a narrow and artificial problem that omits vulnerability type prediction.
Broader language support beyond C/C++ remains unaddressed.
Separation of input granularity from detection granularity is not pursued.
Feedback loops between datasets, formulations, baselines, and metrics perpetuate the narrow focus.
Concrete recommendations paired with each pain point can break the reinforcing loops.
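
To make the contrast in this list concrete, the sketch below sets the narrow formulation (one function in, one bit out) beside a broader one in which the input unit and the reported location are separate fields and each finding carries a vulnerability type. All names here (`BinaryVerdict`, `Finding`, `Granularity`, `to_binary`) are hypothetical illustrations, not interfaces from the paper or from any surveyed tool, and the CWE identifier in the usage note is only an example label.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Granularity(Enum):
    """Units of code a detector may read or report on (illustrative set)."""
    LINE = "line"
    FUNCTION = "function"
    FILE = "file"
    REPOSITORY = "repository"


@dataclass(frozen=True)
class BinaryVerdict:
    """Narrow formulation: one function in, one bit out."""
    function_source: str
    vulnerable: bool


@dataclass(frozen=True)
class Finding:
    """Broader formulation (hypothetical): typed, located, language-aware."""
    language: str                       # not restricted to C/C++
    input_granularity: Granularity      # how much code the detector reads
    detection_granularity: Granularity  # how precisely it reports
    location: str                       # e.g. "src/App.java:10"
    vulnerability_type: str             # e.g. a CWE identifier


def to_binary(function_source: str, findings: List[Finding]) -> BinaryVerdict:
    """Collapse rich findings to a binary label; type and location are lost."""
    return BinaryVerdict(function_source, vulnerable=len(findings) > 0)
```

For instance, `Finding("java", Granularity.FILE, Granularity.LINE, "src/App.java:10", "CWE-89")` reads a whole file but reports a single line, whereas `to_binary` keeps only whether anything was found.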

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar self-reinforcing loops between data, tasks, and metrics may operate in other machine learning applications to software engineering.
The rise of agentic coding frameworks that increase code production volume could make addressing these pain points more urgent.
Empirical tests could check whether following the recommendations produces measurable gains in practical detection utility outside the current narrow setting.

Load-bearing premise

The authors' categorization of the 87 works and their interpretation of causal inter-meshing among the pain points accurately reflect the dominant dynamics of the field rather than selection or interpretive artifacts.

What would settle it

A new empirical study that achieves strong, sustained performance gains on vulnerability type prediction, multiple languages beyond C/C++, and separated input versus detection granularity while using non-binary metrics would show the loops do not prevent progress as claimed.

Figures

Figures reproduced from arXiv: 2412.11194 by Chris Hicks, Dan Ristea, Ezzeldin Shereen, Madeleine Dwyer, Sanyam Vyas, Shae McFadden, Vasilios Mavroudis.

**Figure 2.** Summary of the article collection and screening process.

**Figure 3.** Distribution of the publication year of included papers after the

**Figure 4.** The five main components of AVD literature systematized in the paper. Each component is studied in one sub-section of Section

**Figure 5.** Comparison of commonly-used programming languages accord

**Figure 7.** Distribution of the most common evaluation metrics used in AVD.

**Figure 8.** Reported F1-scores by AVD solutions on the three most popular AVD datasets per year. Each point corresponds to one solution evaluated on a dataset. Some solutions are evaluated on multiple datasets. Overall, 19 solutions used the F1-score in conjunction with one of the aforementioned datasets.
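
Figures 7 and 8 concern evaluation metrics, with the F1-score being the number tracked across datasets. As a reminder of what that score summarises, the following sketch computes precision, recall, and F1 from raw confusion counts using the standard definitions. The function name `f1_from_counts` and the example counts are invented for illustration and do not come from the paper.

```python
from typing import Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion counts.

    Ratios with a zero denominator are reported as 0.0 by convention.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative counts only: 100 truly vulnerable samples, 60 of them found,
# alongside 140 false alarms.
precision, recall, f1 = f1_from_counts(tp=60, fp=140, fn=40)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.30 recall=0.60 f1=0.40
```

Because F1 ignores true negatives, it says nothing about how many benign samples were correctly passed over, which is one reason a single reported score can be hard to compare across datasets with different class balance.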

Original abstract

Security vulnerabilities in software can have severe consequences; however, manual vulnerability detection is costly and does not scale, especially as agentic coding frameworks increase the rate of code production. Over the last decade, a large body of research has applied machine learning machine learning to automate vulnerability detection (ML4AVD), yet self-reported performance on the most popular datasets shows no clear upward trend. The ML4AVD research community has identified several flaws in problem formulations, datasets, and metrics, but these are discussed in isolation, leaving the overarching problems that generate and reinforce these flaws unaddressed. We first systematize the field through a survey of 87 influential works based on their problem formulation, input and detection granularity, target programming languages, evaluation metrics, datasets, and detection approach. Drawing on this corpus and prior empirical work, we identify twelve pain points spanning the ML4AVD pipeline and show that they are self-reinforcing and causally inter-meshed: feedback loops between datasets, formulations, baselines, and metrics perpetuate each other and explain the field's persistent concentration on binary classification of C/C++ vulnerabilities at the function level. Thus, the field optimizes for a narrow and artificial problem that omits vulnerability type prediction, broader language support, and separation of input from detection granularity. We pair each pain point with concrete recommendations to break these loops. Finally, we use AIxCC as a case study to assess how well a recent high-profile effort aligns with these recommendations and reflect on the relevance of ML4AVD in the era of agentic AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This survey of 87 papers identifies twelve pain points in ML4AVD and claims they form self-reinforcing loops, but the causal part rests on interpretation rather than direct tests.

The paper's core contribution is a systematization of ML-based automated vulnerability detection that names twelve pain points across the pipeline and argues they interconnect to lock the field into binary classification of C/C++ functions. It draws on 87 works plus earlier empirical studies to map problem formulations, granularity choices, languages, metrics, datasets, and approaches, then pairs each pain point with concrete recommendations. It also checks AIxCC, a recent high-profile effort, against those recommendations and reflects on the relevance of ML4AVD given the shift toward agentic AI.

Referee Report

2 major / 2 minor

Summary. The paper surveys 87 influential works in ML4AVD, systematizing them according to problem formulation, input and detection granularity, target programming languages, evaluation metrics, datasets, and detection approach. Drawing on this corpus and prior empirical studies, it identifies twelve pain points spanning the ML4AVD pipeline and argues that they are self-reinforcing and causally inter-meshed via feedback loops between datasets, formulations, baselines, and metrics; this explains the field's persistent concentration on binary classification of C/C++ vulnerabilities at the function level. The paper pairs each pain point with concrete recommendations, uses AIxCC as a case study, and reflects on relevance in the era of agentic AI.

Significance. If the categorization of the 87 works is representative and the inter-meshing interpretation holds, the work offers a significant synthesis that moves beyond isolated critiques of flaws in formulations, datasets, and metrics to a holistic view of systemic issues. The scale of the surveyed corpus and the explicit pairing of pain points with recommendations are strengths that could usefully guide the field toward broader support for vulnerability type prediction, language diversity, and separation of input from detection granularity. The AIxCC case study adds practical relevance.

major comments (2)

[Pain points section (drawing on systematization of 87 works)] The central claim that the twelve pain points are self-reinforcing and causally inter-meshed (feedback loops between datasets, formulations, baselines, and metrics explain the narrow concentration on binary C/C++ function-level classification) is presented as an interpretive synthesis from the 87-work corpus. No quantitative test (citation-graph analysis, temporal co-occurrence statistics, or ablation of the corpus) is provided to establish that altering one element would propagate to others rather than arising from independent causes such as early dataset releases. This is load-bearing for the main thesis.
[Systematization section] The selection criteria for the 87 works and the detailed categorization tables (by problem formulation, granularity, languages, metrics, datasets, approaches) are not described with sufficient explicitness to verify that the identified pain points and their inter-meshing accurately reflect dominant field dynamics rather than selection or interpretive artifacts.
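
The first major comment asks for quantitative support such as temporal co-occurrence statistics. One minimal form such a check could take is sketched below: for each publication year, count how often a given dataset appears together with a given problem formulation. The records are toy data invented for illustration, the dataset labels (`dataset_a`, `dataset_b`) and function names are hypothetical, and co-occurrence alone would not establish the causal direction the paper argues for.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Toy corpus records: (year, dataset, formulation). Invented for illustration.
corpus: List[Tuple[int, str, str]] = [
    (2019, "dataset_a", "binary"),
    (2020, "dataset_a", "binary"),
    (2020, "dataset_b", "multiclass"),
    (2021, "dataset_a", "binary"),
    (2021, "dataset_a", "binary"),
    (2021, "dataset_b", "binary"),
]


def cooccurrence_by_year(
    records: List[Tuple[int, str, str]],
) -> Dict[int, Counter]:
    """Count (dataset, formulation) pairs separately for each year."""
    table: Dict[int, Counter] = {}
    for year, dataset, formulation in records:
        table.setdefault(year, Counter())[(dataset, formulation)] += 1
    return table


def share(table: Dict[int, Counter], year: int, pair: Tuple[str, str]) -> float:
    """Fraction of that year's records that use the given pair."""
    total = sum(table[year].values())
    return table[year][pair] / total if total else 0.0


table = cooccurrence_by_year(corpus)
for year in sorted(table):
    print(year, f"{share(table, year, ('dataset_a', 'binary')):.2f}")
```

A real analysis would draw the records from the survey's categorization tables and would still need a design that separates feedback effects from independent causes such as early dataset releases.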

minor comments (2)

[Abstract] Abstract contains a duplicated phrase: 'machine learning machine learning'.
[Title] Minor inconsistencies in capitalization or phrasing in section headings (e.g., 'all of its Pain Points' in the title).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments identify areas where our interpretive synthesis and methodology description can be strengthened. We respond to each major comment below and indicate the revisions we will make.

Referee: [Pain points section (drawing on systematization of 87 works)] The central claim that the twelve pain points are self-reinforcing and causally inter-meshed (feedback loops between datasets, formulations, baselines, and metrics explain the narrow concentration on binary C/C++ function-level classification) is presented as an interpretive synthesis from the 87-work corpus. No quantitative test (citation-graph analysis, temporal co-occurrence statistics, or ablation of the corpus) is provided to establish that altering one element would propagate to others rather than arising from independent causes such as early dataset releases. This is load-bearing for the main thesis.

Authors: We agree that the central claim is an interpretive synthesis rather than the result of a new quantitative analysis. The paper integrates patterns observed across the 87 works with prior empirical studies to propose feedback loops as an explanatory framework; it does not claim to have performed citation-graph analysis or ablation studies. To make the supporting evidence more transparent, we will add a subsection that explicitly maps concrete examples from multiple papers in the corpus to each proposed loop. This revision will clarify the basis for the interpretation without converting the survey into a meta-analytic study. (Revision: partial.)
Referee: [Systematization section] The selection criteria for the 87 works and the detailed categorization tables (by problem formulation, granularity, languages, metrics, datasets, approaches) are not described with sufficient explicitness to verify that the identified pain points and their inter-meshing accurately reflect dominant field dynamics rather than selection or interpretive artifacts.

Authors: We accept this criticism. The revised manuscript will include an expanded 'Survey Methodology' subsection that specifies the search databases and keywords, inclusion/exclusion criteria, time bounds, and the categorization procedure (including how disagreements were resolved). The tables will be updated with references to this methodology and brief notes on classification decisions for edge cases. These additions will allow readers to evaluate the corpus selection and categorization directly. (Revision: yes.)

Circularity Check

0 steps flagged

No circularity: survey synthesis from external corpus

The paper performs a systematization of 87 external works plus prior empirical studies to identify and interpret twelve pain points. No equations, fitted parameters, self-definitional loops, or load-bearing self-citations reduce the central interpretive claim to quantities defined inside the paper; the concentration on binary C/C++ function-level classification is presented as an observed pattern in the surveyed literature rather than a constructed prediction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the assumption that the selected 87 works are representative of influential ML4AVD research and that the authors' qualitative analysis correctly identifies causal links among pain points.

axioms (1)

Domain assumption: The 87 influential works selected for the survey are representative of the dominant trends in ML4AVD research.
The systematization and pain-point identification are derived from this corpus.

Reference graph

Works this paper leans on

177 extracted references · 177 canonical work pages · 5 internal anchors

[1]

Cyber Grand Challenge,

DARPA, “Cyber Grand Challenge,” (Accessed 2024-10- 06). [Online]. Available: https://www.darpa.mil/about-us/timeline/ cyber-grand-challenge

work page 2024
[2]

AI Cyber Challenge (AIxCC),

——, “AI Cyber Challenge (AIxCC),” (Accessed 2024-10-03). [Online]. Available: https://aicyberchallenge.com/

work page 2024
[3]

Flawfinder,

D. A. Wheeler, “Flawfinder,” (Accessed 2024-09-23). [Online]. Available: https://github.com/david-a-wheeler/flawfinder

work page 2024
[4]

RATS - rough auditing tool for security,

A. Dunham, “RATS - rough auditing tool for security,” (Accessed 2024-09-23). [Online]. Available: https://github.com/ andrew-d/rough-auditing-tool-for-security

work page 2024
[5]

VulDeePecker: A deep learning-based system for vulnerability detection,

Z. Li, D. Zou, S. Xu, X. Ou, H. Jin, S. Wang, Z. Deng, and Y . Zhong, “VulDeePecker: A deep learning-based system for vulnerability detection,” in Network and Distributed System Security Symposium , ser. NDSS 2018. Internet Society, 2018. [Online]. Available: https://doi.org/10.14722/ndss.2018.23158

work page doi:10.14722/ndss.2018.23158 2018
[6]

Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks,

Y . Zhou, S. Liu, J. Siow, X. Du, and Y . Liu, “Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks,” Advances in Neural Information Processing Systems , vol. 32, 2019. [Online]. Available: https://proceedings.neurips.cc/paper/2019/hash/ 49265d2447bc3bbfe9e76306ce40a31f-Abstract.html

work page 2019
[7]

Liang, Y.-C

H. Hanif and S. Maffeis, “VulBERTa: Simplified source code pre-training for vulnerability detection,” in IEEE International Joint Conference on Neural Networks , 2022, pp. 1–8. [Online]. Available: https://doi.org/10.1109/IJCNN55064.2022.9892280

work page doi:10.1109/ijcnn55064.2022.9892280 2022
[8]

LineVul: A transformer-based line- level vulnerability prediction,

M. Fu and C. Tantithamthavorn, “LineVul: A transformer-based line- level vulnerability prediction,” in Proceedings of the International Conference on Mining Software Repositories , 2022, pp. 608–620. [Online]. Available: https://doi.org/10.1145/3524842.3528452

work page doi:10.1145/3524842.3528452 2022
[9]

Large language model for vulnerability detection: Emerging results and future directions,

X. Zhou, T. Zhang, and D. Lo, “Large language model for vulnerability detection: Emerging results and future directions,” in ACM/IEEE International Conference on Software Engineering: New Ideas and Emerging Results , 2024, pp. 47–51. [Online]. Available: https://doi.org/10.1145/3639476.3639762

work page doi:10.1145/3639476.3639762 2024
[10]

An investigation of quality issues in vulnerability detection datasets,

Y . Guo and S. Bettaieb, “An investigation of quality issues in vulnerability detection datasets,” in IEEE European Symposium on Security and Privacy Workshops , 2023, pp. 29–33

work page 2023
[11]

Open science in software engineering: A study on deep learning-based vulnerability detection,

Y . Nong, R. Sharma, A. Hamou-Lhadj, X. Luo, and H. Cai, “Open science in software engineering: A study on deep learning-based vulnerability detection,” IEEE Trans. on Soft. Eng. , vol. 49, no. 4, pp. 1983–2005, 2022

work page 1983
[12]

Ai cyber risk benchmark: Automated exploitation capabilities,

D. Ristea, V . Mavroudis, and C. Hicks, “Ai cyber risk benchmark: Automated exploitation capabilities,” arXiv preprint arXiv:2410.21939v2, 2024

work page arXiv 2024
[13]

Fuzzing vulnerability discovery techniques: Survey, challenges and future directions,

C. Beaman, M. Redbourne, J. D. Mummery, and S. Hakak, “Fuzzing vulnerability discovery techniques: Survey, challenges and future directions,” Computers & Security , vol. 120, p. 102813, 2022

work page 2022
[14]

Vulnerability-oriented directed fuzzing for binary programs,

L. Yu, Y . Lu, Y . Shen, Y . Li, and Z. Pan, “Vulnerability-oriented directed fuzzing for binary programs,” Scientific Reports , vol. 12, no. 1, p. 4271, 2022

work page 2022
[15]

A Survey of Learning-based Automated Program Repair,

Q. Zhang, C. Fang, Y . Ma, W. Sun, and Z. Chen, “A Survey of Learning-based Automated Program Repair,” ACM Trans. on Software Engineering and Methodology , vol. 33, no. 2, Dec. 2023. [Online]. Available: https://doi.org/10.1145/3631974

work page doi:10.1145/3631974 2023
[16]

A theory of condition,

J. R. Rice, “A theory of condition,” SIAM Journal on Numerical Analysis, vol. 3, no. 2, pp. 287–310, 1966

work page 1966
[17]

Multilayer feedforward networks are universal approximators,

K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural networks , vol. 2, no. 5, pp. 359–366, 1989

work page 1989
[18]

Ad- dressSanitizer: A fast address sanity checker,

K. Serebryany, D. Bruening, A. Potapenko, and D. Vyukov, “Ad- dressSanitizer: A fast address sanity checker,” inUSENIX ATC 2012, 2012

work page 2012
[19]

Select—a formal system for testing and debugging programs by symbolic execution,

R. S. Boyer, B. Elspas, and K. N. Levitt, “Select—a formal system for testing and debugging programs by symbolic execution,” ACM SigPlan Notices, vol. 10, no. 6, pp. 234–245, 1975

work page 1975
[20]

Distributed representations of words and phrases and their compositionality,

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” Advances in Neural Information Processing Systems, vol. 26, 2013, code.google.com/p/word2vec. [Online]. Available: https://doi.org/10.5555/2999792.2999959

work page doi:10.5555/2999792.2999959 2013
[21]

In: Proceedings of the 2014 Conference on Empirical Methods in Natural Lan- guage Processing (EMNLP)

J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Conference on Empirical Methods in Natural Language Processing , 2014, pp. 1532– 1543, https://github.com/stanfordnlp/GloVe. [Online]. Available: https://doi.org/10.3115/v1/D14-1162

work page doi:10.3115/v1/d14-1162 2014
[22]

Unsupervised learning of sentence embeddings using compositional n-gram features,

M. Pagliardini, P. Gupta, and M. Jaggi, “Unsupervised learning of sentence embeddings using compositional n-gram features,” in Conference of the North American Chapter of the Association for Computational Linguistics, 2018, https://github.com/epfml/sent2vec. [Online]. Available: https://doi.org/10.18653/v1/N18-1049

work page doi:10.18653/v1/n18-1049 2018
[23]

Distributed representations of sentences and documents,

Q. Le and T. Mikolov, “Distributed representations of sentences and documents,” in International Conference on Machine Learning , 2014, pp. 1188–1196. [Online]. Available: https://proceedings.mlr. press/v32/le14.html

work page 2014
[24]

Recur- rent semantic learning-driven fast binary vulnerability detection in healthcare cyber physical systems,

X. Yi, J. Wu, G. Li, A. K. Bashir, J. Li, and A. A. AlZubi, “Recur- rent semantic learning-driven fast binary vulnerability detection in healthcare cyber physical systems,”IEEE Trans. on Network Science and Engineering, vol. 10, no. 5, pp. 2537–2550, 2022

work page 2022
[25]

ANTLR (ANother Tool for Language Recognition),

T. Parr, “ANTLR (ANother Tool for Language Recognition),” (Accessed 2024-10-23). [Online]. Available: https://github.com/ antlr/antlr4

work page 2024
[26]

astminer,

JetBrains, “astminer,” (Accessed 2024-10-23). [Online]. Available: https://github.com/JetBrains-Research/astmine

work page 2024
[27]

Joern: The Bug Hunter’s Workbench,

joern.io, “Joern: The Bug Hunter’s Workbench,” Jan. 2024, (Accessed 2024-10-23). [Online]. Available: https://github.com/ joernio/joern

work page 2024
[28]

Learning representations by back-propagating errors,

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” nature, vol. 323, no. 6088, pp. 533–536, 1986

work page 1986
[29]

Long short-term memory,

S. Hochreiter, “Long short-term memory,” Neural Computation MIT- Press, 1997

work page 1997
[30]

Empirical evaluation of gated recurrent neural networks on sequence modeling,

J. Chung, C. Gulcehre, K. Cho, and Y . Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” 2014

work page 2014
[31]

Bidirectional recurrent neural net- works,

M. Schuster and K. Paliwal, “Bidirectional recurrent neural net- works,” IEEE Trans. on Signal Processing , vol. 45, no. 11, pp. 2673–2681, 1997

work page 1997
[32]

Attention is all you need,

A. Vaswani, “Attention is all you need,” Advances in Neural Infor- mation Processing Systems , 2017

work page 2017
[33]

BERT: Pre- training of deep bidirectional transformers for language understand- ing,

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre- training of deep bidirectional transformers for language understand- ing,” in North American Chapter of the Association for Computa- tional Linguistics, 2019

work page 2019
[34]

GPT-4 Technical Report

J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat et al., “GPT4 technical report,” arXiv preprint arXiv:2303.08774 , 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[35]

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang et al. , “Codebert: A pre-trained model for programming and natural languages,” arXiv preprint arXiv:2002.08155, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2002
[36]

How journal rankings can suppress interdisciplinary research: A comparison between innovation studies and business & manage- ment,

I. Rafols, L. Leydesdorff, A. O’Hare, P. Nightingale, and A. Stirling, “How journal rankings can suppress interdisciplinary research: A comparison between innovation studies and business & manage- ment,” Research policy, vol. 41, no. 7, pp. 1262–1282, 2012. 14

work page 2012
[37]

Sok: Prudent evaluation practices for fuzzing,

M. Schloegel, N. Bars, N. Schiller, L. Bernhard, T. Scharnowski, A. Crump, A. Ale-Ebrahim, N. Bissantz, M. Muench, and T. Holz, “Sok: Prudent evaluation practices for fuzzing,” in 2024 IEEE Symposium on Security and Privacy (SP) . IEEE, 2024, pp. 1974– 1993

work page 2024
[38]

Rayyan—a web and mobile app for systematic reviews,

M. Ouzzani, H. Hammady, Z. Fedorowicz, and A. Elmagarmid, “Rayyan—a web and mobile app for systematic reviews,” Systematic Reviews, vol. 5, no. 1, p. 210, 2016

work page 2016
[39]

Citation analysis as a tool in journal evaluation,

E. Garfield, “Citation analysis as a tool in journal evaluation,” Science, vol. 178, no. 4060, pp. 471–479, 1972. [Online]. Available: https://www.science.org/doi/abs/10.1126/science.178.4060.471

work page doi:10.1126/science.178.4060.471 1972
[40]

Citation analysis of computer systems papers,

E. Frachtenberg, “Citation analysis of computer systems papers,” PeerJ Computer science, vol. 9, p. e1389, 2023

work page 2023
[41]

Toward large-scale vulnerability discovery using machine learning,

G. Grieco, G. L. Grinblat, L. Uzal, S. Rawat, J. Feist, and L. Mounier, “Toward large-scale vulnerability discovery using machine learning,” in Proceedings of the sixth ACM Conference on Data and Application Security and Privacy , 2016, pp. 85–96. [Online]. Available: https://doi.org/10.1145/2857705.2857720

work page doi:10.1145/2857705.2857720 2016
[42]

Static detection of vulnerabilities in x86 executables,

M. Cova, V . Felmetsger, G. Banks, and G. Vigna, “Static detection of vulnerabilities in x86 executables,” in IEEE Annual Computer Security Applications Conference , 2006, pp. 269–278. [Online]. Available: https://doi.org/10.1109/ACSAC.2006.50

work page doi:10.1109/acsac.2006.50 2006
[43]

HAN-BSVD: A hierarchical attention network for binary software vulnerability detection,

H. Yan, S. Luo, L. Pan, and Y . Zhang, “HAN-BSVD: A hierarchical attention network for binary software vulnerability detection,” Computers & Security , vol. 108, p. 102286, 2021. [Online]. Available: https://doi.org/10.1016/j.cose.2022.103023

work page doi:10.1016/j.cose.2022.103023 2021
[44]

Inputs of coma: Static detection of denial-of- service vulnerabilities,

R. Chang, G. Jiang, F. Ivancic, S. Sankaranarayanan, and V . Shmatikov, “Inputs of coma: Static detection of denial-of- service vulnerabilities,” in IEEE Computer Security Foundations Symposium, 2009, pp. 186–199. [Online]. Available: https: //doi.org/10.1109/CSF.2009.13

work page doi:10.1109/csf.2009.13 2009
[45]

Code-centric learning-based just-in-time vulnerability detection,

S. Nguyen, T.-T. Nguyen, T. T. Vu, T.-D. Do, K.-T. Ngo, and H. D. V o, “Code-centric learning-based just-in-time vulnerability detection,” Journal of Systems and Software , vol. 214, p. 112014,

work page
[46]

Available: https://doi.org/10.1016/j.jss.2024.112014

[Online]. Available: https://doi.org/10.1016/j.jss.2024.112014

work page doi:10.1016/j.jss.2024.112014 2024
[47]

Static detection of cross-site scripting vulnerabilities,

G. Wassermann and Z. Su, “Static detection of cross-site scripting vulnerabilities,” in International Conference on Software Engineering, 2008, pp. 171–180. [Online]. Available: https: //doi.org/10.1145/1368088.1368112

work page doi:10.1145/1368088.1368112 2008
[48]

ReDeBug: Finding unpatched code clones in entire OS distributions,

J. Jang, A. Agrawal, and D. Brumley, “ReDeBug: Finding unpatched code clones in entire OS distributions,” in IEEE Symposium on Security and Privacy , 2012, pp. 48–62. [Online]. Available: https://doi.org/10.1109/SP.2012.13

work page doi:10.1109/sp.2012.13 2012
[49]

LLbezpeky: Leveraging large language models for vulnerability detection,

N. S. Mathews, Y . Brus, Y . Aafer, M. Nagappan, and S. McIntosh, “LLbezpeky: Leveraging large language models for vulnerability detection,” arXiv preprint arXiv:2401.01269 , 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2401.01269

work page doi:10.48550/arxiv.2401.01269 2024
[50]

SySeVR: A framework for using deep learning to detect software vulnerabilities,

Z. Li, D. Zou, S. Xu, H. Jin, Y . Zhu, and Z. Chen, “SySeVR: A framework for using deep learning to detect software vulnerabilities,” IEEE Trans. on Dependable and Secure Computing, vol. 19, no. 4, p. 2244–2258, Jul. 2022. [Online]. Available: https://doi.org/10.1109/TDSC.2021.3051525

work page doi:10.1109/tdsc.2021.3051525 2022
[51]

DeepWukong: Statically detecting software vulnerabilities using deep graph neural network,

X. Cheng, H. Wang, J. Hua, G. Xu, and Y . Sui, “DeepWukong: Statically detecting software vulnerabilities using deep graph neural network,” ACM Trans. on Software Engineering and Methodology, vol. 30, no. 3, pp. 1–33, 2021. [Online]. Available: https://doi.org/10.1145/3436877

work page doi:10.1145/3436877 2021
[52]

Vu1SPG: Vulnerability detection based on slice property graph representation learning,

W. Zheng, Y . Jiang, and X. Su, “Vu1SPG: Vulnerability detection based on slice property graph representation learning,” in IEEE International Symposium on Software Reliability Engineering, 2021, pp. 457–467

work page 2021
[53]

VulDeBERT: A vulnerability detection system using BERT,

S. Kim, J. Choi, M. E. Ahmed, S. Nepal, and H. Kim, “VulDeBERT: A vulnerability detection system using BERT,” in IEEE Interna- tional Symposium on Software Reliability Engineering Workshops , 2022, pp. 69–74

work page 2022
[54]

Example- based vulnerability detection and repair in java code,

Y . Zhang, Y . Xiao, M. M. A. Kabir, D. Yao, and N. Meng, “Example- based vulnerability detection and repair in java code,” in Proceed- ings of the 30th IEEE/ACM International Conference on Program Comprehension, 2022, pp. 190–201

work page 2022
[55]

VulSlicer: Vulnerability detection through code slicing,

S. Salimi and M. Kharrazi, “VulSlicer: Vulnerability detection through code slicing,” Journal of Systems and Software , vol. 193, p. 111450, 2022

[56] S. Chakraborty, R. Krishna, Y. Ding, and B. Ray, “Deep learning based vulnerability detection: Are we there yet?” IEEE Trans. on Soft. Eng., vol. 48, no. 9, pp. 3280–3296, 2021. [Online]. Available: https://doi.org/10.1109/TSE.2021.3087402

[57] R. Russell, L. Kim, L. Hamilton, T. Lazovich, J. Harer, O. Ozdemir, P. Ellingwood, and M. McConley, “Automated vulnerability detection in source code using deep representation learning,” in IEEE International Conference on Machine Learning and Applications, 2018, pp. 757–762. [Online]. Available: https://doi.org/10.1109/ICMLA.2018.00120

[58] H. Wang, G. Ye, Z. Tang, S. H. Tan, S. Huang, D. Fang, Y. Feng, L. Bian, and Z. Wang, “Combining graph-based learning with automated data collection for code vulnerability detection,” IEEE Trans. on Information Forensics and Security, vol. 16, pp. 1943–1958, 2020. [Online]. Available: https://doi.org/10.1109/TIFS.2020.3044773

[59] Y. Li, S. Wang, and T. N. Nguyen, “Vulnerability detection with fine-grained interpretations,” in Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021, pp. 292–303. [Online]. Available: https://doi.org/10.1145/3468264.3468597

[60] S. Kim, S. Woo, H. Lee, and H. Oh, “VUDDY: A scalable approach for vulnerable code clone discovery,” in Symposium on Security and Privacy, 2017, pp. 595–614. [Online]. Available: https://doi.org/10.1109/SP.2017.62

[61] C. Zhang, H. Liu, J. Zeng, K. Yang, Y. Li, and H. Li, “Prompt-enhanced software vulnerability detection using ChatGPT,” in IEEE/ACM International Conference on Software Engineering: Companion Proceedings, 2024, pp. 276–277. [Online]. Available: https://doi.org/10.1145/3639478.3643065

[62] S. Cao, X. Sun, L. Bo, Y. Wei, and B. Li, “BGNN4VD: Constructing bidirectional graph neural-network for vulnerability detection,” Information and Software Technology, vol. 136, p. 106576, 2021. [Online]. Available: https://doi.org/10.1016/j.infsof.2021.106576

[63] S. Cao, X. Sun, L. Bo, R. Wu, B. Li, and C. Tao, “MVD: memory-related vulnerability detection based on flow-sensitive graph neural networks,” in International Conference on Software Engineering, 2022, pp. 1456–1468.

[64] Y. Wu, D. Zou, S. Dou, W. Yang, D. Xu, and H. Jin, “VulCNN: An image-inspired scalable vulnerability detection system,” in International Conference on Software Engineering, 2022, pp. 2365–. [Online]. Available: https://doi.org/10.1145/3510003.3510229

[66] V.-A. Nguyen, D. Q. Nguyen, V. Nguyen, T. Le, Q. H. Tran, and D. Phung, “ReGVD: Revisiting graph neural networks for vulnerability detection,” in ACM/IEEE International Conference on Software Engineering: Companion Proceedings, 2022, pp. 178–182.

[67] J. A. Harer, L. Y. Kim, R. L. Russell, O. Ozdemir, L. R. Kosta, A. Rangamani, L. H. Hamilton, G. I. Centeno, J. R. Key, P. M. Ellingwood et al., “Automated software vulnerability detection with machine learning,” arXiv preprint arXiv:1803.04497, 2018. [Online]. Available: https://doi.org/10.48550/arXiv.1803.04497

[68] X. Cheng, G. Zhang, H. Wang, and Y. Sui, “Path-sensitive code embedding via contrastive learning for software vulnerability detection,” in ACM International Symposium on Software Testing and Analysis, 2022, pp. 519–531. [Online]. Available: https://doi.org/10.1145/3533767.3534371

[69] G. Lin, J. Zhang, W. Luo, L. Pan, O. De Vel, P. Montague, and Y. Xiang, “Software vulnerability discovery via learning multi-domain knowledge bases,” IEEE Trans. on Dependable and Secure Computing, vol. 18, no. 5, pp. 2469–2485, 2019. [Online]. Available: https://doi.org/10.1109/TDSC.2019.2954088

[70] S. Liu, G. Lin, L. Qu, J. Zhang, O. De Vel, P. Montague, and Y. Xiang, “CD-VulD: Cross-domain vulnerability discovery based on deep domain adaptation,” IEEE Trans. on Dependable and Secure Computing, vol. 19, no. 1, pp. 438–451, 2020. [Online]. Available: https://doi.org/10.1109/TDSC.2020.2984505

[71] X. Du, B. Chen, Y. Li, J. Guo, Y. Zhou, Y. Liu, and Y. Jiang, “Leopard: Identifying vulnerable code for vulnerability assessment through program metrics,” in IEEE/ACM International Conference on Soft. Eng., 2019, pp. 60–71.

[72] S. Liu, G. Lin, Q.-L. Han, S. Wen, J. Zhang, and Y. Xiang, “DeepBalance: Deep-learning and fuzzy oversampling for vulnerability detection,” IEEE Trans. on Fuzzy Systems, vol. 28, no. 7, pp. 1329–1343, 2019.

[73] G. Lin, J. Zhang, W. Luo, L. Pan, and Y. Xiang, “POSTER: Vulnerability discovery with function representation learning from unlabeled projects,” in ACM Conference on Computer and Communications Security, 2017, pp. 2539–2541.

[74] Y. Xiao, B. Chen, C. Yu, Z. Xu, Z. Yuan, F. Li, B. Liu, Y. Liu, W. Huo, W. Zou, and W. Shi, “MVP: Detecting vulnerabilities using Patch-Enhanced vulnerability signatures,” in 29th USENIX Security Symposium. USENIX Association, Aug. 2020, pp. 1165–1182. [Online]. Available: https://www.usenix.org/conference/usenixsecurity20/presentation/xiao

[75] G. Lu, X. Ju, X. Chen, W. Pei, and Z. Cai, “GRACE: Empowering LLM-based software vulnerability detection with graph structure and in-context learning,” Journal of Systems and Software, vol. 212, p. 112031, 2024. [Online]. Available: https://doi.org/10.1016/j.jss.2024.112031

[76] F. Yamaguchi, C. Wressnegger, H. Gascon, and K. Rieck, “Chucky: Exposing missing checks in source code for vulnerability discovery,” in ACM Conference on Computer and Communications Security, 2013, pp. 499–510.

[77] W. Tang, M. Tang, M. Ban, Z. Zhao, and M. Feng, “CSGVD: A deep learning approach combining sequence and graph embedding for source code vulnerability detection,” Journal of Systems and Software, vol. 199, p. 111623, 2023.

[78] F. Wu, J. Wang, J. Liu, and W. Wang, “Vulnerability detection with deep learning,” in IEEE International Conference on Computer and Communications, 2017, pp. 1298–1302.

[79] X.-C. Wen, Y. Chen, C. Gao, H. Zhang, J. M. Zhang, and Q. Liao, “Vulnerability detection with graph simplification and enhanced graph representation learning,” in IEEE/ACM 45th International Conference on Soft. Eng., 2023, pp. 2275–2286. [Online]. Available: https://doi.org/10.1109/ICSE48619.2023.00191

[80] B. Steenhoek, H. Gao, and W. Le, “Dataflow analysis-inspired deep learning for efficient vulnerability detection,” in Proceedings of the 46th IEEE/ACM International Conference on Soft. Eng., 2024, pp. 1–13.


[1] DARPA, “Cyber Grand Challenge,” (Accessed 2024-10-06). [Online]. Available: https://www.darpa.mil/about-us/timeline/cyber-grand-challenge

[2] DARPA, “AI Cyber Challenge (AIxCC),” (Accessed 2024-10-03). [Online]. Available: https://aicyberchallenge.com/

[3] D. A. Wheeler, “Flawfinder,” (Accessed 2024-09-23). [Online]. Available: https://github.com/david-a-wheeler/flawfinder

[4] A. Dunham, “RATS - rough auditing tool for security,” (Accessed 2024-09-23). [Online]. Available: https://github.com/andrew-d/rough-auditing-tool-for-security

[5] Z. Li, D. Zou, S. Xu, X. Ou, H. Jin, S. Wang, Z. Deng, and Y. Zhong, “VulDeePecker: A deep learning-based system for vulnerability detection,” in Network and Distributed System Security Symposium, ser. NDSS 2018. Internet Society, 2018. [Online]. Available: https://doi.org/10.14722/ndss.2018.23158

[6] Y. Zhou, S. Liu, J. Siow, X. Du, and Y. Liu, “Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks,” Advances in Neural Information Processing Systems, vol. 32, 2019. [Online]. Available: https://proceedings.neurips.cc/paper/2019/hash/49265d2447bc3bbfe9e76306ce40a31f-Abstract.html

[7] H. Hanif and S. Maffeis, “VulBERTa: Simplified source code pre-training for vulnerability detection,” in IEEE International Joint Conference on Neural Networks, 2022, pp. 1–8. [Online]. Available: https://doi.org/10.1109/IJCNN55064.2022.9892280

[8] M. Fu and C. Tantithamthavorn, “LineVul: A transformer-based line-level vulnerability prediction,” in Proceedings of the International Conference on Mining Software Repositories, 2022, pp. 608–620. [Online]. Available: https://doi.org/10.1145/3524842.3528452

[9] X. Zhou, T. Zhang, and D. Lo, “Large language model for vulnerability detection: Emerging results and future directions,” in ACM/IEEE International Conference on Software Engineering: New Ideas and Emerging Results, 2024, pp. 47–51. [Online]. Available: https://doi.org/10.1145/3639476.3639762

[10] Y. Guo and S. Bettaieb, “An investigation of quality issues in vulnerability detection datasets,” in IEEE European Symposium on Security and Privacy Workshops, 2023, pp. 29–33.

[11] Y. Nong, R. Sharma, A. Hamou-Lhadj, X. Luo, and H. Cai, “Open science in software engineering: A study on deep learning-based vulnerability detection,” IEEE Trans. on Soft. Eng., vol. 49, no. 4, pp. 1983–2005, 2022.

[12] D. Ristea, V. Mavroudis, and C. Hicks, “AI cyber risk benchmark: Automated exploitation capabilities,” arXiv preprint arXiv:2410.21939v2, 2024.

[13] C. Beaman, M. Redbourne, J. D. Mummery, and S. Hakak, “Fuzzing vulnerability discovery techniques: Survey, challenges and future directions,” Computers & Security, vol. 120, p. 102813, 2022.

[14] L. Yu, Y. Lu, Y. Shen, Y. Li, and Z. Pan, “Vulnerability-oriented directed fuzzing for binary programs,” Scientific Reports, vol. 12, no. 1, p. 4271, 2022.

[15] Q. Zhang, C. Fang, Y. Ma, W. Sun, and Z. Chen, “A Survey of Learning-based Automated Program Repair,” ACM Trans. on Software Engineering and Methodology, vol. 33, no. 2, Dec. 2023. [Online]. Available: https://doi.org/10.1145/3631974

[16] J. R. Rice, “A theory of condition,” SIAM Journal on Numerical Analysis, vol. 3, no. 2, pp. 287–310, 1966.

[17] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.

[18] K. Serebryany, D. Bruening, A. Potapenko, and D. Vyukov, “AddressSanitizer: A fast address sanity checker,” in USENIX ATC 2012, 2012.

[19] R. S. Boyer, B. Elspas, and K. N. Levitt, “Select—a formal system for testing and debugging programs by symbolic execution,” ACM SigPlan Notices, vol. 10, no. 6, pp. 234–245, 1975.

[20] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” Advances in Neural Information Processing Systems, vol. 26, 2013, code.google.com/p/word2vec. [Online]. Available: https://doi.org/10.5555/2999792.2999959

[21] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Conference on Empirical Methods in Natural Language Processing, 2014, pp. 1532–1543, https://github.com/stanfordnlp/GloVe. [Online]. Available: https://doi.org/10.3115/v1/D14-1162

[22] M. Pagliardini, P. Gupta, and M. Jaggi, “Unsupervised learning of sentence embeddings using compositional n-gram features,” in Conference of the North American Chapter of the Association for Computational Linguistics, 2018, https://github.com/epfml/sent2vec. [Online]. Available: https://doi.org/10.18653/v1/N18-1049

[23] Q. Le and T. Mikolov, “Distributed representations of sentences and documents,” in International Conference on Machine Learning, 2014, pp. 1188–1196. [Online]. Available: https://proceedings.mlr.press/v32/le14.html

[24] X. Yi, J. Wu, G. Li, A. K. Bashir, J. Li, and A. A. AlZubi, “Recurrent semantic learning-driven fast binary vulnerability detection in healthcare cyber physical systems,” IEEE Trans. on Network Science and Engineering, vol. 10, no. 5, pp. 2537–2550, 2022.

[25] T. Parr, “ANTLR (ANother Tool for Language Recognition),” (Accessed 2024-10-23). [Online]. Available: https://github.com/antlr/antlr4

[26] JetBrains, “astminer,” (Accessed 2024-10-23). [Online]. Available: https://github.com/JetBrains-Research/astmine

[27] joern.io, “Joern: The Bug Hunter’s Workbench,” Jan. 2024, (Accessed 2024-10-23). [Online]. Available: https://github.com/joernio/joern

[28] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533–536, 1986.

[29] S. Hochreiter, “Long short-term memory,” Neural Computation MIT-Press, 1997.

[30] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” 2014.

[31] M. Schuster and K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Trans. on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.

[32] A. Vaswani, “Attention is all you need,” Advances in Neural Information Processing Systems, 2017.

[33] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in North American Chapter of the Association for Computational Linguistics, 2019.

[34] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat et al., “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023.

[35] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang et al., “CodeBERT: A pre-trained model for programming and natural languages,” arXiv preprint arXiv:2002.08155, 2020.

[36] I. Rafols, L. Leydesdorff, A. O’Hare, P. Nightingale, and A. Stirling, “How journal rankings can suppress interdisciplinary research: A comparison between innovation studies and business & management,” Research Policy, vol. 41, no. 7, pp. 1262–1282, 2012.

[37] M. Schloegel, N. Bars, N. Schiller, L. Bernhard, T. Scharnowski, A. Crump, A. Ale-Ebrahim, N. Bissantz, M. Muench, and T. Holz, “SoK: Prudent evaluation practices for fuzzing,” in 2024 IEEE Symposium on Security and Privacy (SP). IEEE, 2024, pp. 1974–1993.

[38] M. Ouzzani, H. Hammady, Z. Fedorowicz, and A. Elmagarmid, “Rayyan—a web and mobile app for systematic reviews,” Systematic Reviews, vol. 5, no. 1, p. 210, 2016.

[39] E. Garfield, “Citation analysis as a tool in journal evaluation,” Science, vol. 178, no. 4060, pp. 471–479, 1972. [Online]. Available: https://www.science.org/doi/abs/10.1126/science.178.4060.471

[40] E. Frachtenberg, “Citation analysis of computer systems papers,” PeerJ Computer Science, vol. 9, p. e1389, 2023.

[41] G. Grieco, G. L. Grinblat, L. Uzal, S. Rawat, J. Feist, and L. Mounier, “Toward large-scale vulnerability discovery using machine learning,” in Proceedings of the sixth ACM Conference on Data and Application Security and Privacy, 2016, pp. 85–96. [Online]. Available: https://doi.org/10.1145/2857705.2857720

[42] M. Cova, V. Felmetsger, G. Banks, and G. Vigna, “Static detection of vulnerabilities in x86 executables,” in IEEE Annual Computer Security Applications Conference, 2006, pp. 269–278. [Online]. Available: https://doi.org/10.1109/ACSAC.2006.50

[43] H. Yan, S. Luo, L. Pan, and Y. Zhang, “HAN-BSVD: A hierarchical attention network for binary software vulnerability detection,” Computers & Security, vol. 108, p. 102286, 2021. [Online]. Available: https://doi.org/10.1016/j.cose.2022.103023

[44] R. Chang, G. Jiang, F. Ivancic, S. Sankaranarayanan, and V. Shmatikov, “Inputs of coma: Static detection of denial-of-service vulnerabilities,” in IEEE Computer Security Foundations Symposium, 2009, pp. 186–199. [Online]. Available: https://doi.org/10.1109/CSF.2009.13

[45] S. Nguyen, T.-T. Nguyen, T. T. Vu, T.-D. Do, K.-T. Ngo, and H. D. Vo, “Code-centric learning-based just-in-time vulnerability detection,” Journal of Systems and Software, vol. 214, p. 112014, 2024. [Online]. Available: https://doi.org/10.1016/j.jss.2024.112014

[47] G. Wassermann and Z. Su, “Static detection of cross-site scripting vulnerabilities,” in International Conference on Software Engineering, 2008, pp. 171–180. [Online]. Available: https://doi.org/10.1145/1368088.1368112

[48] J. Jang, A. Agrawal, and D. Brumley, “ReDeBug: Finding unpatched code clones in entire OS distributions,” in IEEE Symposium on Security and Privacy, 2012, pp. 48–62. [Online]. Available: https://doi.org/10.1109/SP.2012.13

[49] N. S. Mathews, Y. Brus, Y. Aafer, M. Nagappan, and S. McIntosh, “LLbezpeky: Leveraging large language models for vulnerability detection,” arXiv preprint arXiv:2401.01269, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2401.01269

[50] Z. Li, D. Zou, S. Xu, H. Jin, Y. Zhu, and Z. Chen, “SySeVR: A framework for using deep learning to detect software vulnerabilities,” IEEE Trans. on Dependable and Secure Computing, vol. 19, no. 4, pp. 2244–2258, Jul. 2022. [Online]. Available: https://doi.org/10.1109/TDSC.2021.3051525

[51] [51]

DeepWukong: Statically detecting software vulnerabilities using deep graph neural network,

X. Cheng, H. Wang, J. Hua, G. Xu, and Y . Sui, “DeepWukong: Statically detecting software vulnerabilities using deep graph neural network,” ACM Trans. on Software Engineering and Methodology, vol. 30, no. 3, pp. 1–33, 2021. [Online]. Available: https://doi.org/10.1145/3436877

work page doi:10.1145/3436877 2021

[52] [52]

Vu1SPG: Vulnerability detection based on slice property graph representation learning,

W. Zheng, Y . Jiang, and X. Su, “Vu1SPG: Vulnerability detection based on slice property graph representation learning,” in IEEE International Symposium on Software Reliability Engineering, 2021, pp. 457–467

work page 2021

[53] [53]

VulDeBERT: A vulnerability detection system using BERT,

S. Kim, J. Choi, M. E. Ahmed, S. Nepal, and H. Kim, “VulDeBERT: A vulnerability detection system using BERT,” in IEEE Interna- tional Symposium on Software Reliability Engineering Workshops , 2022, pp. 69–74

work page 2022

[54] [54]

Example- based vulnerability detection and repair in java code,

Y . Zhang, Y . Xiao, M. M. A. Kabir, D. Yao, and N. Meng, “Example- based vulnerability detection and repair in java code,” in Proceed- ings of the 30th IEEE/ACM International Conference on Program Comprehension, 2022, pp. 190–201

work page 2022

[55] [55]

VulSlicer: Vulnerability detection through code slicing,

S. Salimi and M. Kharrazi, “VulSlicer: Vulnerability detection through code slicing,” Journal of Systems and Software , vol. 193, p. 111450, 2022

work page 2022

[56] [56]

Deep learning based vulnerability detection: Are we there yet?

S. Chakraborty, R. Krishna, Y . Ding, and B. Ray, “Deep learning based vulnerability detection: Are we there yet?” IEEE Trans. on Soft. Eng., vol. 48, no. 9, pp. 3280–3296, 2021. [Online]. Available: https://doi.org/10.1109/TSE.2021.3087402

work page doi:10.1109/tse.2021.3087402 2021

[57] [57]

Automated vulnerability detection in source code using deep representation learning,

R. Russell, L. Kim, L. Hamilton, T. Lazovich, J. Harer, O. Ozdemir, P. Ellingwood, and M. McConley, “Automated vulnerability detection in source code using deep representation learning,” in IEEE International Conference on Machine Learning and Applications , 2018, pp. 757–762. [Online]. Available: https: //doi.org/10.1109/ICMLA.2018.00120

work page doi:10.1109/icmla.2018.00120 2018

[58] [58]

Combining graph-based learning with automated data collection for code vulnerability detection,

H. Wang, G. Ye, Z. Tang, S. H. Tan, S. Huang, D. Fang, Y . Feng, L. Bian, and Z. Wang, “Combining graph-based learning with automated data collection for code vulnerability detection,” IEEE Trans. on Information Forensics and Security , vol. 16, pp. 1943–1958, 2020. [Online]. Available: https: //doi.org/10.1109/TIFS.2020.3044773

work page doi:10.1109/tifs.2020.3044773 1943

[59] [59]

Vulnerability detection with fine-grained interpretations,

Y . Li, S. Wang, and T. N. Nguyen, “Vulnerability detection with fine-grained interpretations,” in Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering , 2021, pp. 292–303. [Online]. Available: https://doi.org/10.1145/3468264. 3468597

work page doi:10.1145/3468264 2021

[60] [60]

VUDDY: A scalable approach for vulnerable code clone discovery,

S. Kim, S. Woo, H. Lee, and H. Oh, “VUDDY: A scalable approach for vulnerable code clone discovery,” in Symposium on Security and Privacy , 2017, pp. 595–614. [Online]. Available: https://doi.org/10.1109/SP.2017.62

work page doi:10.1109/sp.2017.62 2017

[61] [61]

Prompt- enhanced software vulnerability detection using ChatGPT,

C. Zhang, H. Liu, J. Zeng, K. Yang, Y . Li, and H. Li, “Prompt- enhanced software vulnerability detection using ChatGPT,” in IEEE/ACM International Conference on Software Engineering: Companion Proceedings , 2024, pp. 276–277. [Online]. Available: https://doi.org/10.1145/3639478.3643065

work page doi:10.1145/3639478.3643065 2024

[62] [62]

BGNN4VD: Constructing bidirectional graph neural-network for vulnerability detection,

S. Cao, X. Sun, L. Bo, Y . Wei, and B. Li, “BGNN4VD: Constructing bidirectional graph neural-network for vulnerability detection,” Information and Software Technology , vol. 136, p. 106576, 2021. [Online]. Available: https://doi.org/10.1016/j.infsof.2021.106576

work page doi:10.1016/j.infsof.2021.106576 2021

[63] [63]

MVD: memory- related vulnerability detection based on flow-sensitive graph neural networks,

S. Cao, X. Sun, L. Bo, R. Wu, B. Li, and C. Tao, “MVD: memory- related vulnerability detection based on flow-sensitive graph neural networks,” in International Conference on Software Engineering , 2022, pp. 1456–1468

work page 2022

[64] [64]

VulCNN: An image-inspired scalable vulnerability detection system,

Y . Wu, D. Zou, S. Dou, W. Yang, D. Xu, and H. Jin, “VulCNN: An image-inspired scalable vulnerability detection system,” in International Conference on Software Engineering, 2022, pp. 2365–

work page 2022

[65] [65]

Available: https://doi.org/10.1145/3510003.3510229

[Online]. Available: https://doi.org/10.1145/3510003.3510229

work page doi:10.1145/3510003.3510229

[66] [66]

ReGVD: Revisiting graph neural networks for vulnerability detection,

V .-A. Nguyen, D. Q. Nguyen, V . Nguyen, T. Le, Q. H. Tran, and D. Phung, “ReGVD: Revisiting graph neural networks for vulnerability detection,” in ACM/IEEE International Conference on Software Engineering: Companion Proceedings, 2022, pp. 178–182

work page 2022

[67] [67]

Automated software vulnerability detection with machine learning

J. A. Harer, L. Y . Kim, R. L. Russell, O. Ozdemir, L. R. Kosta, A. Rangamani, L. H. Hamilton, G. I. Centeno, J. R. Key, P. M. Ellingwood et al. , “Automated software vulnerability detection with machine learning,” arXiv preprint arXiv:1803.04497 , 2018. [Online]. Available: https://doi.org/10.48550/arXiv.1803.04497

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1803.04497 2018

[68] [68]

Path-sensitive code embedding via contrastive learning for software vulnerability detection,

X. Cheng, G. Zhang, H. Wang, and Y . Sui, “Path-sensitive code embedding via contrastive learning for software vulnerability detection,” in ACM International Symposium on Software Testing and Analysis , 2022, pp. 519–531. [Online]. Available: https: //doi.org/10.1145/3533767.3534371 15

work page doi:10.1145/3533767.3534371 2022

[69] [69]

Software vulnerability discovery via learning multi-domain knowledge bases,

G. Lin, J. Zhang, W. Luo, L. Pan, O. De Vel, P. Montague, and Y . Xiang, “Software vulnerability discovery via learning multi-domain knowledge bases,” IEEE Trans. on Dependable and Secure Computing, vol. 18, no. 5, pp. 2469–2485, 2019. [Online]. Available: https://doi.org/10.1109/TDSC.2019.2954088

work page doi:10.1109/tdsc.2019.2954088 2019

[70] [70]

CD-VulD: Cross-domain vulnerability discovery based on deep domain adaptation,

S. Liu, G. Lin, L. Qu, J. Zhang, O. De Vel, P. Montague, and Y . Xiang, “CD-VulD: Cross-domain vulnerability discovery based on deep domain adaptation,” IEEE Trans. on Dependable and Secure Computing , vol. 19, no. 1, pp. 438–451, 2020. [Online]. Available: https://doi.org/10.1109/TDSC.2020.2984505

work page doi:10.1109/tdsc.2020.2984505 2020

[71] [71]

Leopard: Identifying vulnerable code for vulnerability assessment through program metrics,

X. Du, B. Chen, Y . Li, J. Guo, Y . Zhou, Y . Liu, and Y . Jiang, “Leopard: Identifying vulnerable code for vulnerability assessment through program metrics,” in IEEE/ACM International Conference on Soft. Eng. , 2019, pp. 60–71

work page 2019

[72] [72]

Deep- balance: Deep-learning and fuzzy oversampling for vulnerability detection,

S. Liu, G. Lin, Q.-L. Han, S. Wen, J. Zhang, and Y . Xiang, “Deep- balance: Deep-learning and fuzzy oversampling for vulnerability detection,” IEEE Trans. on Fuzzy Systems , vol. 28, no. 7, pp. 1329– 1343, 2019

work page 2019

[73] [73]

POSTER: Vulnera- bility discovery with function representation learning from unlabeled projects,

G. Lin, J. Zhang, W. Luo, L. Pan, and Y . Xiang, “POSTER: Vulnera- bility discovery with function representation learning from unlabeled projects,” in ACM Conference on Computer and Communications Security, 2017, pp. 2539–2541

[74]

MVP: Detecting vulnerabilities using Patch-Enhanced vulnerability signatures,

Y. Xiao, B. Chen, C. Yu, Z. Xu, Z. Yuan, F. Li, B. Liu, Y. Liu, W. Huo, W. Zou, and W. Shi, “MVP: Detecting vulnerabilities using Patch-Enhanced vulnerability signatures,” in 29th USENIX Security Symposium. USENIX Association, Aug. 2020, pp. 1165–1182. [Online]. Available: https://www.usenix.org/conference/usenixsecurity20/presentation/xiao

[75]

GRACE: Empowering LLM-based software vulnerability detection with graph structure and in-context learning,

G. Lu, X. Ju, X. Chen, W. Pei, and Z. Cai, “GRACE: Empowering LLM-based software vulnerability detection with graph structure and in-context learning,” Journal of Systems and Software, vol. 212, p. 112031, 2024. [Online]. Available: https://doi.org/10.1016/j.jss.2024.112031

[76]

Chucky: Exposing missing checks in source code for vulnerability discovery,

F. Yamaguchi, C. Wressnegger, H. Gascon, and K. Rieck, “Chucky: Exposing missing checks in source code for vulnerability discovery,” in ACM Conference on Computer and Communications Security, 2013, pp. 499–510

[77]

CSGVD: A deep learning approach combining sequence and graph embedding for source code vulnerability detection,

W. Tang, M. Tang, M. Ban, Z. Zhao, and M. Feng, “CSGVD: A deep learning approach combining sequence and graph embedding for source code vulnerability detection,” Journal of Systems and Software, vol. 199, p. 111623, 2023

[78]

Vulnerability detection with deep learning,

F. Wu, J. Wang, J. Liu, and W. Wang, “Vulnerability detection with deep learning,” in IEEE International Conference on Computer and Communications, 2017, pp. 1298–1302

[79]

Vulnerability detection with graph simplification and enhanced graph representation learning,

X.-C. Wen, Y. Chen, C. Gao, H. Zhang, J. M. Zhang, and Q. Liao, “Vulnerability detection with graph simplification and enhanced graph representation learning,” in IEEE/ACM 45th International Conference on Soft. Eng., 2023, pp. 2275–2286. [Online]. Available: https://doi.org/10.1109/ICSE48619.2023.00191

[80]

Dataflow analysis-inspired deep learning for efficient vulnerability detection,

B. Steenhoek, H. Gao, and W. Le, “Dataflow analysis-inspired deep learning for efficient vulnerability detection,” in Proceedings of the 46th IEEE/ACM International Conference on Soft. Eng., 2024, pp. 1–13
