
arxiv: 2412.11194 · v2 · submitted 2024-12-15 · 💻 cs.SE · cs.AI

Direction for Detection: A Survey of Automated Vulnerability Detection and all of its Pain Points

Pith reviewed 2026-05-23 06:48 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords: automated vulnerability detection, machine learning, survey, pain points, feedback loops, datasets, binary classification, C/C++

The pith

Twelve pain points in ML-based vulnerability detection form self-reinforcing loops that confine research to binary classification of C/C++ functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey of 87 works on machine learning for automated vulnerability detection identifies twelve pain points across the entire pipeline from data to evaluation. These pain points are not isolated but connected through feedback loops involving datasets, problem formulations, baselines, and metrics. The loops keep the field focused on a narrow task of binary classification for C/C++ code at the function level rather than broader goals like predicting vulnerability types or supporting more languages. The authors provide concrete recommendations to break each loop and test their ideas against a recent high-profile effort. If the analysis holds, following the recommendations would allow the field to address more realistic and useful detection problems.

Core claim

The paper systematizes 87 influential works by problem formulation, input and detection granularity, target languages, metrics, datasets, and approach. From this corpus it identifies twelve pain points that span the ML4AVD pipeline and shows that they are self-reinforcing and causally inter-meshed: feedback loops between datasets, formulations, baselines, and metrics explain the persistent concentration on binary classification of C/C++ vulnerabilities at the function level. Each pain point is paired with recommendations to break the loops, and AIxCC serves as a case study.
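As a rough illustration of what such a systematization looks like in practice, the sketch below records each surveyed work along the axes named above and measures how much of a corpus falls into the narrow binary, C/C++, function-level setting. The class `SurveyedWork`, the helper names, and every record in `corpus` are hypothetical placeholders, not data from the paper's tables.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveyedWork:
    """One surveyed paper described along the systematization axes (hypothetical schema)."""
    name: str
    formulation: str            # e.g. "binary" or "multiclass"
    input_granularity: str      # unit of code the model reads
    detection_granularity: str  # unit of code the prediction refers to
    languages: tuple
    metrics: tuple
    datasets: tuple
    approach: str


def narrow_setting_share(works):
    """Fraction of works doing binary, function-level detection on C/C++ only."""
    def is_narrow(w):
        return (
            w.formulation == "binary"
            and w.detection_granularity == "function"
            and bool(w.languages)
            and set(w.languages) <= {"C", "C++"}
        )
    return sum(is_narrow(w) for w in works) / len(works)


def axis_counts(works, axis):
    """Count how often each value occurs on a single-valued axis."""
    return Counter(getattr(w, axis) for w in works)


# Placeholder records for illustration only, NOT taken from the survey.
corpus = [
    SurveyedWork("work-a", "binary", "function", "function", ("C", "C++"), ("F1",), ("dataset-x",), "graph"),
    SurveyedWork("work-b", "binary", "function", "function", ("C",), ("F1", "accuracy"), ("dataset-x",), "transformer"),
    SurveyedWork("work-c", "multiclass", "file", "line", ("Java",), ("macro-F1",), ("dataset-y",), "transformer"),
    SurveyedWork("work-d", "binary", "slice", "function", ("C", "C++"), ("F1",), ("dataset-z",), "rnn"),
]

print(narrow_setting_share(corpus))        # 0.75
print(axis_counts(corpus, "formulation"))
```

Keeping `input_granularity` and `detection_granularity` as separate fields mirrors the distinction the paper draws between what a model reads and what its prediction refers to.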

What carries the argument

The twelve pain points spanning the ML4AVD pipeline and their causal inter-meshing through feedback loops between datasets, formulations, baselines, and metrics.

If this is right

  • The field optimizes for a narrow and artificial problem that omits vulnerability type prediction.
  • Broader language support beyond C/C++ remains unaddressed.
  • Separation of input granularity from detection granularity is not pursued.
  • Feedback loops between datasets, formulations, baselines, and metrics perpetuate the narrow focus.
  • Concrete recommendations paired with each pain point can break the reinforcing loops.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar self-reinforcing loops between data, tasks, and metrics may operate in other machine learning applications to software engineering.
  • The rise of agentic coding frameworks that increase code production volume could make addressing these pain points more urgent.
  • Empirical tests could check whether following the recommendations produces measurable gains in practical detection utility outside the current narrow setting.

Load-bearing premise

The authors' categorization of the 87 works and their interpretation of causal inter-meshing among the pain points accurately reflect the dominant dynamics of the field rather than selection or interpretive artifacts.

What would settle it

A new empirical study that achieves strong, sustained performance gains on vulnerability type prediction, multiple languages beyond C/C++, and separated input versus detection granularity while using non-binary metrics would show the loops do not prevent progress as claimed.
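To make the contrast between binary and non-binary metrics concrete, the sketch below computes a macro-averaged F1 over vulnerability types and compares it with the binary F1 obtained after collapsing every type into a single "vulnerable" label. The function names, the label strings, and the toy predictions are assumptions chosen for illustration; the paper does not prescribe this particular metric.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 of a single label, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over every label seen in truth or prediction."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


def to_binary(labels):
    """Collapse every vulnerability type into one 'vulnerable' class."""
    return ["safe" if lab == "safe" else "vulnerable" for lab in labels]


# Toy labels for illustration only; the CWE identifiers stand in for vulnerability types.
y_true = ["safe", "safe", "safe", "CWE-119", "CWE-79", "CWE-79"]
y_pred = ["safe", "safe", "safe", "safe", "CWE-79", "CWE-119"]

binary_score = f1_for_label(to_binary(y_true), to_binary(y_pred), "vulnerable")
typed_score = macro_f1(y_true, y_pred)
print(round(binary_score, 3), round(typed_score, 3))
```

On this toy input the binary score is higher than the type-aware one, because confusing one vulnerability type with another costs nothing once the labels are collapsed.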

Figures

Figures reproduced from arXiv: 2412.11194 by Chris Hicks, Dan Ristea, Ezzeldin Shereen, Madeleine Dwyer, Sanyam Vyas, Shae McFadden, Vasilios Mavroudis.

Figure 1: Vulnerability analysis pipeline. The flow between detection, …
Figure 2: Summary of the article collection and screening process.
Figure 3: Distribution of the publication year of included papers after the …
Figure 4: The five main components of AVD literature systematized in the paper. Each component is studied in one sub-section of Section …
Figure 5: Comparison of commonly-used programming languages accord…
Figure 7: Distribution of the most common evaluation metrics used in AVD.
Figure 8: Reported F1-scores by AVD solutions on the three most popular AVD datasets per year. Each point corresponds to one solution evaluated on a dataset. Some solutions are evaluated on multiple datasets. Overall, 19 solutions used the F1-score in conjunction with one of the aforementioned datasets.
Original abstract

Security vulnerabilities in software can have severe consequences; however, manual vulnerability detection is costly and does not scale, especially as agentic coding frameworks increase the rate of code production. Over the last decade, a large body of research has applied machine learning machine learning to automate vulnerability detection (ML4AVD), yet self-reported performance on the most popular datasets shows no clear upward trend. The ML4AVD research community has identified several flaws in problem formulations, datasets, and metrics, but these are discussed in isolation, leaving the overarching problems that generate and reinforce these flaws unaddressed. We first systematize the field through a survey of 87 influential works based on their problem formulation, input and detection granularity, target programming languages, evaluation metrics, datasets, and detection approach. Drawing on this corpus and prior empirical work, we identify twelve pain points spanning the ML4AVD pipeline and show that they are self-reinforcing and causally inter-meshed: feedback loops between datasets, formulations, baselines, and metrics perpetuate each other and explain the field's persistent concentration on binary classification of C/C++ vulnerabilities at the function level. Thus, the field optimizes for a narrow and artificial problem that omits vulnerability type prediction, broader language support, and separation of input from detection granularity. We pair each pain point with concrete recommendations to break these loops. Finally, we use AIxCC as a case study to assess how well a recent high-profile effort aligns with these recommendations and reflect on the relevance of ML4AVD in the era of agentic AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper surveys 87 influential works in ML4AVD, systematizing them according to problem formulation, input and detection granularity, target programming languages, evaluation metrics, datasets, and detection approach. Drawing on this corpus and prior empirical studies, it identifies twelve pain points spanning the ML4AVD pipeline and argues that they are self-reinforcing and causally inter-meshed via feedback loops between datasets, formulations, baselines, and metrics; this explains the field's persistent concentration on binary classification of C/C++ vulnerabilities at the function level. The paper pairs each pain point with concrete recommendations, uses AIxCC as a case study, and reflects on relevance in the era of agentic AI.

Significance. If the categorization of the 87 works is representative and the inter-meshing interpretation holds, the work offers a significant synthesis that moves beyond isolated critiques of flaws in formulations, datasets, and metrics to a holistic view of systemic issues. The scale of the surveyed corpus and the explicit pairing of pain points with recommendations are strengths that could usefully guide the field toward broader support for vulnerability type prediction, language diversity, and separation of input from detection granularity. The AIxCC case study adds practical relevance.

major comments (2)
  1. [Pain points section (drawing on systematization of 87 works)] The central claim that the twelve pain points are self-reinforcing and causally inter-meshed (feedback loops between datasets, formulations, baselines, and metrics explain the narrow concentration on binary C/C++ function-level classification) is presented as an interpretive synthesis from the 87-work corpus. No quantitative test (citation-graph analysis, temporal co-occurrence statistics, or ablation of the corpus) is provided to establish that altering one element would propagate to others rather than arising from independent causes such as early dataset releases. This is load-bearing for the main thesis.
  2. [Systematization section] The selection criteria for the 87 works and the detailed categorization tables (by problem formulation, granularity, languages, metrics, datasets, approaches) are not described with sufficient explicitness to verify that the identified pain points and their inter-meshing accurately reflect dominant field dynamics rather than selection or interpretive artifacts.
minor comments (2)
  1. [Abstract] Abstract contains a duplicated phrase: 'machine learning machine learning'.
  2. [Title] Minor inconsistencies in capitalization or phrasing in section headings (e.g., 'all of its Pain Points' in the title).
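The temporal co-occurrence statistics mentioned in major comment 1 could be approximated along the following lines. This is a minimal sketch under stated assumptions: the record layout, the helper names `lift` and `lift_by_period`, and all sample records are hypothetical, and neither the paper nor the report specifies this procedure. Lift compares how often two properties appear together against what independence would predict. A value above 1 indicates association only, so it would not by itself establish the causal propagation the referee asks about.

```python
def lift(records, has_a, has_b):
    """P(A and B) / (P(A) * P(B)); NaN when either property never occurs."""
    n = len(records)
    a = sum(1 for r in records if has_a(r))
    b = sum(1 for r in records if has_b(r))
    ab = sum(1 for r in records if has_a(r) and has_b(r))
    if n == 0 or a == 0 or b == 0:
        return float("nan")
    return (ab / n) / ((a / n) * (b / n))


def lift_by_period(records, has_a, has_b, split_year):
    """Lift before and from split_year onward, to see whether the association persists."""
    early = [r for r in records if r["year"] < split_year]
    late = [r for r in records if r["year"] >= split_year]
    return lift(early, has_a, has_b), lift(late, has_a, has_b)


# Placeholder records for illustration only, NOT extracted from the surveyed corpus.
records = [
    {"year": 2018, "dataset": "dataset-x", "formulation": "binary"},
    {"year": 2019, "dataset": "dataset-x", "formulation": "binary"},
    {"year": 2019, "dataset": "dataset-y", "formulation": "multiclass"},
    {"year": 2021, "dataset": "dataset-x", "formulation": "binary"},
    {"year": 2022, "dataset": "dataset-y", "formulation": "binary"},
    {"year": 2022, "dataset": "dataset-y", "formulation": "multiclass"},
]


def uses_x(r):
    return r["dataset"] == "dataset-x"


def is_binary(r):
    return r["formulation"] == "binary"


overall = lift(records, uses_x, is_binary)
early, late = lift_by_period(records, uses_x, is_binary, split_year=2020)
print(overall, early, late)
```

Splitting by period is one crude way to probe the "early dataset releases" alternative raised in the comment. A fuller analysis would need the actual corpus annotations and a citation graph.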

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments identify areas where our interpretive synthesis and methodology description can be strengthened. We respond to each major comment below and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [Pain points section (drawing on systematization of 87 works)] The central claim that the twelve pain points are self-reinforcing and causally inter-meshed (feedback loops between datasets, formulations, baselines, and metrics explain the narrow concentration on binary C/C++ function-level classification) is presented as an interpretive synthesis from the 87-work corpus. No quantitative test (citation-graph analysis, temporal co-occurrence statistics, or ablation of the corpus) is provided to establish that altering one element would propagate to others rather than arising from independent causes such as early dataset releases. This is load-bearing for the main thesis.

    Authors: We agree that the central claim is an interpretive synthesis rather than the result of a new quantitative analysis. The paper integrates patterns observed across the 87 works with prior empirical studies to propose feedback loops as an explanatory framework; it does not claim to have performed citation-graph analysis or ablation studies. To make the supporting evidence more transparent, we will add a subsection that explicitly maps concrete examples from multiple papers in the corpus to each proposed loop. This revision will clarify the basis for the interpretation without converting the survey into a meta-analytic study.
    revision: partial

  2. Referee: [Systematization section] The selection criteria for the 87 works and the detailed categorization tables (by problem formulation, granularity, languages, metrics, datasets, approaches) are not described with sufficient explicitness to verify that the identified pain points and their inter-meshing accurately reflect dominant field dynamics rather than selection or interpretive artifacts.

    Authors: We accept this criticism. The revised manuscript will include an expanded 'Survey Methodology' subsection that specifies the search databases and keywords, inclusion/exclusion criteria, time bounds, and the categorization procedure (including how disagreements were resolved). The tables will be updated with references to this methodology and brief notes on classification decisions for edge cases. These additions will allow readers to evaluate the corpus selection and categorization directly.
    revision: yes

Circularity Check

0 steps flagged

No circularity: survey synthesis from external corpus

The paper performs a systematization of 87 external works plus prior empirical studies to identify and interpret twelve pain points. No equations, fitted parameters, self-definitional loops, or load-bearing self-citations reduce the central interpretive claim to quantities defined inside the paper; the concentration on binary C/C++ function-level classification is presented as an observed pattern in the surveyed literature rather than a constructed prediction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on the assumption that the selected 87 works are representative of influential ML4AVD research and that the authors' qualitative analysis correctly identifies causal links among pain points.

axioms (1)
  • domain assumption: The 87 influential works selected for the survey are representative of the dominant trends in ML4AVD research.
    The systematization and pain-point identification are derived from this corpus.



Reference graph

Works this paper leans on

177 extracted references · 177 canonical work pages · 5 internal anchors

  1. [1]

    Cyber Grand Challenge,

    DARPA, “Cyber Grand Challenge,” (Accessed 2024-10- 06). [Online]. Available: https://www.darpa.mil/about-us/timeline/ cyber-grand-challenge

  2. [2]

    AI Cyber Challenge (AIxCC),

    ——, “AI Cyber Challenge (AIxCC),” (Accessed 2024-10-03). [Online]. Available: https://aicyberchallenge.com/

  3. [3]

    Flawfinder,

    D. A. Wheeler, “Flawfinder,” (Accessed 2024-09-23). [Online]. Available: https://github.com/david-a-wheeler/flawfinder

  4. [4]

    RATS - rough auditing tool for security,

    A. Dunham, “RATS - rough auditing tool for security,” (Accessed 2024-09-23). [Online]. Available: https://github.com/ andrew-d/rough-auditing-tool-for-security

  5. [5]

    VulDeePecker: A deep learning-based system for vulnerability detection,

    Z. Li, D. Zou, S. Xu, X. Ou, H. Jin, S. Wang, Z. Deng, and Y . Zhong, “VulDeePecker: A deep learning-based system for vulnerability detection,” in Network and Distributed System Security Symposium , ser. NDSS 2018. Internet Society, 2018. [Online]. Available: https://doi.org/10.14722/ndss.2018.23158

  6. [6]

    Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks,

    Y . Zhou, S. Liu, J. Siow, X. Du, and Y . Liu, “Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks,” Advances in Neural Information Processing Systems , vol. 32, 2019. [Online]. Available: https://proceedings.neurips.cc/paper/2019/hash/ 49265d2447bc3bbfe9e76306ce40a31f-Abstract.html

  7. [7]

    Liang, Y.-C

    H. Hanif and S. Maffeis, “VulBERTa: Simplified source code pre-training for vulnerability detection,” in IEEE International Joint Conference on Neural Networks , 2022, pp. 1–8. [Online]. Available: https://doi.org/10.1109/IJCNN55064.2022.9892280

  8. [8]

    LineVul: A transformer-based line- level vulnerability prediction,

    M. Fu and C. Tantithamthavorn, “LineVul: A transformer-based line- level vulnerability prediction,” in Proceedings of the International Conference on Mining Software Repositories , 2022, pp. 608–620. [Online]. Available: https://doi.org/10.1145/3524842.3528452

  9. [9]

    Large language model for vulnerability detection: Emerging results and future directions,

    X. Zhou, T. Zhang, and D. Lo, “Large language model for vulnerability detection: Emerging results and future directions,” in ACM/IEEE International Conference on Software Engineering: New Ideas and Emerging Results , 2024, pp. 47–51. [Online]. Available: https://doi.org/10.1145/3639476.3639762

  10. [10]

    An investigation of quality issues in vulnerability detection datasets,

    Y . Guo and S. Bettaieb, “An investigation of quality issues in vulnerability detection datasets,” in IEEE European Symposium on Security and Privacy Workshops , 2023, pp. 29–33

  11. [11]

    Open science in software engineering: A study on deep learning-based vulnerability detection,

    Y . Nong, R. Sharma, A. Hamou-Lhadj, X. Luo, and H. Cai, “Open science in software engineering: A study on deep learning-based vulnerability detection,” IEEE Trans. on Soft. Eng. , vol. 49, no. 4, pp. 1983–2005, 2022

  12. [12]

    Ai cyber risk benchmark: Automated exploitation capabilities,

    D. Ristea, V . Mavroudis, and C. Hicks, “Ai cyber risk benchmark: Automated exploitation capabilities,” arXiv preprint arXiv:2410.21939v2, 2024

  13. [13]

    Fuzzing vulnerability discovery techniques: Survey, challenges and future directions,

    C. Beaman, M. Redbourne, J. D. Mummery, and S. Hakak, “Fuzzing vulnerability discovery techniques: Survey, challenges and future directions,” Computers & Security , vol. 120, p. 102813, 2022

  14. [14]

    Vulnerability-oriented directed fuzzing for binary programs,

    L. Yu, Y . Lu, Y . Shen, Y . Li, and Z. Pan, “Vulnerability-oriented directed fuzzing for binary programs,” Scientific Reports , vol. 12, no. 1, p. 4271, 2022

  15. [15]

    A Survey of Learning-based Automated Program Repair,

    Q. Zhang, C. Fang, Y . Ma, W. Sun, and Z. Chen, “A Survey of Learning-based Automated Program Repair,” ACM Trans. on Software Engineering and Methodology , vol. 33, no. 2, Dec. 2023. [Online]. Available: https://doi.org/10.1145/3631974

  16. [16]

    A theory of condition,

    J. R. Rice, “A theory of condition,” SIAM Journal on Numerical Analysis, vol. 3, no. 2, pp. 287–310, 1966

  17. [17]

    Multilayer feedforward networks are universal approximators,

    K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural networks , vol. 2, no. 5, pp. 359–366, 1989

  18. [18]

    Ad- dressSanitizer: A fast address sanity checker,

    K. Serebryany, D. Bruening, A. Potapenko, and D. Vyukov, “Ad- dressSanitizer: A fast address sanity checker,” inUSENIX ATC 2012, 2012

  19. [19]

    Select—a formal system for testing and debugging programs by symbolic execution,

    R. S. Boyer, B. Elspas, and K. N. Levitt, “Select—a formal system for testing and debugging programs by symbolic execution,” ACM SigPlan Notices, vol. 10, no. 6, pp. 234–245, 1975

  20. [20]

    Distributed representations of words and phrases and their compositionality,

    T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” Advances in Neural Information Processing Systems, vol. 26, 2013, code.google.com/p/word2vec. [Online]. Available: https://doi.org/10.5555/2999792.2999959

  21. [21]

    In: Proceedings of the 2014 Conference on Empirical Methods in Natural Lan- guage Processing (EMNLP)

    J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Conference on Empirical Methods in Natural Language Processing , 2014, pp. 1532– 1543, https://github.com/stanfordnlp/GloVe. [Online]. Available: https://doi.org/10.3115/v1/D14-1162

  22. [22]

    Unsupervised learning of sentence embeddings using compositional n-gram features,

    M. Pagliardini, P. Gupta, and M. Jaggi, “Unsupervised learning of sentence embeddings using compositional n-gram features,” in Conference of the North American Chapter of the Association for Computational Linguistics, 2018, https://github.com/epfml/sent2vec. [Online]. Available: https://doi.org/10.18653/v1/N18-1049

  23. [23]

    Distributed representations of sentences and documents,

    Q. Le and T. Mikolov, “Distributed representations of sentences and documents,” in International Conference on Machine Learning , 2014, pp. 1188–1196. [Online]. Available: https://proceedings.mlr. press/v32/le14.html

  24. [24]

    Recur- rent semantic learning-driven fast binary vulnerability detection in healthcare cyber physical systems,

    X. Yi, J. Wu, G. Li, A. K. Bashir, J. Li, and A. A. AlZubi, “Recur- rent semantic learning-driven fast binary vulnerability detection in healthcare cyber physical systems,”IEEE Trans. on Network Science and Engineering, vol. 10, no. 5, pp. 2537–2550, 2022

  25. [25]

    ANTLR (ANother Tool for Language Recognition),

    T. Parr, “ANTLR (ANother Tool for Language Recognition),” (Accessed 2024-10-23). [Online]. Available: https://github.com/ antlr/antlr4

  26. [26]

    astminer,

    JetBrains, “astminer,” (Accessed 2024-10-23). [Online]. Available: https://github.com/JetBrains-Research/astmine

  27. [27]

    Joern: The Bug Hunter’s Workbench,

    joern.io, “Joern: The Bug Hunter’s Workbench,” Jan. 2024, (Accessed 2024-10-23). [Online]. Available: https://github.com/ joernio/joern

  28. [28]

    Learning representations by back-propagating errors,

    D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” nature, vol. 323, no. 6088, pp. 533–536, 1986

  29. [29]

    Long short-term memory,

    S. Hochreiter, “Long short-term memory,” Neural Computation MIT- Press, 1997

  30. [30]

    Empirical evaluation of gated recurrent neural networks on sequence modeling,

    J. Chung, C. Gulcehre, K. Cho, and Y . Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” 2014

  31. [31]

    Bidirectional recurrent neural net- works,

    M. Schuster and K. Paliwal, “Bidirectional recurrent neural net- works,” IEEE Trans. on Signal Processing , vol. 45, no. 11, pp. 2673–2681, 1997

  32. [32]

    Attention is all you need,

    A. Vaswani, “Attention is all you need,” Advances in Neural Infor- mation Processing Systems , 2017

  33. [33]

    BERT: Pre- training of deep bidirectional transformers for language understand- ing,

    J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre- training of deep bidirectional transformers for language understand- ing,” in North American Chapter of the Association for Computa- tional Linguistics, 2019

  34. [34]

    GPT-4 Technical Report

    J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat et al., “GPT4 technical report,” arXiv preprint arXiv:2303.08774 , 2023

  35. [35]

    CodeBERT: A Pre-Trained Model for Programming and Natural Languages

    Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang et al. , “Codebert: A pre-trained model for programming and natural languages,” arXiv preprint arXiv:2002.08155, 2020

  36. [36]

    How journal rankings can suppress interdisciplinary research: A comparison between innovation studies and business & manage- ment,

    I. Rafols, L. Leydesdorff, A. O’Hare, P. Nightingale, and A. Stirling, “How journal rankings can suppress interdisciplinary research: A comparison between innovation studies and business & manage- ment,” Research policy, vol. 41, no. 7, pp. 1262–1282, 2012. 14

  37. [37]

    Sok: Prudent evaluation practices for fuzzing,

    M. Schloegel, N. Bars, N. Schiller, L. Bernhard, T. Scharnowski, A. Crump, A. Ale-Ebrahim, N. Bissantz, M. Muench, and T. Holz, “Sok: Prudent evaluation practices for fuzzing,” in 2024 IEEE Symposium on Security and Privacy (SP) . IEEE, 2024, pp. 1974– 1993

  38. [38]

    Rayyan—a web and mobile app for systematic reviews,

    M. Ouzzani, H. Hammady, Z. Fedorowicz, and A. Elmagarmid, “Rayyan—a web and mobile app for systematic reviews,” Systematic Reviews, vol. 5, no. 1, p. 210, 2016

  39. [39]

    Citation analysis as a tool in journal evaluation,

    E. Garfield, “Citation analysis as a tool in journal evaluation,” Science, vol. 178, no. 4060, pp. 471–479, 1972. [Online]. Available: https://www.science.org/doi/abs/10.1126/science.178.4060.471

  40. [40]

    Citation analysis of computer systems papers,

    E. Frachtenberg, “Citation analysis of computer systems papers,” PeerJ Computer science, vol. 9, p. e1389, 2023

  41. [41]

    Toward large-scale vulnerability discovery using machine learning,

    G. Grieco, G. L. Grinblat, L. Uzal, S. Rawat, J. Feist, and L. Mounier, “Toward large-scale vulnerability discovery using machine learning,” in Proceedings of the sixth ACM Conference on Data and Application Security and Privacy , 2016, pp. 85–96. [Online]. Available: https://doi.org/10.1145/2857705.2857720

  42. [42]

    Static detection of vulnerabilities in x86 executables,

    M. Cova, V . Felmetsger, G. Banks, and G. Vigna, “Static detection of vulnerabilities in x86 executables,” in IEEE Annual Computer Security Applications Conference , 2006, pp. 269–278. [Online]. Available: https://doi.org/10.1109/ACSAC.2006.50

  43. [43]

    HAN-BSVD: A hierarchical attention network for binary software vulnerability detection,

    H. Yan, S. Luo, L. Pan, and Y . Zhang, “HAN-BSVD: A hierarchical attention network for binary software vulnerability detection,” Computers & Security , vol. 108, p. 102286, 2021. [Online]. Available: https://doi.org/10.1016/j.cose.2022.103023

  44. [44]

    Inputs of coma: Static detection of denial-of- service vulnerabilities,

    R. Chang, G. Jiang, F. Ivancic, S. Sankaranarayanan, and V . Shmatikov, “Inputs of coma: Static detection of denial-of- service vulnerabilities,” in IEEE Computer Security Foundations Symposium, 2009, pp. 186–199. [Online]. Available: https: //doi.org/10.1109/CSF.2009.13

  45. [45]

    Code-centric learning-based just-in-time vulnerability detection,

    S. Nguyen, T.-T. Nguyen, T. T. Vu, T.-D. Do, K.-T. Ngo, and H. D. V o, “Code-centric learning-based just-in-time vulnerability detection,” Journal of Systems and Software , vol. 214, p. 112014,

  46. [46]

    Available: https://doi.org/10.1016/j.jss.2024.112014

    [Online]. Available: https://doi.org/10.1016/j.jss.2024.112014

  47. [47]

    Static detection of cross-site scripting vulnerabilities,

    G. Wassermann and Z. Su, “Static detection of cross-site scripting vulnerabilities,” in International Conference on Software Engineering, 2008, pp. 171–180. [Online]. Available: https: //doi.org/10.1145/1368088.1368112

  48. [48]

    ReDeBug: Finding unpatched code clones in entire OS distributions,

    J. Jang, A. Agrawal, and D. Brumley, “ReDeBug: Finding unpatched code clones in entire OS distributions,” in IEEE Symposium on Security and Privacy , 2012, pp. 48–62. [Online]. Available: https://doi.org/10.1109/SP.2012.13

  49. [49]

    LLbezpeky: Leveraging large language models for vulnerability detection,

    N. S. Mathews, Y . Brus, Y . Aafer, M. Nagappan, and S. McIntosh, “LLbezpeky: Leveraging large language models for vulnerability detection,” arXiv preprint arXiv:2401.01269 , 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2401.01269

  50. [50]

    SySeVR: A framework for using deep learning to detect software vulnerabilities,

    Z. Li, D. Zou, S. Xu, H. Jin, Y . Zhu, and Z. Chen, “SySeVR: A framework for using deep learning to detect software vulnerabilities,” IEEE Trans. on Dependable and Secure Computing, vol. 19, no. 4, p. 2244–2258, Jul. 2022. [Online]. Available: https://doi.org/10.1109/TDSC.2021.3051525

  51. [51]

    DeepWukong: Statically detecting software vulnerabilities using deep graph neural network,

    X. Cheng, H. Wang, J. Hua, G. Xu, and Y . Sui, “DeepWukong: Statically detecting software vulnerabilities using deep graph neural network,” ACM Trans. on Software Engineering and Methodology, vol. 30, no. 3, pp. 1–33, 2021. [Online]. Available: https://doi.org/10.1145/3436877

  52. [52]

    Vu1SPG: Vulnerability detection based on slice property graph representation learning,

    W. Zheng, Y . Jiang, and X. Su, “Vu1SPG: Vulnerability detection based on slice property graph representation learning,” in IEEE International Symposium on Software Reliability Engineering, 2021, pp. 457–467

  53. [53]

    VulDeBERT: A vulnerability detection system using BERT,

    S. Kim, J. Choi, M. E. Ahmed, S. Nepal, and H. Kim, “VulDeBERT: A vulnerability detection system using BERT,” in IEEE Interna- tional Symposium on Software Reliability Engineering Workshops , 2022, pp. 69–74

  54. [54]

    Example- based vulnerability detection and repair in java code,

    Y . Zhang, Y . Xiao, M. M. A. Kabir, D. Yao, and N. Meng, “Example- based vulnerability detection and repair in java code,” in Proceed- ings of the 30th IEEE/ACM International Conference on Program Comprehension, 2022, pp. 190–201

  55. [55]

    VulSlicer: Vulnerability detection through code slicing,

    S. Salimi and M. Kharrazi, “VulSlicer: Vulnerability detection through code slicing,” Journal of Systems and Software , vol. 193, p. 111450, 2022

  56. [56]

    Deep learning based vulnerability detection: Are we there yet?

    S. Chakraborty, R. Krishna, Y . Ding, and B. Ray, “Deep learning based vulnerability detection: Are we there yet?” IEEE Trans. on Soft. Eng., vol. 48, no. 9, pp. 3280–3296, 2021. [Online]. Available: https://doi.org/10.1109/TSE.2021.3087402

  57. [57]

    Automated vulnerability detection in source code using deep representation learning,

    R. Russell, L. Kim, L. Hamilton, T. Lazovich, J. Harer, O. Ozdemir, P. Ellingwood, and M. McConley, “Automated vulnerability detection in source code using deep representation learning,” in IEEE International Conference on Machine Learning and Applications , 2018, pp. 757–762. [Online]. Available: https: //doi.org/10.1109/ICMLA.2018.00120

  58. [58]

    Combining graph-based learning with automated data collection for code vulnerability detection,

    H. Wang, G. Ye, Z. Tang, S. H. Tan, S. Huang, D. Fang, Y . Feng, L. Bian, and Z. Wang, “Combining graph-based learning with automated data collection for code vulnerability detection,” IEEE Trans. on Information Forensics and Security , vol. 16, pp. 1943–1958, 2020. [Online]. Available: https: //doi.org/10.1109/TIFS.2020.3044773

  59. [59]

    Vulnerability detection with fine-grained interpretations,

    Y . Li, S. Wang, and T. N. Nguyen, “Vulnerability detection with fine-grained interpretations,” in Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering , 2021, pp. 292–303. [Online]. Available: https://doi.org/10.1145/3468264. 3468597

  60. [60]

    VUDDY: A scalable approach for vulnerable code clone discovery,

    S. Kim, S. Woo, H. Lee, and H. Oh, “VUDDY: A scalable approach for vulnerable code clone discovery,” in Symposium on Security and Privacy , 2017, pp. 595–614. [Online]. Available: https://doi.org/10.1109/SP.2017.62

  61. [61]

    Prompt- enhanced software vulnerability detection using ChatGPT,

    C. Zhang, H. Liu, J. Zeng, K. Yang, Y . Li, and H. Li, “Prompt- enhanced software vulnerability detection using ChatGPT,” in IEEE/ACM International Conference on Software Engineering: Companion Proceedings , 2024, pp. 276–277. [Online]. Available: https://doi.org/10.1145/3639478.3643065

  62. [62]

    BGNN4VD: Constructing bidirectional graph neural-network for vulnerability detection,

    S. Cao, X. Sun, L. Bo, Y. Wei, and B. Li, “BGNN4VD: Constructing bidirectional graph neural-network for vulnerability detection,” Information and Software Technology, vol. 136, p. 106576, 2021. [Online]. Available: https://doi.org/10.1016/j.infsof.2021.106576

  63. [63]

    MVD: memory-related vulnerability detection based on flow-sensitive graph neural networks,

    S. Cao, X. Sun, L. Bo, R. Wu, B. Li, and C. Tao, “MVD: memory-related vulnerability detection based on flow-sensitive graph neural networks,” in International Conference on Software Engineering, 2022, pp. 1456–1468

  64. [64]

    VulCNN: An image-inspired scalable vulnerability detection system,

    Y. Wu, D. Zou, S. Dou, W. Yang, D. Xu, and H. Jin, “VulCNN: An image-inspired scalable vulnerability detection system,” in International Conference on Software Engineering, 2022, pp. 2365–. [Online]. Available: https://doi.org/10.1145/3510003.3510229

  66. [66]

    ReGVD: Revisiting graph neural networks for vulnerability detection,

    V.-A. Nguyen, D. Q. Nguyen, V. Nguyen, T. Le, Q. H. Tran, and D. Phung, “ReGVD: Revisiting graph neural networks for vulnerability detection,” in ACM/IEEE International Conference on Software Engineering: Companion Proceedings, 2022, pp. 178–182

  67. [67]

    Automated software vulnerability detection with machine learning,

    J. A. Harer, L. Y. Kim, R. L. Russell, O. Ozdemir, L. R. Kosta, A. Rangamani, L. H. Hamilton, G. I. Centeno, J. R. Key, P. M. Ellingwood et al., “Automated software vulnerability detection with machine learning,” arXiv preprint arXiv:1803.04497, 2018. [Online]. Available: https://doi.org/10.48550/arXiv.1803.04497

  68. [68]

    Path-sensitive code embedding via contrastive learning for software vulnerability detection,

    X. Cheng, G. Zhang, H. Wang, and Y. Sui, “Path-sensitive code embedding via contrastive learning for software vulnerability detection,” in ACM International Symposium on Software Testing and Analysis, 2022, pp. 519–531. [Online]. Available: https://doi.org/10.1145/3533767.3534371

  69. [69]

    Software vulnerability discovery via learning multi-domain knowledge bases,

    G. Lin, J. Zhang, W. Luo, L. Pan, O. De Vel, P. Montague, and Y. Xiang, “Software vulnerability discovery via learning multi-domain knowledge bases,” IEEE Trans. on Dependable and Secure Computing, vol. 18, no. 5, pp. 2469–2485, 2019. [Online]. Available: https://doi.org/10.1109/TDSC.2019.2954088

  70. [70]

    CD-VulD: Cross-domain vulnerability discovery based on deep domain adaptation,

    S. Liu, G. Lin, L. Qu, J. Zhang, O. De Vel, P. Montague, and Y. Xiang, “CD-VulD: Cross-domain vulnerability discovery based on deep domain adaptation,” IEEE Trans. on Dependable and Secure Computing, vol. 19, no. 1, pp. 438–451, 2020. [Online]. Available: https://doi.org/10.1109/TDSC.2020.2984505

  71. [71]

    Leopard: Identifying vulnerable code for vulnerability assessment through program metrics,

    X. Du, B. Chen, Y. Li, J. Guo, Y. Zhou, Y. Liu, and Y. Jiang, “Leopard: Identifying vulnerable code for vulnerability assessment through program metrics,” in IEEE/ACM International Conference on Soft. Eng., 2019, pp. 60–71

  72. [72]

    DeepBalance: Deep-learning and fuzzy oversampling for vulnerability detection,

    S. Liu, G. Lin, Q.-L. Han, S. Wen, J. Zhang, and Y. Xiang, “DeepBalance: Deep-learning and fuzzy oversampling for vulnerability detection,” IEEE Trans. on Fuzzy Systems, vol. 28, no. 7, pp. 1329–1343, 2019

  73. [73]

    POSTER: Vulnerability discovery with function representation learning from unlabeled projects,

    G. Lin, J. Zhang, W. Luo, L. Pan, and Y. Xiang, “POSTER: Vulnerability discovery with function representation learning from unlabeled projects,” in ACM Conference on Computer and Communications Security, 2017, pp. 2539–2541

  74. [74]

    MVP: Detecting vulnerabilities using Patch-Enhanced vulnerability signatures,

    Y. Xiao, B. Chen, C. Yu, Z. Xu, Z. Yuan, F. Li, B. Liu, Y. Liu, W. Huo, W. Zou, and W. Shi, “MVP: Detecting vulnerabilities using Patch-Enhanced vulnerability signatures,” in 29th USENIX Security Symposium. USENIX Association, Aug. 2020, pp. 1165–1182. [Online]. Available: https://www.usenix.org/conference/usenixsecurity20/presentation/xiao

  75. [75]

    GRACE: Empowering LLM-based software vulnerability detection with graph structure and in-context learning,

    G. Lu, X. Ju, X. Chen, W. Pei, and Z. Cai, “GRACE: Empowering LLM-based software vulnerability detection with graph structure and in-context learning,” Journal of Systems and Software, vol. 212, p. 112031, 2024. [Online]. Available: https://doi.org/10.1016/j.jss.2024.112031

  76. [76]

    Chucky: Exposing missing checks in source code for vulnerability discovery,

    F. Yamaguchi, C. Wressnegger, H. Gascon, and K. Rieck, “Chucky: Exposing missing checks in source code for vulnerability discovery,” in ACM Conference on Computer and Communications Security, 2013, pp. 499–510

  77. [77]

    CSGVD: A deep learning approach combining sequence and graph embedding for source code vulnerability detection,

    W. Tang, M. Tang, M. Ban, Z. Zhao, and M. Feng, “CSGVD: A deep learning approach combining sequence and graph embedding for source code vulnerability detection,” Journal of Systems and Software, vol. 199, p. 111623, 2023

  78. [78]

    Vulnerability detection with deep learning,

    F. Wu, J. Wang, J. Liu, and W. Wang, “Vulnerability detection with deep learning,” in IEEE International Conference on Computer and Communications, 2017, pp. 1298–1302

  79. [79]

    Vulnerability detection with graph simplification and enhanced graph representation learning,

    X.-C. Wen, Y. Chen, C. Gao, H. Zhang, J. M. Zhang, and Q. Liao, “Vulnerability detection with graph simplification and enhanced graph representation learning,” in IEEE/ACM 45th International Conference on Soft. Eng., 2023, pp. 2275–2286. [Online]. Available: https://doi.org/10.1109/ICSE48619.2023.00191

  80. [80]

    Dataflow analysis-inspired deep learning for efficient vulnerability detection,

    B. Steenhoek, H. Gao, and W. Le, “Dataflow analysis-inspired deep learning for efficient vulnerability detection,” in Proceedings of the 46th IEEE/ACM International Conference on Soft. Eng., 2024, pp. 1–13

Showing first 80 references.