
arXiv: 2606.28403 · v1 · pith:R5PPBHYInew · submitted 2026-06-24 · 💻 cs.SE · cs.AI · cs.CR · cs.LG

Reinforcement Learning for Software Vulnerability Analysis: A Systematic Review with Emphasis on C/C++ Source Code and Static Analysis

Pith reviewed 2026-06-30 01:14 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.CR · cs.LG
keywords reinforcement learning · vulnerability detection · static analysis · C/C++ · control flow graphs · systematic review · fuzzing

The pith

No existing reinforcement learning agent uses control flow graphs from C/C++ source code as states to detect or localize vulnerabilities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews twenty-one studies applying reinforcement learning to vulnerability analysis in C/C++ under static analysis settings. Fifteen of those studies target fuzzing or guided exploration, three address direct detection, and only one attempts statement-level localization. Structural representations such as control flow graphs and abstract syntax trees are almost never supplied to the learning agent as its state. The authors organize the literature with a task- and formulation-oriented taxonomy and conclude that the missing formulation is an RL agent whose state is the source-code control flow graph and whose objective is to mark vulnerable nodes.

Core claim

Current RL work on C/C++ vulnerability analysis has not yet produced agents that receive statically extracted control flow graphs as states and are trained to detect or localize vulnerable statements.

What carries the argument

The task- and formulation-oriented taxonomy that groups studies by the security task solved and by the precise definition of state, action, reward, and environment.
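One way to picture such a taxonomy is as a record per study that names the task and the four parts of the RL formulation. The sketch below is only an illustration: the class names, field names, and the three entries are hypothetical placeholders, not the paper's schema and not studies from the review.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Formulation:
    """The four parts of an RL formulation named in the taxonomy."""
    state: str
    action: str
    reward: str
    environment: str


@dataclass(frozen=True)
class StudyRecord:
    """One reviewed study: an identifier, the security task, and its formulation."""
    study_id: str
    task: str
    formulation: Formulation


def group_by_task(records):
    """Return a dict mapping each task to the ids of the studies assigned to it."""
    groups = defaultdict(list)
    for record in records:
        groups[record.task].append(record.study_id)
    return dict(groups)


# Placeholder entries for illustration only (hypothetical, not from the review).
records = [
    StudyRecord("example-A", "fuzzing", Formulation(
        "coverage bitmap", "choose mutation operator",
        "newly covered branches", "instrumented target program")),
    StudyRecord("example-B", "localization", Formulation(
        "control flow graph", "mark node as vulnerable",
        "agreement with statement labels", "labeled source functions")),
    StudyRecord("example-C", "fuzzing", Formulation(
        "seed queue statistics", "select next seed",
        "crashes found", "instrumented target program")),
]

print(group_by_task(records))
```

Keeping the formulation as a separate record makes it possible to ask formulation-level questions, such as which studies use a graph as state, independently of the task grouping.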

If this is right

  • Future RL agents for this domain should be formulated with control flow graphs as the state representation to support node-level localization.
  • Evaluation benchmarks must be constructed so that detection and localization performance can be compared across different state representations.
  • Static analysis pipelines could incorporate RL components that operate directly on graph structures extracted from source code.
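To make the first point concrete, here is a minimal sketch of what a node-marking environment over a control flow graph could look like. Everything in it is an assumption made for illustration: the class name `CfgMarkingEnv`, the fixed node visiting order, the binary action, and the +1/-1 reward are not taken from the paper, which only identifies the formulation as missing.

```python
class CfgMarkingEnv:
    """Toy episodic environment: the agent visits each CFG node once
    and decides whether to mark it as vulnerable.

    Assumptions (hypothetical, for illustration only):
      - `successors` maps every node id to a list of successor node ids
      - `vulnerable` is the ground-truth set of vulnerable node ids
      - the graph is non-empty and nodes are visited in sorted order
    """

    def __init__(self, successors, vulnerable):
        self.successors = successors
        self.vulnerable = set(vulnerable)
        self.order = sorted(successors)
        self.position = 0

    def _observe(self):
        # The state is the whole graph plus the node currently under decision.
        return {"graph": self.successors, "current": self.order[self.position]}

    def reset(self):
        self.position = 0
        return self._observe()

    def step(self, action):
        """action: 1 marks the current node as vulnerable, 0 leaves it unmarked."""
        node = self.order[self.position]
        correct = (action == 1) == (node in self.vulnerable)
        reward = 1.0 if correct else -1.0
        self.position += 1
        done = self.position >= len(self.order)
        observation = None if done else self._observe()
        return observation, reward, done


def run_episode(env, policy):
    """Run one episode and return the summed reward."""
    observation = env.reset()
    total, done = 0.0, False
    while not done:
        observation, reward, done = env.step(policy(observation))
        total += reward
    return total


# Toy CFG with four nodes; node 2 is labeled vulnerable (made-up example).
toy_cfg = {0: [1], 1: [2, 3], 2: [3], 3: []}
env = CfgMarkingEnv(toy_cfg, vulnerable={2})

# An oracle policy that reads the labels, used only to exercise the loop.
oracle = lambda obs: 1 if obs["current"] in env.vulnerable else 0
print(run_episode(env, oracle))
```

A learned policy would replace the oracle and would have to infer the marking from the graph alone; the dense per-node reward used here is one design choice among many, and it presupposes statement-level labels.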

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Graph neural networks would be a natural way to process the control flow graph states inside such an RL agent.
  • The same gap likely exists for other languages whose static representations include explicit control flow graphs.
  • If the gap is closed, the resulting agents could be tested on existing vulnerability datasets to measure improvement over current fuzzing-focused methods.
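The graph neural network suggestion can be illustrated with a single neighborhood-aggregation step over a control flow graph. The sketch below is a deliberately simplified stand-in for a GNN layer: it has no learned weights and no nonlinearity, and it only averages each node's features with those of its predecessors. The function name and the toy graph are hypothetical.

```python
def aggregate_once(successors, features):
    """One round of mean aggregation along CFG edges.

    successors: dict mapping each node id to a list of successor node ids
                (every successor must also appear as a key)
    features:   dict mapping each node id to a list of floats of equal length
    Returns a new dict in which each node's features are the mean of its own
    features and those of its predecessors.
    """
    # Invert the edge direction so information flows with control flow.
    predecessors = {node: [] for node in successors}
    for node, targets in successors.items():
        for target in targets:
            predecessors[target].append(node)

    updated = {}
    for node in successors:
        neighborhood = [node] + predecessors[node]
        dim = len(features[node])
        updated[node] = [
            sum(features[n][i] for n in neighborhood) / len(neighborhood)
            for i in range(dim)
        ]
    return updated


# Toy diamond-shaped CFG: 0 branches to 1 and 2, which both reach 3.
toy_cfg = {0: [1, 2], 1: [3], 2: [3], 3: []}
# One scalar feature per node; only the entry node carries a signal.
toy_features = {0: [1.0], 1: [0.0], 2: [0.0], 3: [0.0]}

after_one = aggregate_once(toy_cfg, toy_features)
after_two = aggregate_once(toy_cfg, after_one)
print(after_one)
print(after_two)
```

After one round the signal from the entry node reaches its direct successors; after a second round it also reaches the join node, which is the property that lets stacked layers summarize paths through the graph.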

Load-bearing premise

The PRISMA-guided search located every relevant study and the resulting categorization into tasks and formulations is accurate.

What would settle it

A published RL agent that ingests C/C++ control flow graphs as states and produces per-statement vulnerability labels would falsify the claimed gap.

Figures

Figures reproduced from arXiv: 2606.28403 by Bruno Caro-Vásquez, Carola Figueroa-Flores, Gastón Marquez.

Figure 1. PRISMA flow diagram of the study selection process.

Excerpt from the surrounding text (3 Results, 3.1 Descriptive Analysis): A total of 21 primary studies were included. Their publication years span 2019–2026, peaking at 7 in 2024 and 3 in 2025. This reflects the consolidation of deep RL frameworks and growing interest in modeling security tasks as sequential decision-making, rather than a maturation of source-code vulnerability detection, as…

Abstract

Vulnerability detection in C/C++ software remains a major security challenge due to code complexity, manual memory management, and the limitations of traditional static analysis. Reinforcement Learning (RL) has emerged as a promising approach, particularly for fuzzing, test generation, program exploration, and, more recently, vulnerability detection and localization. Following PRISMA 2020 guidelines, this work reviews RL techniques for software vulnerability analysis, focusing on C/C++ source code and static analysis. We identified 21 primary studies published between 2015 and 2026 from major scientific databases and complementary searches. We analyze the addressed tasks, algorithms, state-action-reward-environment formulations, code representations, datasets, and evaluation metrics. Results show that 15 studies focus on fuzzing and guided exploration, only 3 on direct vulnerability detection, and just 1 on statement-level localization. Moreover, statically extracted structural representations such as Control Flow Graphs (CFGs) and Abstract Syntax Trees (ASTs) are rarely used as agent states, and benchmarks lack comparability. We propose a task- and formulation-oriented taxonomy and identify a key research gap: the absence of RL agents that use source-code CFGs as states to detect and localize vulnerable nodes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. This systematic review follows PRISMA 2020 guidelines to synthesize 21 primary studies (2015–2026) on reinforcement learning for vulnerability analysis in C/C++ source code with emphasis on static analysis. It categorizes the studies by task (15 on fuzzing/exploration, 3 on direct detection, 1 on statement-level localization), examines state-action-reward formulations, code representations (noting rare use of CFGs/ASTs as states), datasets, and metrics, proposes a task- and formulation-oriented taxonomy, and identifies a key research gap: the absence of RL agents that use source-code CFGs as states to detect and localize vulnerable nodes.

Significance. If the search completeness and categorization accuracy hold, the review would usefully highlight an underexplored intersection of RL with structural static representations for vulnerability localization, potentially guiding future work. The contribution is limited, however, by the absence of verifiable methodological details that would allow readers to assess whether the asserted gap is robust or an artifact of incomplete coverage.

major comments (2)
  1. [Methodology] Methodology (search strategy subsection): The manuscript claims adherence to PRISMA 2020 and reports identifying 21 studies from major databases plus complementary searches, but provides neither the exact Boolean search strings nor the full PRISMA flow diagram with exclusion counts at each stage. This omission directly undermines evaluation of whether the set is representative within the stated scope (C/C++ source code, static analysis, RL for vulnerability analysis), which is load-bearing for the central research-gap claim.
  2. [Results] Results (task categorization and state-representation analysis): The breakdown (15 fuzzing, 3 detection, 1 localization) and the statement that CFGs/ASTs are rarely used as agent states rest on an unlisted mapping of the 21 studies. Without an appendix or table that assigns each study to its task, formulation, and state representation, it is impossible to verify the accuracy of these counts or to check for misclassifications that would falsify the gap.
minor comments (2)
  1. [Discussion] The proposed taxonomy is described at a high level but lacks a visual diagram or explicit decision tree that readers could use to classify new work.
  2. [Results] Dataset and metric tables would benefit from explicit column headers indicating whether each entry is drawn from the 21 studies or from external benchmarks.
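The verification concern in major comment 2 can be seen from the reported numbers alone. The short check below uses only the figures quoted in the summary (21 studies; 15, 3, and 1 per named task); the variable and function names are illustrative, and how the remaining studies are categorized is not stated in the text quoted here.

```python
# Task counts as quoted in the summary above.
reported = {
    "fuzzing and guided exploration": 15,
    "direct vulnerability detection": 3,
    "statement-level localization": 1,
}
TOTAL_STUDIES = 21


def unassigned(total, counts):
    """Number of studies not covered by the listed task counts."""
    return total - sum(counts.values())


print(unassigned(TOTAL_STUDIES, reported))  # 2
```

The two studies outside the three named tasks may well belong to other categories in the full paper; a per-study mapping table is what would let a reader confirm that.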

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that greater methodological transparency is needed to support verification of our search process and categorizations. We address each major comment below and will revise the manuscript accordingly.

  1. Referee: [Methodology] Methodology (search strategy subsection): The manuscript claims adherence to PRISMA 2020 and reports identifying 21 studies from major databases plus complementary searches, but provides neither the exact Boolean search strings nor the full PRISMA flow diagram with exclusion counts at each stage. This omission directly undermines evaluation of whether the set is representative within the stated scope (C/C++ source code, static analysis, RL for vulnerability analysis), which is load-bearing for the central research-gap claim.

    Authors: We agree that the exact Boolean search strings and the full PRISMA 2020 flow diagram with exclusion counts are required for readers to assess search completeness and representativeness. In the revised manuscript we will add the precise search queries employed for each database together with the complete PRISMA flow diagram. revision: yes

  2. Referee: [Results] Results (task categorization and state-representation analysis): The breakdown (15 fuzzing, 3 detection, 1 localization) and the statement that CFGs/ASTs are rarely used as agent states rest on an unlisted mapping of the 21 studies. Without an appendix or table that assigns each study to its task, formulation, and state representation, it is impossible to verify the accuracy of these counts or to check for misclassifications that would falsify the gap.

    Authors: We accept that an explicit mapping is necessary to allow independent verification of the task counts and state-representation claims. The revised version will include a supplementary table (or appendix) that lists each of the 21 studies together with its assigned task category, state-action-reward formulation, and code representation. revision: yes

Circularity Check

0 steps flagged

No circularity: systematic review with no derivations or fitted predictions


The paper is a PRISMA-guided literature synthesis that reviews 21 existing studies, categorizes tasks/formulations, and identifies an absence of CFG-based RL agents. No equations, parameters, or predictions are present that could reduce to the paper's own inputs by construction. The gap claim rests on the empirical completeness of the search and categorization accuracy, which are external to any self-referential loop. No self-citation load-bearing steps, ansatzes, or renamings occur. This matches the default non-circular outcome for a review paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

As a systematic review the central claims rest on the completeness of the database search and the accuracy of manual categorization of the 21 studies; no free parameters, invented entities, or non-standard axioms are introduced.

axioms (1)
  • domain assumption: PRISMA 2020 guidelines provide a complete and unbiased method for identifying and synthesizing relevant primary studies
    Invoked in the abstract as the review methodology; standard in the field but not proven within the paper.


