Reinforcement Learning for Software Vulnerability Analysis: A Systematic Review with Emphasis on C/C++ Source Code and Static Analysis
Pith reviewed 2026-06-30 01:14 UTC · model grok-4.3
The pith
No existing reinforcement learning agent uses control flow graphs from C/C++ source code as states to detect or localize vulnerabilities.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Current RL work on C/C++ vulnerability analysis has not yet produced agents that receive statically extracted control flow graphs as states and are trained to detect or localize vulnerable statements.
What carries the argument
The task- and formulation-oriented taxonomy that groups studies by the security task solved and by the precise definition of state, action, reward, and environment.
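To make the taxonomy's two axes concrete, the sketch below records one study as a task plus a state, action, reward, and environment definition, and tests whether a record would match the gap as the review words it. All class, field, and label names (`Formulation`, `fills_claimed_gap`, `"source CFG"`, and so on) are hypothetical choices made for illustration; the paper's own coding scheme is not shown in this excerpt.

```python
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    FUZZING = "fuzzing / guided exploration"
    DETECTION = "vulnerability detection"
    LOCALIZATION = "statement-level localization"
    OTHER = "other"


@dataclass(frozen=True)
class Formulation:
    """One study's RL formulation. Every field name here is an assumption."""
    study_id: str      # placeholder key, not a citation from the paper
    task: Task
    state: str         # e.g. "coverage bitmap", "source CFG", "token sequence"
    action: str
    reward: str
    environment: str
    language: str = "C/C++"


def fills_claimed_gap(f: Formulation) -> bool:
    """True if the record uses a source-level CFG as state for detection or localization."""
    return (
        f.language == "C/C++"
        and f.state == "source CFG"
        and f.task in (Task.DETECTION, Task.LOCALIZATION)
    )


# Illustrative record only; it does not describe any of the 21 reviewed studies.
example = Formulation(
    study_id="placeholder",
    task=Task.FUZZING,
    state="coverage bitmap",
    action="pick mutation operator",
    reward="new coverage",
    environment="instrumented binary",
)
print(fills_claimed_gap(example))
```

Encoding the gap as a predicate over explicit fields is one way to make the claim checkable: a single record for which the predicate holds would count against it.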
If this is right
- Future RL agents for this domain should be formulated with control flow graphs as the state representation to support node-level localization.
- Evaluation benchmarks must be constructed so that detection and localization performance can be compared across different state representations.
- Static analysis pipelines could incorporate RL components that operate directly on graph structures extracted from source code.
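As a rough illustration of the first implication, here is a toy episodic environment in which an agent walks one function's control flow graph and may flag the node it is standing on. This is a minimal sketch under stated assumptions, not a formulation taken from the paper: the class name `CfgEnv`, the `FLAG` action, and the reward values are all invented, and the observation is reduced to the current node id.

```python
class CfgEnv:
    """Toy environment over a CFG given as {node: [successor, ...]} (hypothetical design)."""

    FLAG = -1  # assumed special action: mark the current node as vulnerable

    def __init__(self, successors, vulnerable, entry=0):
        self.successors = successors
        self.vulnerable = set(vulnerable)  # ground-truth labels, used only for reward
        self.entry = entry
        self.reset()

    def reset(self):
        self.current = self.entry
        self.flagged = set()
        return self.current

    def step(self, action):
        """Return (next_state, reward, done)."""
        if action == self.FLAG:
            if self.current in self.flagged:
                reward = 0.0  # repeated flags earn nothing
            else:
                reward = 1.0 if self.current in self.vulnerable else -1.0
                self.flagged.add(self.current)
            return self.current, reward, False
        nxt = self.successors[self.current]
        if not nxt:
            return self.current, 0.0, True  # exit node: episode ends
        self.current = nxt[action % len(nxt)]  # move to the chosen successor
        return self.current, 0.0, False


# Diamond-shaped CFG with node 2 labelled vulnerable (made-up example).
cfg = {0: [1, 2], 1: [3], 2: [3], 3: []}
env = CfgEnv(cfg, vulnerable={2})
_, r_move, _ = env.step(1)             # take the second branch out of node 0
_, r_flag, _ = env.step(CfgEnv.FLAG)   # flag node 2
```

Because the set of flagged nodes is the agent's output, the same episode yields both a detection signal (was anything flagged) and a node-level localization.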
Where Pith is reading between the lines
- Graph neural networks would be a natural way to process the control flow graph states inside such an RL agent.
- The same gap likely exists for other languages whose static representations include explicit control flow graphs.
- If the gap is closed, the resulting agents could be tested on existing vulnerability datasets to measure improvement over current fuzzing-focused methods.
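The graph neural network suggestion can be pictured with a single round of neighbour aggregation over the CFG. The function below is a deliberately simplified stand-in, assuming mean aggregation over predecessors and no learned weights; the name `propagate` and the feature layout are hypothetical, and a real GNN layer would add trainable transforms and nonlinearities.

```python
def propagate(features, successors):
    """One round of mean aggregation: each node averages its own vector with its predecessors'.

    features:   {node: [float, ...]} with equal-length vectors
    successors: {node: [successor, ...]} describing CFG edges
    """
    preds = {node: [] for node in features}
    for src, dsts in successors.items():
        for dst in dsts:
            preds[dst].append(src)

    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[p] for p in preds[node]]
        # Column-wise mean over the node's own vector and its predecessors' vectors.
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated


# Two-node CFG: node 0 flows into node 1 (made-up features).
edges = {0: [1], 1: []}
feats = {0: [1.0, 0.0], 1: [0.0, 1.0]}
print(propagate(feats, edges))
```

After one round, node 1's vector mixes in information from node 0, which is the property that would let an agent's state at a statement reflect the code that can reach it.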
Load-bearing premise
The PRISMA-guided search located every relevant study and the resulting categorization into tasks and formulations is accurate.
What would settle it
A published RL agent that ingests C/C++ control flow graphs as states and produces per-statement vulnerability labels would falsify the claimed gap.
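If such an agent did emit per-statement labels, one plausible way to score it would be set-based precision, recall, and F1 over the flagged statements. The helper below is only a sketch of that scoring; the function name and the choice of metric are assumptions, since this excerpt does not say which localization metrics the reviewed studies report.

```python
def localization_scores(predicted, actual):
    """Precision, recall, and F1 for predicted vs. ground-truth vulnerable statements."""
    predicted, actual = set(predicted), set(actual)
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Agent flags statements 2 and 3; only statement 2 is actually vulnerable (made-up example).
precision, recall, f1 = localization_scores({2, 3}, {2})
```

Here precision is 0.5 and recall is 1.0, so the over-flagging shows up in precision rather than recall.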
Original abstract
Vulnerability detection in C/C++ software remains a major security challenge due to code complexity, manual memory management, and the limitations of traditional static analysis. Reinforcement Learning (RL) has emerged as a promising approach, particularly for fuzzing, test generation, program exploration, and, more recently, vulnerability detection and localization. Following PRISMA 2020 guidelines, this work reviews RL techniques for software vulnerability analysis, focusing on C/C++ source code and static analysis. We identified 21 primary studies published between 2015 and 2026 from major scientific databases and complementary searches. We analyze the addressed tasks, algorithms, state-action-reward-environment formulations, code representations, datasets, and evaluation metrics. Results show that 15 studies focus on fuzzing and guided exploration, only 3 on direct vulnerability detection, and just 1 on statement-level localization. Moreover, statically extracted structural representations such as Control Flow Graphs (CFGs) and Abstract Syntax Trees (ASTs) are rarely used as agent states, and benchmarks lack comparability. We propose a task- and formulation-oriented taxonomy and identify a key research gap: the absence of RL agents that use source-code CFGs as states to detect and localize vulnerable nodes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This systematic review follows PRISMA 2020 guidelines to synthesize 21 primary studies (2015–2026) on reinforcement learning for vulnerability analysis in C/C++ source code with emphasis on static analysis. It categorizes the studies by task (15 on fuzzing/exploration, 3 on direct detection, 1 on statement-level localization), examines state-action-reward formulations, code representations (noting rare use of CFGs/ASTs as states), datasets, and metrics, proposes a task- and formulation-oriented taxonomy, and identifies a key research gap: the absence of RL agents that use source-code CFGs as states to detect and localize vulnerable nodes.
Significance. If the search completeness and categorization accuracy hold, the review would usefully highlight an underexplored intersection of RL with structural static representations for vulnerability localization, potentially guiding future work. The contribution is limited, however, by the absence of verifiable methodological details that would allow readers to assess whether the asserted gap is robust or an artifact of incomplete coverage.
major comments (2)
- [Methodology] Methodology (search strategy subsection): The manuscript claims adherence to PRISMA 2020 and reports identifying 21 studies from major databases plus complementary searches, but provides neither the exact Boolean search strings nor the full PRISMA flow diagram with exclusion counts at each stage. This omission directly undermines evaluation of whether the set is representative within the stated scope (C/C++ source code, static analysis, RL for vulnerability analysis), which is load-bearing for the central research-gap claim.
- [Results] Results (task categorization and state-representation analysis): The breakdown (15 fuzzing, 3 detection, 1 localization) and the statement that CFGs/ASTs are rarely used as agent states rest on an unlisted mapping of the 21 studies. Without an appendix or table that assigns each study to its task, formulation, and state representation, it is impossible to verify the accuracy of these counts or to check for misclassifications that would falsify the gap.
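The verification the second major comment asks for could be mechanised once a study-to-task table exists. The sketch below assumes such a table is available as a plain dictionary; the labels and function names are hypothetical, and the only numbers taken from the source are the reported counts (15, 3, 1) and the total of 21. It also makes explicit that the three reported counts cover 19 of the 21 studies, leaving two whose category this excerpt does not state.

```python
from collections import Counter

REPORTED = {"fuzzing": 15, "detection": 3, "localization": 1}  # counts from the abstract
TOTAL_STUDIES = 21                                             # total from the abstract


def unaccounted(reported, total):
    """Studies not covered by the reported task counts."""
    return total - sum(reported.values())


def check_mapping(mapping, reported, total):
    """Compare a {study_id: task_label} table against the reported counts.

    Returns a list of human-readable discrepancies (empty if consistent).
    """
    problems = []
    if len(mapping) != total:
        problems.append(f"expected {total} studies, table lists {len(mapping)}")
    counts = Counter(mapping.values())
    for task, n in reported.items():
        if counts[task] != n:
            problems.append(f"{task}: reported {n}, table gives {counts[task]}")
    return problems


print(unaccounted(REPORTED, TOTAL_STUDIES))  # 21 - (15 + 3 + 1) = 2
```

Running `check_mapping` on the authors' eventual appendix table would surface any mismatch between the per-study assignments and the headline breakdown.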
minor comments (2)
- [Discussion] The proposed taxonomy is described at a high level but lacks a visual diagram or explicit decision tree that readers could use to classify new work.
- [Results] Dataset and metric tables would benefit from explicit column headers indicating whether each entry is drawn from the 21 studies or from external benchmarks.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that greater methodological transparency is needed to support verification of our search process and categorizations. We address each major comment below and will revise the manuscript accordingly.
- Referee: [Methodology] Methodology (search strategy subsection): The manuscript claims adherence to PRISMA 2020 and reports identifying 21 studies from major databases plus complementary searches, but provides neither the exact Boolean search strings nor the full PRISMA flow diagram with exclusion counts at each stage. This omission directly undermines evaluation of whether the set is representative within the stated scope (C/C++ source code, static analysis, RL for vulnerability analysis), which is load-bearing for the central research-gap claim.
  Authors: We agree that the exact Boolean search strings and the full PRISMA 2020 flow diagram with exclusion counts are required for readers to assess search completeness and representativeness. In the revised manuscript we will add the precise search queries employed for each database together with the complete PRISMA flow diagram.
  Revision: yes
- Referee: [Results] Results (task categorization and state-representation analysis): The breakdown (15 fuzzing, 3 detection, 1 localization) and the statement that CFGs/ASTs are rarely used as agent states rest on an unlisted mapping of the 21 studies. Without an appendix or table that assigns each study to its task, formulation, and state representation, it is impossible to verify the accuracy of these counts or to check for misclassifications that would falsify the gap.
  Authors: We accept that an explicit mapping is necessary to allow independent verification of the task counts and state-representation claims. The revised version will include a supplementary table (or appendix) that lists each of the 21 studies together with its assigned task category, state-action-reward formulation, and code representation.
  Revision: yes
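The flow diagram promised in the first response is, at bottom, an accounting identity: records identified, minus everything removed at each stage, should equal the studies included. The sketch below checks that identity. The stage labels and every count except the final 21 are placeholders invented for illustration, and the function name `reconcile_flow` is hypothetical.

```python
def reconcile_flow(identified, removed_by_stage, included):
    """Check that records identified minus all removals equals studies included.

    removed_by_stage: ordered list of (stage_label, count_removed) pairs.
    Returns (ok, trail) where trail lists the records remaining after each stage.
    """
    remaining = identified
    trail = []
    for label, removed in removed_by_stage:
        if removed < 0 or removed > remaining:
            raise ValueError(f"implausible count at stage {label!r}: {removed}")
        remaining -= removed
        trail.append((label, remaining))
    return remaining == included, trail


# Placeholder numbers invented for illustration; only the final 21 comes from the abstract.
ok, trail = reconcile_flow(
    identified=400,
    removed_by_stage=[
        ("duplicates removed", 100),
        ("excluded on title/abstract", 250),
        ("excluded on full text", 29),
    ],
    included=21,
)
```

This single-column version ignores studies added through complementary searches, which would need their own branch in a full diagram.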
Circularity Check
No circularity: systematic review with no derivations or fitted predictions
The paper is a PRISMA-guided literature synthesis that reviews 21 existing studies, categorizes tasks and formulations, and identifies an absence of CFG-based RL agents. It contains no equations, parameters, or predictions that could reduce to the paper's own inputs by construction. The gap claim rests on the empirical completeness of the search and the accuracy of the categorization, both of which are external to any self-referential loop. There are no load-bearing self-citations, ansatzes, or renamings. This matches the default non-circular outcome for a review paper.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: PRISMA 2020 guidelines provide a complete and unbiased method for identifying and synthesizing relevant primary studies