Reinforcement Learning for Software Vulnerability Analysis: A Systematic Review with Emphasis on C/C++ Source Code and Static Analysis

Bruno Caro-Vásquez; Carola Figueroa-Flores; Gastón Marquez

arxiv: 2606.28403 · v1 · pith:R5PPBHYInew · submitted 2026-06-24 · 💻 cs.SE · cs.AI · cs.CR · cs.LG

Pith reviewed 2026-06-30 01:14 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.CR · cs.LG

keywords reinforcement learning · vulnerability detection · static analysis · C/C++ · control flow graphs · systematic review · fuzzing

The pith

No existing reinforcement learning agent uses control flow graphs from C/C++ source code as states to detect or localize vulnerabilities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews twenty-one studies applying reinforcement learning to vulnerability analysis in C/C++ under static analysis settings. Fifteen of those studies target fuzzing or guided exploration, three address direct detection, and only one attempts statement-level localization. Structural representations such as control flow graphs and abstract syntax trees are almost never supplied to the learning agent as its state. The authors organize the literature with a task- and formulation-oriented taxonomy and conclude that the missing formulation is an RL agent whose state is the source-code control flow graph and whose objective is to mark vulnerable nodes.

Core claim

Current RL work on C/C++ vulnerability analysis has not yet produced agents that receive statically extracted control flow graphs as states and are trained to detect or localize vulnerable statements.

What carries the argument

The task- and formulation-oriented taxonomy that groups studies by the security task solved and by the precise definition of state, action, reward, and environment.
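One way to picture such a taxonomy is as a per-study record holding the task and the four formulation fields, which can then be tallied. The sketch below is only an illustration: the `StudyRecord` type, the helper names, and the three placeholder rows (`S1` to `S3`) are invented here and are not taken from the paper or from any of the reviewed studies.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyRecord:
    """One reviewed study, described by task and RL formulation (hypothetical schema)."""
    study_id: str      # placeholder identifier, not a real citation
    task: str          # e.g. "fuzzing", "detection", "localization"
    state: str         # what the agent observes
    action: str        # what the agent can do
    reward: str        # what the agent is rewarded for
    environment: str   # what the agent interacts with


# Invented placeholder rows, for illustration only.
records = [
    StudyRecord("S1", "fuzzing", "coverage bitmap", "pick mutation",
                "new coverage", "instrumented binary"),
    StudyRecord("S2", "fuzzing", "seed features", "pick seed",
                "new paths", "fuzzer loop"),
    StudyRecord("S3", "detection", "token embedding", "classify function",
                "label match", "labeled dataset"),
]


def tally_by_task(rows):
    """Count studies per security task."""
    return Counter(r.task for r in rows)


def uses_graph_state(rows, keywords=("cfg", "ast", "graph")):
    """Return ids of studies whose state description mentions a graph structure.

    Crude substring matching; a real classification would be done by hand.
    """
    return [r.study_id for r in rows
            if any(k in r.state.lower() for k in keywords)]


print(tally_by_task(records))
print(uses_graph_state(records))
```

With every study recorded this way, both the per-task counts and the claim about structural states reduce to simple queries over the table.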

If this is right

Future RL agents for this domain should be formulated with control flow graphs as the state representation to support node-level localization.
Evaluation benchmarks must be constructed so that detection and localization performance can be compared across different state representations.
Static analysis pipelines could incorporate RL components that operate directly on graph structures extracted from source code.
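To make the proposed formulation concrete, here is a minimal sketch of an environment whose state includes the control flow graph and whose actions either move along an edge or flag the current node as vulnerable. Everything in it is an assumption made for illustration: the toy graph, the ground-truth label, the class name `CfgLocalizationEnv`, the action encoding, and the +1/-1 reward are hypothetical and do not describe any method from the paper or the reviewed studies.

```python
from dataclasses import dataclass, field

# Hypothetical toy CFG: node id -> list of successor node ids.
# Neither the graph nor the label below comes from the paper.
TOY_CFG = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: []}
TOY_VULNERABLE = {2}  # assumed ground-truth vulnerable node


@dataclass
class CfgLocalizationEnv:
    """Episodic environment: walk the CFG and flag nodes as vulnerable."""
    cfg: dict
    vulnerable: set
    current: int = 0
    flagged: set = field(default_factory=set)

    def reset(self):
        self.current = 0
        self.flagged = set()
        return self._state()

    def _state(self):
        # State = graph structure + agent position + nodes flagged so far.
        return (self.cfg, self.current, frozenset(self.flagged))

    def step(self, action):
        """action is ("flag",) or ("move", successor_index)."""
        reward = 0.0
        if action[0] == "flag":
            # Flagging a node twice has no effect and earns nothing.
            if self.current not in self.flagged:
                self.flagged.add(self.current)
                reward = 1.0 if self.current in self.vulnerable else -1.0
        elif action[0] == "move":
            self.current = self.cfg[self.current][action[1]]
        else:
            raise ValueError(f"unknown action: {action!r}")
        # The episode ends when the agent reaches a node with no successors.
        done = not self.cfg[self.current]
        return self._state(), reward, done


env = CfgLocalizationEnv(TOY_CFG, TOY_VULNERABLE)
env.reset()
env.step(("move", 0))            # 0 -> 1
env.step(("move", 0))            # 1 -> 2
_, reward, _ = env.step(("flag",))
print(reward, sorted(env.flagged))
```

The set of flagged nodes at the end of an episode is the per-node localization output; a learned policy would replace the hand-written action sequence shown here.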

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Graph neural networks would be a natural way to process the control flow graph states inside such an RL agent.
The same gap likely exists for other languages whose static representations include explicit control flow graphs.
If the gap is closed, the resulting agents could be tested on existing vulnerability datasets to measure improvement over current fuzzing-focused methods.

Load-bearing premise

The PRISMA-guided search located every relevant study and the resulting categorization into tasks and formulations is accurate.

What would settle it

A published RL agent that ingests C/C++ control flow graphs as states and produces per-statement vulnerability labels would falsify the claimed gap.

Figures

Figures reproduced from arXiv: 2606.28403 by Bruno Caro-Vásquez, Carola Figueroa-Flores, Gastón Marquez.

**Figure 1.** PRISMA flow diagram of the study selection process.

3 Results, 3.1 Descriptive Analysis: A total of 21 primary studies were included. Their publication years span 2019–2026, peaking at 7 in 2024 and 3 in 2025. This reflects the consolidation of deep RL frameworks and growing interest in modeling security tasks as sequential decision-making, rather than a maturation of source-code vulnerability detection, as…

Original abstract

Vulnerability detection in C/C++ software remains a major security challenge due to code complexity, manual memory management, and the limitations of traditional static analysis. Reinforcement Learning (RL) has emerged as a promising approach, particularly for fuzzing, test generation, program exploration, and, more recently, vulnerability detection and localization. Following PRISMA 2020 guidelines, this work reviews RL techniques for software vulnerability analysis, focusing on C/C++ source code and static analysis. We identified 21 primary studies published between 2015 and 2026 from major scientific databases and complementary searches. We analyze the addressed tasks, algorithms, state-action-reward-environment formulations, code representations, datasets, and evaluation metrics. Results show that 15 studies focus on fuzzing and guided exploration, only 3 on direct vulnerability detection, and just 1 on statement-level localization. Moreover, statically extracted structural representations such as Control Flow Graphs (CFGs) and Abstract Syntax Trees (ASTs) are rarely used as agent states, and benchmarks lack comparability. We propose a task- and formulation-oriented taxonomy and identify a key research gap: the absence of RL agents that use source-code CFGs as states to detect and localize vulnerable nodes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A standard PRISMA review that maps RL work on C/C++ vulns and flags the missing CFG-state agents, but the gap claim stands or falls on unshown search and categorization details.

This is a systematic review of reinforcement learning applied to vulnerability analysis in C/C++ source code, limited to static analysis approaches. It follows PRISMA 2020, pulls 21 studies from 2015-2026, and sorts them by task: 15 on fuzzing or exploration, 3 on direct detection, and 1 on statement-level localization. The authors also tabulate state-action-reward setups, code representations, datasets, and metrics, then offer a task-and-formulation taxonomy. The clearest takeaway is their observation that CFGs and ASTs are seldom used as agent states and that no work yet treats CFG nodes directly for localization.

What the paper does cleanly is lay out the distribution of existing efforts and name the structural-state gap in plain terms. That organization can save someone new to the area a few days of reading. The counts and breakdowns are the main deliverable.

The soft spot is exactly where the stress test points: the gap rests on the search having been complete within the stated scope and on the 21 papers having been classified correctly on state representation. The manuscript reports following PRISMA but does not reproduce the search strings or exclusion logs here, so an independent check of coverage is not possible from the text. If even one or two relevant papers were missed or mis-sorted, the “absence” claim shrinks. The note that benchmarks lack comparability is also left at a high level without a deeper comparison table or reproducibility audit.

The work is for researchers who need a current map of RL-for-vuln papers rather than a new algorithm or dataset. It does not contain original experiments or derivations. A serious editor should send it to peer review because the synthesis is narrow enough to be checkable and the gap statement, if it survives referee scrutiny on the search, is actionable for the subfield.

Referee Report

2 major / 2 minor

Summary. This systematic review follows PRISMA 2020 guidelines to synthesize 21 primary studies (2015–2026) on reinforcement learning for vulnerability analysis in C/C++ source code with emphasis on static analysis. It categorizes the studies by task (15 on fuzzing/exploration, 3 on direct detection, 1 on statement-level localization), examines state-action-reward formulations, code representations (noting rare use of CFGs/ASTs as states), datasets, and metrics, proposes a task- and formulation-oriented taxonomy, and identifies a key research gap: the absence of RL agents that use source-code CFGs as states to detect and localize vulnerable nodes.

Significance. If the search completeness and categorization accuracy hold, the review would usefully highlight an underexplored intersection of RL with structural static representations for vulnerability localization, potentially guiding future work. The contribution is limited, however, by the absence of verifiable methodological details that would allow readers to assess whether the asserted gap is robust or an artifact of incomplete coverage.

major comments (2)

[Methodology] Methodology (search strategy subsection): The manuscript claims adherence to PRISMA 2020 and reports identifying 21 studies from major databases plus complementary searches, but provides neither the exact Boolean search strings nor the full PRISMA flow diagram with exclusion counts at each stage. This omission directly undermines evaluation of whether the set is representative within the stated scope (C/C++ source code, static analysis, RL for vulnerability analysis), which is load-bearing for the central research-gap claim.
[Results] Results (task categorization and state-representation analysis): The breakdown (15 fuzzing, 3 detection, 1 localization) and the statement that CFGs/ASTs are rarely used as agent states rest on an unlisted mapping of the 21 studies. Without an appendix or table that assigns each study to its task, formulation, and state representation, it is impossible to verify the accuracy of these counts or to check for misclassifications that would falsify the gap.

minor comments (2)

[Discussion] The proposed taxonomy is described at a high level but lacks a visual diagram or explicit decision tree that readers could use to classify new work.
[Results] Dataset and metric tables would benefit from explicit column headers indicating whether each entry is drawn from the 21 studies or from external benchmarks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that greater methodological transparency is needed to support verification of our search process and categorizations. We address each major comment below and will revise the manuscript accordingly.

Referee: [Methodology] Methodology (search strategy subsection): The manuscript claims adherence to PRISMA 2020 and reports identifying 21 studies from major databases plus complementary searches, but provides neither the exact Boolean search strings nor the full PRISMA flow diagram with exclusion counts at each stage. This omission directly undermines evaluation of whether the set is representative within the stated scope (C/C++ source code, static analysis, RL for vulnerability analysis), which is load-bearing for the central research-gap claim.

Authors: We agree that the exact Boolean search strings and the full PRISMA 2020 flow diagram with exclusion counts are required for readers to assess search completeness and representativeness. In the revised manuscript we will add the precise search queries employed for each database together with the complete PRISMA flow diagram.

Revision: yes

Referee: [Results] Results (task categorization and state-representation analysis): The breakdown (15 fuzzing, 3 detection, 1 localization) and the statement that CFGs/ASTs are rarely used as agent states rest on an unlisted mapping of the 21 studies. Without an appendix or table that assigns each study to its task, formulation, and state representation, it is impossible to verify the accuracy of these counts or to check for misclassifications that would falsify the gap.

Authors: We accept that an explicit mapping is necessary to allow independent verification of the task counts and state-representation claims. The revised version will include a supplementary table (or appendix) that lists each of the 21 studies together with its assigned task category, state-action-reward formulation, and code representation.

Revision: yes

Circularity Check

0 steps flagged

No circularity: systematic review with no derivations or fitted predictions

The paper is a PRISMA-guided literature synthesis that reviews 21 existing studies, categorizes tasks/formulations, and identifies an absence of CFG-based RL agents. No equations, parameters, or predictions are present that could reduce to the paper's own inputs by construction. The gap claim rests on the empirical completeness of the search and categorization accuracy, which are external to any self-referential loop. No self-citation load-bearing steps, ansatzes, or renamings occur. This matches the default non-circular outcome for a review paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

As a systematic review the central claims rest on the completeness of the database search and the accuracy of manual categorization of the 21 studies; no free parameters, invented entities, or non-standard axioms are introduced.

axioms (1)

Domain assumption: PRISMA 2020 guidelines provide a complete and unbiased method for identifying and synthesizing relevant primary studies.
Invoked in the abstract as the review methodology; standard in the field but not proven within the paper.

pith-pipeline@v0.9.1-grok · 5767 in / 1314 out tokens · 35263 ms · 2026-06-30T01:14:19.619846+00:00 · methodology

Reference graph

Works this paper leans on

31 extracted references · 26 canonical work pages

[1] Chakraborty, S., Krishna, R., Ding, Y., Ray, B.: Deep learning based vulnerability detection: Are we there yet? IEEE Transactions on Software Engineering 48(9), 3280–3296 (2022). https://doi.org/10.1109/TSE.2021.3087402

[2] Chen, C.: Grey-box fuzzing with deep reinforcement learning and process trace back. In: 2021 4th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE). pp. 1167–1171 (2021). https://doi.org/10.1109/AEMCSE51986.2021.00238

[3] Ding, A., Chan, M., Hass, A., Tippenhauer, N.O., Ma, S., Zonouz, S.: Get your cyber-physical tests done! data-driven vulnerability assessment of robotic aerial vehicles. In: 2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). pp. 67–80 (2023). https://doi.org/10.1109/DSN58367.2023.00020

[4] Dolan-Gavitt, B., Hulin, P., Kirda, E., Leek, T., Mambretti, A., Robertson, W.K., Ulrich, F., Whelan, R.: Lava: Large-scale automated vulnerability addition. In: 2016 IEEE Symposium on Security and Privacy (SP). pp. 110–121 (2016). https://doi.org/10.1109/SP.2016.15

[5] Fan, J., Li, Y., Wang, S., Nguyen, T.N.: A c/c++ code vulnerability dataset with code changes and cve summaries. In: Proceedings of the 17th International Conference on Mining Software Repositories. pp. 508–512. MSR ’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3379597.3387501

[6] Gomes, D., Felix, E., Aires, F., Vieira, M.: Static code analysis for iot security: A systematic literature review. ACM Comput. Surv. 58(3) (Sep 2025). https://doi.org/10.1145/3745019

[7] Gong, K., Yang, W., Cui, B., Chen, C.: Drlfcfuzzer: fuzzing with deep-reinforcement-learning under format constraints. In: 2022 2nd International Conference on Electronic Information Engineering and Computer Technology (EIECT). pp. 374–380 (2022). https://doi.org/10.1109/EIECT58010.2022.00080

[8] Götz, R., Sendner, C., Ruck, N., Rostami, M., Dmitrienko, A., Sadeghi, A.R.: Rlfuzz: Accelerating hardware fuzzing with deep reinforcement learning. In: 2025 IEEE International Symposium on Hardware Oriented Security and Trust (HOST). pp. 358–369 (2025). https://doi.org/10.1109/HOST64725.2025.11050051

[9] Huang, K., Yu, Y., Hao, X., Song, J., Li, Y.: Drl-fuzzer: A generative and lightweight approach for modbus vulnerability mining in industrial control systems. IEEE Transactions on Industrial Informatics pp. 1–12 (2026). https://doi.org/10.1109/TII.2026.3688830, Early Access

[10] Huang, Z., Song, X., Luo, Y., Yang, J., Cui, B.: Syzballer: Kernel fuzzing based on basic block weight and multi-armed bandit. In: 2022 IEEE 8th International Conference on Computer and Communications (ICCC). pp. 2364–2369 (2022). https://doi.org/10.1109/ICCC56324.2022.10065711

[11] Jhang, S.W., Huang, S.K.: Multi-argument fuzzing by reinforcement learning. In: 2024 International Computer Symposium (ICS). pp. 101–106 (2024). https://doi.org/10.1109/ICS64339.2024.00026

[12] Jiang, Y., Qu, Z., Treude, C., Su, X., Wang, T.: Enhancing fine-grained vulnerability detection with reinforcement learning. IEEE Transactions on Software Engineering 51(10), 2900–2920 (2025). https://doi.org/10.1109/TSE.2025.3603400

[13] Khan, H.M.S., Pashiourtides, K., Marnerides, A.K.: Adaptive fuzzing framework for embedded systems vulnerability detection using reinforcement and deep learning. In: 2025 IEEE Conference on Dependable and Secure Computing (DSC). pp. 1–8 (2025). https://doi.org/10.1109/DSC65356.2025.11260865

[14] Kitchenham, B., Charters, S.: Guidelines for performing systematic literature reviews in software engineering 2 (01 2007)

[15] Kuznetsov, A., Shapoval, O., Chernov, K., Yeromin, Y., Popova, M., Syniavska, O.: Automated software vulnerability testing using in-depth training methods. In: Proc. 2nd Int. Workshop on Computer Modeling and Intelligent Systems (CMIS-2019). CEUR Workshop Proceedings, vol. 2353. CEUR-WS.org (2019), https://ceur-ws.org/Vol-2353/paper18.pdf

[16] Kuznetsov, A., Yeromin, Y., Shapoval, O., Chernov, K., Popova, M., Serdukov, K.: Automated software vulnerability testing using deep learning methods. In: 2019 IEEE 2nd Ukraine Conference on Electrical and Computer Engineering (UKRCON). pp. 837–841 (2019). https://doi.org/10.1109/UKRCON.2019.8879997

[17] Li, L., Ding, S.H.H., Walenstein, A., Charland, P., Fung, B.C.M.: Dynamic neural control flow execution: an agent-based deep equilibrium approach for binary vulnerability detection. In: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. pp. 1215–1225. CIKM ’24, Association for Computing Machinery, New York, NY, US... doi:10.1145/3627673.3679726

[18] Liang, X., Xiao, T.: Rlf: Directed fuzzing based on deep reinforcement learning. In: 2022 International Conference on Machine Learning, Control, and Robotics (MLCR). pp. 127–133 (2022). https://doi.org/10.1109/MLCR57210.2022.00032

[19] Miao, S., Wang, J., Zhang, C., Lin, Z., Gong, J., Zhang, X., Li, J.: Deep learning in fuzzing: A literature survey. In: 2022 IEEE 2nd International Conference on Electronic Technology, Communication and Information (ICETCI). pp. 220–223 (2022). https://doi.org/10.1109/ICETCI55101.2022.9832143

[20] MIT Lincoln Laboratory: Cyber grand challenge corpus. https://archive.ll.mit.edu/cgc/cgc-corpus/about/ (2017), accessed: 2026-06-01

[21] National Institute of Standards and Technology: Juliet test suite for c/c++ version 1.3. https://samate.nist.gov/SARD/test-suites/112 (2017), Software Assurance Reference Dataset (SARD)

[22] Paduraru, C., Paduraru, M., Stefanescu, A.: Optimizing decision making in concolic execution using reinforcement learning. In: 2020 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW). pp. 52–61 (2020). https://doi.org/10.1109/ICSTW50294.2020.00025

[23] Paduraru, C., Paduraru, M., Stefanescu, A.: Riverfuzzrl - an open-source tool to experiment with reinforcement learning for fuzzing (04 2021). https://doi.org/10.1109/ICST49551.2021.00055

[24] Page, M.J., McKenzie, J.E., Bossuyt, P.M., Boutron, I., Hoffmann, T.C., Mulrow, C.D., Shamseer, L., Tetzlaff, J.M., Akl, E.A., Brennan, S.E., Chou, R., Glanville, J., Grimshaw, J.M., Hróbjartsson, A., Lalu, M.M., Li, T., Loder, E.W., Mayo-Wilson, E., McDonald, S., McGuinness, L.A., Stewart, L.A., Thomas, J., Tricco, A.C., Welch, V.A., Whiting, P., Moher, ... BMJ 372 (2021). doi:10.1136/bmj.n71

[25] Pham, V.H., Thi Thu Hien, D., Phuc Chuong, N., Thanh Thai, P., The Duy, P.: A coverage-guided fuzzing method for automatic software vulnerability detection using reinforcement learning-enabled multi-level input mutation. IEEE Access 12, 129064–129080 (2024). https://doi.org/10.1109/ACCESS.2024.3421989

[26] Ren, Z., Ju, X., Chen, X., Shen, H.: Prorlearn: boosting prompt tuning-based vulnerability detection by reinforcement learning. Automated Software Engineering 31 (04 2024). https://doi.org/10.1007/s10515-024-00438-9

[27] Steenhoek, B., Tufano, M., Sundaresan, N., Svyatkovskiy, A.: Reinforcement learning from automatic feedback for high-quality unit test generation. In: 2025 IEEE/ACM International Workshop on Deep Learning for Testing and Testing for Deep Learning (DeepTest). pp. 37–44. IEEE Press (2025). https://doi.org/10.1109/DeepTest66595.2025.00011

[28] Wang, J., Song, C., Yin, H.: Reinforcement learning-based hierarchical seed scheduling for greybox fuzzing (01 2021). https://doi.org/10.14722/ndss.2021.24486

[29] Xie, L., Zhao, Y., Yang, H., Zhao, Z., Hu, Z., Zhang, L., Cheng, X.: Docfuzz: A directed fuzzing method based on a feedback mechanism mutator. International Journal of Intelligent Systems 2024(1), 7931792 (2024). https://doi.org/10.1155/int/7931792, https://onlinelibrary.wiley.com/doi/abs/10.1155/int/7931792

[30] Yu, X., Liang, H., Wang, C.: Multiple targets directed greybox fuzzing: From reachable to exploited. In: 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). pp. 907–917 (2024). https://doi.org/10.1109/SANER60148.2024.00099

[31] Zhou, Y., Liu, S., Siow, J., Du, X., Liu, Y.: Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In: Advances in Neural Information Processing Systems (NeurIPS). vol. 32 (2019)

[1] [1]

https://doi.org/10.1109/TSE.2021.3087402

Chakraborty, S., Krishna, R., Ding, Y., Ray, B.: Deep learning based vulnerability detection: Are we there yet? IEEE Transactions on Software Engineering48(9), 3280–3296 (2022). https://doi.org/10.1109/TSE.2021.3087402

work page doi:10.1109/tse.2021.3087402 2022

[2] [2]

In: 2021 4th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE)

Chen, C.: Grey-box fuzzing with deep reinforcement learning and process trace back. In: 2021 4th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE). pp. 1167–1171 (2021). https://doi.org/10.1109/AEMCSE51986.2021.00238

work page doi:10.1109/aemcse51986.2021.00238 2021

[3] [3]

In: 2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Ding, A., Chan, M., Hass, A., Tippenhauer, N.O., Ma, S., Zonouz, S.: Get your cyber-physical tests done! data-driven vulnerability assessment of robotic aerial vehicles. In: 2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). pp. 67–80 (2023). https://doi.org/10.1109/DSN58367.2023.00020

work page doi:10.1109/dsn58367.2023.00020 2023

[4] [4]

In: 2016 IEEE Symposium on Security and Privacy (SP)

Dolan-Gavitt, B., Hulin, P., Kirda, E., Leek, T., Mambretti, A., Robertson, W.K., Ulrich, F., Whelan, R.: Lava: Large-scale automated vulnerability addition. In: 2016 IEEE Symposium on Security and Privacy (SP). pp. 110–121 (2016). https://doi.org/10.1109/SP.2016.15

work page doi:10.1109/sp.2016.15 2016

[5] [5]

In: Proceedings of the 17th International Conference on Mining Software Repositories

Fan, J., Li, Y., Wang, S., Nguyen, T.N.: A c/c++ code vulnerability dataset with code changes and cve summaries. In: Proceedings of the 17th International Conference on Mining Software Repositories. p. 508–512. MSR ’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3379597.3387501 14 Bruno Caro Vásquez, Carola Figu...

work page doi:10.1145/3379597.3387501 2020

[6] [6]

ACM Comput

Gomes, D., Felix, E., Aires, F., Vieira, M.: Static code analysis for iot security: A systematic literature review. ACM Comput. Surv.58(3) (Sep 2025). https://doi.org/10.1145/3745019

work page doi:10.1145/3745019 2025

[7] [7]

In: 2022 2nd International Conference on Electronic Information Engineering and Computer Technology (EIECT)

Gong, K., Yang, W., Cui, B., Chen, C.: Drlfcfuzzer: fuzzing with deep-reinforcement-learning under format constraints. In: 2022 2nd International Conference on Electronic Information Engineering and Computer Technology (EIECT). pp. 374–380 (2022). https://doi.org/10.1109/EIECT58010.2022.00080

work page doi:10.1109/eiect58010.2022.00080 2022

[8] [8]

In: 2025 IEEE International Symposium on Hardware Oriented Security and Trust (HOST)

Götz, R., Sendner, C., Ruck, N., Rostami, M., Dmitrienko, A., Sadeghi, A.R.: Rlfuzz: Accelerating hardware fuzzing with deep reinforcement learning. In: 2025 IEEE International Symposium on Hardware Oriented Security and Trust (HOST). pp. 358–369 (2025). https://doi.org/10.1109/HOST64725.2025.11050051

work page doi:10.1109/host64725.2025.11050051 2025

[9] [9]

IEEE Transactions on Industrial Informatics pp

Huang, K., Yu, Y., Hao, X., Song, J., Li, Y.: Drl-fuzzer: A generative and lightweight approach for modbus vulnerability mining in industrial control systems. IEEE Transactions on Industrial Informatics pp. 1–12 (2026). https://doi.org/10.1109/TII.2026.3688830, early Access

work page doi:10.1109/tii.2026.3688830 2026

[10] [10]

In: 2022 IEEE 8th International Conference on Computer and Communications (ICCC)

Huang, Z., Song, X., Luo, Y., Yang, J., Cui, B.: Syzballer: Kernel fuzzing based on basic block weight and multi-armed bandit. In: 2022 IEEE 8th International Conference on Computer and Communications (ICCC). pp. 2364–2369 (2022). https://doi.org/10.1109/ICCC56324.2022.10065711

work page doi:10.1109/iccc56324.2022.10065711 2022

[11] [11]

In: 2024 International Computer Symposium (ICS)

Jhang, S.W., Huang, S.K.: Multi-argument fuzzing by reinforcement learning. In: 2024 International Computer Symposium (ICS). pp. 101–106 (2024). https://doi.org/10.1109/ICS64339.2024.00026

work page doi:10.1109/ics64339.2024.00026 2024

[12] [12]

IEEE Transactions on Software Engineering51(10), 2900–2920 (2025)

Jiang, Y., Qu, Z., Treude, C., Su, X., Wang, T.: Enhancing fine-grained vulnerability detection with reinforcement learning. IEEE Transactions on Software Engineering51(10), 2900–2920 (2025). https://doi.org/10.1109/TSE.2025.3603400

work page doi:10.1109/tse.2025.3603400 2025

[13] [13]

In: 2025 IEEE Conference on Dependable and Secure Computing (DSC)

Khan, H.M.S., Pashiourtides, K., Marnerides, A.K.: Adaptive fuzzing framework for embedded systems vulnerability detection using reinforcement and deep learning. In: 2025 IEEE Conference on Dependable and Secure Computing (DSC). pp. 1–8 (2025). https://doi.org/10.1109/DSC65356.2025.11260865

work page doi:10.1109/dsc65356.2025.11260865 2025

[14] [14]

Kitchenham, B., Charters, S.: Guidelines for performing systematic literature reviews in software engineering2(01 2007)

2007

[15] [15]

In: Proc

Kuznetsov, A., Shapoval, O., Chernov, K., Yeromin, Y., Popova, M., Syniavska, O.: Automated software vulnerability testing using in-depth training methods. In: Proc. 2nd Int. Workshop on Computer Modeling and Intelligent Systems (CMIS-2019). CEUR Workshop Proceedings, vol. 2353. CEUR-WS.org (2019), https://ceur-ws.org/Vol-2353/paper18.pdf

2019

[16] [16]

In: 2019 IEEE 2nd Ukraine Conference on Electrical and Computer Engineering (UKRCON)

Kuznetsov, A., Yeromin, Y., Shapoval, O., Chernov, K., Popova, M., Serdukov, K.: Automated software vulnerability testing using deep learning methods. In: 2019 IEEE 2nd Ukraine Conference on Electrical and Computer Engineering (UKRCON). pp. 837–841 (2019). https://doi.org/10.1109/UKRCON.2019.8879997

work page doi:10.1109/ukrcon.2019.8879997 2019

[17] Li, L., Ding, S.H.H., Walenstein, A., Charland, P., Fung, B.C.M.: Dynamic neural control flow execution: an agent-based deep equilibrium approach for binary vulnerability detection. In: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. pp. 1215–1225. CIKM ’24, Association for Computing Machinery, New York, NY, US... https://doi.org/10.1145/3627673.3679726

[18] Liang, X., Xiao, T.: Rlf: Directed fuzzing based on deep reinforcement learning. In: 2022 International Conference on Machine Learning, Control, and Robotics (MLCR). pp. 127–133 (2022). https://doi.org/10.1109/MLCR57210.2022.00032

[19] Miao, S., Wang, J., Zhang, C., Lin, Z., Gong, J., Zhang, X., Li, J.: Deep learning in fuzzing: A literature survey. In: 2022 IEEE 2nd International Conference on Electronic Technology, Communication and Information (ICETCI). pp. 220–223 (2022). https://doi.org/10.1109/ICETCI55101.2022.9832143

[20] MIT Lincoln Laboratory: Cyber grand challenge corpus. https://archive.ll.mit.edu/cgc/cgc-corpus/about/ (2017), accessed: 2026-06-01

[21] National Institute of Standards and Technology: Juliet test suite for C/C++ version 1.3. https://samate.nist.gov/SARD/test-suites/112 (2017), Software Assurance Reference Dataset (SARD)

[22] Paduraru, C., Paduraru, M., Stefanescu, A.: Optimizing decision making in concolic execution using reinforcement learning. In: 2020 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW). pp. 52–61 (2020). https://doi.org/10.1109/ICSTW50294.2020.00025

[23] Paduraru, C., Paduraru, M., Stefanescu, A.: Riverfuzzrl - an open-source tool to experiment with reinforcement learning for fuzzing (04 2021). https://doi.org/10.1109/ICST49551.2021.00055

[24] Page, M.J., McKenzie, J.E., Bossuyt, P.M., Boutron, I., Hoffmann, T.C., Mulrow, C.D., Shamseer, L., Tetzlaff, J.M., Akl, E.A., Brennan, S.E., Chou, R., Glanville, J., Grimshaw, J.M., Hróbjartsson, A., Lalu, M.M., Li, T., Loder, E.W., Mayo-Wilson, E., McDonald, S., McGuinness, L.A., Stewart, L.A., Thomas, J., Tricco, A.C., Welch, V.A., Whiting, P., Moher, ... BMJ 372 (2021). https://doi.org/10.1136/bmj.n71

[25] Pham, V.H., Thi Thu Hien, D., Phuc Chuong, N., Thanh Thai, P., The Duy, P.: A coverage-guided fuzzing method for automatic software vulnerability detection using reinforcement learning-enabled multi-level input mutation. IEEE Access 12, 129064–129080 (2024). https://doi.org/10.1109/ACCESS.2024.3421989

[26] Ren, Z., Ju, X., Chen, X., Shen, H.: Prorlearn: boosting prompt tuning-based vulnerability detection by reinforcement learning. Automated Software Engineering 31 (04 2024). https://doi.org/10.1007/s10515-024-00438-9

[27] Steenhoek, B., Tufano, M., Sundaresan, N., Svyatkovskiy, A.: Reinforcement learning from automatic feedback for high-quality unit test generation. In: 2025 IEEE/ACM International Workshop on Deep Learning for Testing and Testing for Deep Learning (DeepTest). pp. 37–44. IEEE Press (2025). https://doi.org/10.1109/DeepTest66595.2025.00011

[28] Wang, J., Song, C., Yin, H.: Reinforcement learning-based hierarchical seed scheduling for greybox fuzzing (01 2021). https://doi.org/10.14722/ndss.2021.24486

[29] Xie, L., Zhao, Y., Yang, H., Zhao, Z., Hu, Z., Zhang, L., Cheng, X.: Docfuzz: A directed fuzzing method based on a feedback mechanism mutator. International Journal of Intelligent Systems 2024(1), 7931792 (2024). https://doi.org/10.1155/int/7931792, https://onlinelibrary.wiley.com/doi/abs/10.1155/int/7931792

[30] Yu, X., Liang, H., Wang, C.: Multiple targets directed greybox fuzzing: From reachable to exploited. In: 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). pp. 907–917 (2024). https://doi.org/10.1109/SANER60148.2024.00099

[31] Zhou, Y., Liu, S., Siow, J., Du, X., Liu, Y.: Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In: Advances in Neural Information Processing Systems (NeurIPS). vol. 32 (2019)