pith. sign in

arxiv: 2606.22647 · v1 · pith:KZRQ6HUFnew · submitted 2026-06-21 · 💻 cs.CR · cs.LG· cs.SE

RAVEN: Agentic RAG for Automated Vulnerability Repair

Pith reviewed 2026-06-26 09:55 UTC · model grok-4.3

classification 💻 cs.CR cs.LGcs.SE
keywords automated vulnerability repairagentic RAGCVEcross-file dependenciesopen-source LLMssoftware securityiterative patchinggeneralization
0
0 comments X

The pith

RAVEN's agentic RAG framework repairs 83.13% of 160 real-world CVEs across vulnerability types and languages using only open-source local LLMs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents RAVEN as an autonomous system that pairs retrieval-augmented generation with agentic control and iterative patching to fix software vulnerabilities. A multi-faceted retrieval step pulls historically similar fixes while a separate Curator Agent gathers cross-file dependencies that local code alone cannot supply. The resulting pipeline runs entirely on open-source models in a local setting and is tested on 160 CVEs that include unseen categories, two languages, and out-of-distribution cases. It reports an 83.13% overall repair rate that exceeds prior frameworks while keeping compute cost low. A reader would care because existing repair tools stay confined to narrow vulnerability classes and often require proprietary models or single-language data.

Core claim

RAVEN integrates an agentic RAG pipeline with controlled iterative repair, utilizing open-source LLMs in a fully locally deployable setting with limited GPU requirements, a multi-faceted retrieval pipeline to retrieve historically relevant vulnerability fixes, and a dedicated Curator Agent that retrieves cross-file dependencies from the target repository to fix complex vulnerabilities that cannot be addressed using local vulnerable code alone. Evaluated on 160 real-world CVE vulnerabilities across diverse vulnerability types, two programming languages, unseen CWE categories, and out-of-distribution settings, RAVEN achieves an overall repair success rate of 83.13%, outperforming all existing

What carries the argument

Curator Agent that retrieves cross-file dependencies together with the multi-faceted retrieval pipeline that surfaces historical fixes to steer patch generation.

If this is right

  • RAVEN generalizes to unseen CWE categories and out-of-distribution vulnerability settings.
  • It delivers the reported performance using only open-source LLMs without proprietary models.
  • Repair cost stays negligible across the evaluated cases.
  • The approach handles vulnerabilities in two programming languages within one framework.
  • Complex cases that need repository-wide context are addressed by the dedicated Curator Agent.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the retrieval step continues to locate useful prior fixes for entirely new vulnerability patterns, the framework could shorten the time from disclosure to patch in production codebases.
  • Local-only execution would allow deployment inside regulated or air-gapped environments where external model calls are disallowed.
  • The same retrieval-plus-curator structure might transfer to non-security code repair tasks such as API migration or performance tuning with only changes to the retrieval corpus.
  • Adding dynamic execution traces to the Curator Agent could further raise success on vulnerabilities whose patches depend on runtime behavior.

Load-bearing premise

The multi-faceted retrieval pipeline and Curator Agent can reliably surface historically relevant fixes and cross-file dependencies sufficient to guide patch generation for complex vulnerabilities beyond local code.

What would settle it

Evaluating the same 160 CVEs after removing all historical repair examples from the retrieval index would show whether success rate remains above 70% or collapses.

Figures

Figures reproduced from arXiv: 2606.22647 by Alexandra Dmitrienko, Varun Gadey, Zijie Liu.

Figure 1
Figure 1. Figure 1: CWE-862 Missing Authorization issue occurs at line 4, where moveAttachment(...) is directly invoked without any prior authorization check on the requesting user. The surrounding lines primarily manage execution flow, including setting the user context (lines 1–2), tracking progress (lines 3, 5, 7, 9), but do not perform access control checks before the sensitive operation is executed. Exploitation of this … view at source ↗
Figure 3
Figure 3. Figure 3: RAVEN’s Repair Workflow with its four phases. [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Semantic Retriever Module [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Prompt template for Selection Agent 15 [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Prompt template for context retrieval 16 [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Prompt template for Patch Generator 17 [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗
read the original abstract

Automated vulnerability repair has emerged as a promising direction to mitigate the growing number of software vulnerabilities. Recent advances in Large Language Models (LLMs) have further accelerated research in automated repair. However, existing frameworks remain largely restricted to memory-related vulnerabilities and locally repairable vulnerability settings, leaving generalization to unseen vulnerability types underexplored. Their evaluations are often limited to a single programming language, and largely rely on proprietary models. In this paper, we propose RAVEN, a scalable, efficient and autonomous framework that integrates an agentic retrieval-augmented generation (RAG) pipeline with controlled iterative repair in a unified framework. The framework utilizes open-source LLMs in a fully locally deployable setting with limited GPU requirements, while building a multi-faceted retrieval pipeline to retrieve historically relevant vulnerability fixes and guide the patch generation. In addition, RAVEN introduces a dedicated Curator Agent that retrieves cross-file dependencies from the target repository, to fix complex vulnerabilities that cannot be addressed using local vulnerable code alone. We evaluate RAVEN on 160 real-world CVE vulnerabilities across diverse vulnerability types, two programming languages, unseen CWE categories, and out-of-distribution settings. RAVEN achieves an overall repair success rate of 83.13%, outperforming all existing state-of-the-art repair frameworks, while also demonstrating strong generalization capabilities and maintaining the repair cost negligible.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes RAVEN, an agentic RAG framework for automated vulnerability repair that combines multi-faceted retrieval of historical fixes with a dedicated Curator Agent to handle cross-file dependencies. It evaluates the system on 160 real-world CVEs across two languages and unseen CWEs, claiming an 83.13% overall repair success rate that outperforms prior frameworks while using only open-source LLMs in a locally deployable setting with negligible repair cost.

Significance. If the performance and generalization claims are substantiated with supporting retrieval metrics and failure analysis, the work would be significant for extending automated repair beyond memory-related and locally fixable vulnerabilities to more complex, cross-file cases using accessible models.

major comments (3)
  1. [Evaluation] Evaluation section: the reported 83.13% success rate on 160 CVEs is presented without retrieval-precision metrics (e.g., precision@K or recall for the multi-faceted pipeline) or a breakdown of how many cases required the Curator Agent, which directly undermines assessment of whether the agentic-RAG components are responsible for the claimed generalization to unseen CWEs and complex vulnerabilities.
  2. [§4] §4 (Curator Agent description): no quantitative evaluation is supplied on the Curator Agent's success rate in surfacing non-local dependencies, despite this being the load-bearing mechanism for the claim that RAVEN handles vulnerabilities "that cannot be addressed using local vulnerable code alone."
  3. [Results] Results tables (e.g., Table 2 or equivalent): the outperformance claim over SOTA lacks per-CWE or per-language breakdowns that would allow verification of the "strong generalization capabilities" assertion, especially given the stress-test concern around cross-file retrieval reliability.
minor comments (2)
  1. [Abstract] The abstract states "repair cost negligible" without defining the cost metric (tokens, wall-clock time, or GPU hours); this should be clarified with explicit numbers in the evaluation.
  2. [§3] Notation for the multi-faceted retrieval pipeline (e.g., how the facets are combined) is introduced without a formal diagram or pseudocode, reducing reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the evaluation would be strengthened by additional metrics and breakdowns, and we will incorporate them in the revision.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the reported 83.13% success rate on 160 CVEs is presented without retrieval-precision metrics (e.g., precision@K or recall for the multi-faceted pipeline) or a breakdown of how many cases required the Curator Agent, which directly undermines assessment of whether the agentic-RAG components are responsible for the claimed generalization to unseen CWEs and complex vulnerabilities.

    Authors: We agree that retrieval-precision metrics and a breakdown of Curator Agent usage are absent from the current evaluation section. These additions would allow clearer attribution of results to the agentic-RAG components. In the revised manuscript we will report precision@K and recall for the multi-faceted retrieval pipeline as well as the number and percentage of cases that invoked the Curator Agent. revision: yes

  2. Referee: [§4] §4 (Curator Agent description): no quantitative evaluation is supplied on the Curator Agent's success rate in surfacing non-local dependencies, despite this being the load-bearing mechanism for the claim that RAVEN handles vulnerabilities "that cannot be addressed using local vulnerable code alone."

    Authors: We acknowledge that §4 provides no quantitative success rate for the Curator Agent on non-local dependencies. We will add such an evaluation in the revision, including accuracy figures obtained via manual inspection of the 160 CVEs to quantify how often the agent correctly surfaces cross-file dependencies. revision: yes

  3. Referee: [Results] Results tables (e.g., Table 2 or equivalent): the outperformance claim over SOTA lacks per-CWE or per-language breakdowns that would allow verification of the "strong generalization capabilities" assertion, especially given the stress-test concern around cross-file retrieval reliability.

    Authors: The current tables report only aggregate success rates. We agree that per-CWE and per-language breakdowns would better support the generalization claims. We will expand the results tables in the revised version to include these stratified results. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical framework evaluated on external CVE benchmark

full rationale

The paper describes an agentic RAG framework for vulnerability repair and reports an 83.13% success rate on 160 real-world CVEs across languages and unseen CWEs. No equations, fitted parameters, or derivation chain appear in the abstract or described claims. Performance is presented as a direct experimental outcome rather than a quantity derived from or equivalent to the framework's own inputs by construction. No self-citation load-bearing steps or ansatz smuggling are indicated. The central claim rests on retrieval and agent behavior, which the skeptic correctly flags as an assumption, but this is an empirical risk, not a circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no technical sections, equations, or implementation details are provided to identify free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5777 in / 1134 out tokens · 25092 ms · 2026-06-26T09:55:47.589100+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

78 extracted references · 15 canonical work pages

  1. [1]

    46 Vulnerability Statistics 2026: Key Trends in Discovery, Exploitation, and Risk,

    “46 Vulnerability Statistics 2026: Key Trends in Discovery, Exploitation, and Risk,” 2026. [Online]. Available: https://security boulevard.com/2026/03/46-vulnerability-statistics-2026-key-trends-i n-discovery-exploitation-and-risk/

  2. [2]

    Context- aware patch generation for better automated program repair,

    M. Wen, J. Chen, R. Wu, D. Hao, and S.-C. Cheung, “Context- aware patch generation for better automated program repair,” in Proceedings of the 40th International Conference on Software 11 Engineering, ser. ICSE ’18. New York, NY , USA: Association for Computing Machinery, 2018, p. 1–11. [Online]. Available: https://doi.org/10.1145/3180155.3180233

  3. [3]

    Better code search and reuse for better pro- gram repair,

    Q. Xin and S. Reiss, “Better code search and reuse for better pro- gram repair,” in2019 IEEE/ACM International Workshop on Genetic Improvement (GI), 2019, pp. 10–17

  4. [4]

    Sosrepair: Expressive semantic search for real-world program re- pair,

    A. Afzal, M. Motwani, K. T. Stolee, Y . Brun, and C. Le Goues, “Sosrepair: Expressive semantic search for real-world program re- pair,”IEEE Transactions on Software Engineering, vol. 47, no. 10, pp. 2162–2181, 2021

  5. [5]

    Automatic program repair using formal verification and expression templates,

    T.-T. Nguyen, Q.-T. Ta, and W.-N. Chin, “Automatic program repair using formal verification and expression templates,” inVerification, Model Checking, and Abstract Interpretation, C. Enea and R. Piskac, Eds. Cham: Springer International Publishing, 2019, pp. 70–91

  6. [6]

    Automatic patch generation learned from human-written patches,

    D. Kim, J. Nam, J. Song, and S. Kim, “Automatic patch generation learned from human-written patches,” in2013 35th International Conference on Software Engineering (ICSE), 2013, pp. 802–811

  7. [7]

    You cannot fix what you cannot find! an investiga- tion of fault localization bias in benchmarking automated program repair systems,

    K. Liu, A. Koyuncu, T. F. Bissyand ´e, D. Kim, J. Klein, and Y . Le Traon, “You cannot fix what you cannot find! an investiga- tion of fault localization bias in benchmarking automated program repair systems,” in2019 12th IEEE Conference on Software Testing, Validation and Verification (ICST), 2019, pp. 102–113

  8. [8]

    Avatar: Fixing semantic bugs with fix patterns of static analysis violations,

    K. Liu, A. Koyuncu, D. Kim, and T. F. Bissyand ´e, “Avatar: Fixing semantic bugs with fix patterns of static analysis violations,”2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 1–12, 2019. [Online]. Available: https://api.semanticscholar.org/CorpusID:201043038

  9. [9]

    Sequencer: Sequence-to-sequence learning for end-to-end program repair,

    Z. Chen, S. Kommrusch, M. Tufano, L.-N. Pouchet, D. Poshyvanyk, and M. Monperrus, “Sequencer: Sequence-to-sequence learning for end-to-end program repair,”IEEE Transactions on Software Engi- neering, vol. 47, no. 9, pp. 1943–1959, 2021

  10. [10]

    Sorting and transforming program repair ingredients via deep learning code similarities,

    M. White, M. Tufano, M. Mart ´ınez, M. Monperrus, and D. Poshy- vanyk, “Sorting and transforming program repair ingredients via deep learning code similarities,” in2019 IEEE 26th International Confer- ence on Software Analysis, Evolution and Reengineering (SANER), 2019, pp. 479–490

  11. [11]

    Coconut: combining context-aware neural translation models using ensemble for program repair,

    T. Lutellier, H. V . Pham, L. Pang, Y . Li, M. Wei, and L. Tan, “Coconut: combining context-aware neural translation models using ensemble for program repair,” ser. ISSTA 2020. New York, NY , USA: Association for Computing Machinery, 2020, p. 101–114. [Online]. Available: https://doi.org/10.1145/3395363.3397369

  12. [12]

    Neural transfer learning for repairing security vulnerabilities in c code,

    Z. Chen, S. Kommrusch, and M. Monperrus, “Neural transfer learning for repairing security vulnerabilities in c code,”IEEE Transactions on Software Engineering, vol. 49, no. 1, pp. 147–165, 2023

  13. [13]

    Seqtrans: Automatic vulnerability fix via sequence to sequence learning,

    J. Chi, Y . Qu, T. Liu, Q. Zheng, and H. Yin, “Seqtrans: Automatic vulnerability fix via sequence to sequence learning,”IEEE Transac- tions on Software Engineering, vol. 49, no. 2, pp. 564–585, 2023

  14. [14]

    Vulrepair: a t5-based automated software vulnerability repair,

    M. Fu, C. Tantithamthavorn, T. Le, V . Nguyen, and D. Phung, “Vulrepair: a t5-based automated software vulnerability repair,” ser. ESEC/FSE 2022. New York, NY , USA: Association for Computing Machinery, 2022, p. 935–947. [Online]. Available: https://doi.org/10.1145/3540250.3549098

  15. [15]

    Out of sight, out of mind: Better automatic vulnerability repair by broadening input ranges and sources,

    X. Zhou, K. Kim, B. Xu, D. Han, and D. Lo, “Out of sight, out of mind: Better automatic vulnerability repair by broadening input ranges and sources,” inProceedings of the IEEE/ACM 46th International Conference on Software Engineering, ser. ICSE ’24. New York, NY , USA: Association for Computing Machinery, 2024. [Online]. Available: https://doi.org/10.1145...

  16. [16]

    Ex- amining zero-shot vulnerability repair with large language models,

    H. Pearce, B. Tan, B. Ahmad, R. Karri, and B. Dolan-Gavitt, “Ex- amining zero-shot vulnerability repair with large language models,” in2023 IEEE Symposium on Security and Privacy (SP), 2023, pp. 2339–2356

  17. [17]

    Rap-gen: Retrieval- augmented patch generation with codet5 for automatic program repair,

    W. Wang, Y . Wang, S. Joty, and S. C. Hoi, “Rap-gen: Retrieval- augmented patch generation with codet5 for automatic program repair,” ser. ESEC/FSE 2023. New York, NY , USA: Association for Computing Machinery, 2023, p. 146–158. [Online]. Available: https://doi.org/10.1145/3611643.3616256

  18. [18]

    Y . Kim, S. Shin, H. Kim, and J. Yoon,Logs in, patches out: auto- mated vulnerability repair via tree-of-thought LLM analysis. USA: USENIX Association, 2025

  19. [19]

    Patchagent: a practical program repair agent mimicking human ex- pertise,

    Z. Yu, Z. Guo, Y . Wu, J. Yu, M. Xu, D. Mu, Y . Chen, and X. Xing, “Patchagent: a practical program repair agent mimicking human ex- pertise,” inProceedings of the 34th USENIX Conference on Security Symposium, ser. SEC ’25. USA: USENIX Association, 2025

  20. [20]

    Appatch: au- tomated adaptive prompting large language models for real-world software vulnerability patching,

    Y . Nong, H. Yang, L. Cheng, H. Hu, and H. Cai, “Appatch: au- tomated adaptive prompting large language models for real-world software vulnerability patching,” inProceedings of the 34th USENIX Conference on Security Symposium, ser. SEC ’25. USA: USENIX Association, 2025

  21. [21]

    Beyond tests: Program vulnerability repair via crash constraint extraction,

    X. Gao, B. Wang, G. J. Duck, R. Ji, Y . Xiong, and A. Roychoudhury, “Beyond tests: Program vulnerability repair via crash constraint extraction,”ACM Trans. Softw. Eng. Methodol., vol. 30, no. 2, Feb

  22. [22]

    Available: https://doi.org/10.1145/3418461

    [Online]. Available: https://doi.org/10.1145/3418461

  23. [23]

    Gemma 4,

    Google DeepMind, “Gemma 4,” 2026. [Online]. Available: https: //deepmind.google/models/gemma/gemma-4/

  24. [24]

    Nemotron-3-30b,

    NVIDIA Corporation, “Nemotron-3-30b,” 2025. [Online]. Available: https://build.nvidia.com/nvidia/nemotron-3-nano-30b-a3b/modelcard

  25. [25]

    CVE-2023-37910,

    “CVE-2023-37910,” 2023. [Online]. Available: https://www.cve.org/ CVERecord?id=CVE-2023-37910

  26. [26]

    XWiki Platform: An Open-Source Enterprise Wiki Framework

    “XWiki Platform: An Open-Source Enterprise Wiki Framework.”

  27. [27]

    CWE-862: Missing Authorization

    “CWE-862: Missing Authorization.” [Online]. Available: https: //cwe.mitre.org/data/definitions/862.html

  28. [28]

    CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software,

    G. Bhandari, A. Naseer, and L. Moonen, “Cvefixes: automated collection of vulnerabilities and their fixes from open-source software,” inProceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering, ser. PROMISE 2021. New York, NY , USA: Association for Computing Machinery, 2021, p. 30–39. [Online]. Avail...

  29. [29]

    Joern: A tool for parsing and analyzing source code,

    J. Developers, “Joern: A tool for parsing and analyzing source code,” 2017. [Online]. Available: https://joern.io

  30. [30]

    The probabilistic relevance frame- work: Bm25 and beyond,

    S. Robertson and H. Zaragoza, “The probabilistic relevance frame- work: Bm25 and beyond,”Foundations and Trends in Information Retrieval, vol. 3, pp. 333–389, 09 2009

  31. [31]

    Semgrep: Lightweight static analysis for many lan- guages,

    Semgrep, Inc., “Semgrep: Lightweight static analysis for many lan- guages,” https://github.com/semgrep/semgrep

  32. [32]

    Codebleu: a method for automatic evaluation of code synthesis,

    S. Ren, D. Guo, S. Lu, L. Zhou, S. Liu, D. Tang, N. Sundaresan, M. Zhou, A. Blanco, and S. Ma, “Codebleu: a method for automatic evaluation of code synthesis,” 2020. [Online]. Available: https://arxiv.org/abs/2009.10297

  33. [33]

    CWE-787: Out-of-bounds Write

    “CWE-787: Out-of-bounds Write.” [Online]. Available: https: //cwe.mitre.org/data/definitions/787.html

  34. [34]

    CWE-125: Out-of-bounds Read

    “CWE-125: Out-of-bounds Read.” [Online]. Available: https: //cwe.mitre.org/data/definitions/125.html

  35. [35]

    CWE-79: Improper Neutralization of Input During Web Page Generation (’Cross-site Scripting’)

    “CWE-79: Improper Neutralization of Input During Web Page Generation (’Cross-site Scripting’).” [Online]. Available: https: //cwe.mitre.org/data/definitions/79.html

  36. [36]

    CWE-89: Improper Neutralization of Special Elements used in an SQL Command (’SQL Injection’)

    “CWE-89: Improper Neutralization of Special Elements used in an SQL Command (’SQL Injection’).” [Online]. Available: https://cwe.mitre.org/data/definitions/89.html

  37. [37]

    CWE-94: Improper Control of Generation of Code (’Code Injection’)

    “CWE-94: Improper Control of Generation of Code (’Code Injection’).” [Online]. Available: https://cwe.mitre.org/data/definitio ns/94.html

  38. [38]

    CWE-22: Improper Limitation of a Pathname to a Restricted Directory (’Path Traversal’)

    “CWE-22: Improper Limitation of a Pathname to a Restricted Directory (’Path Traversal’).” [Online]. Available: https://cwe.mitre. org/data/definitions/22.html

  39. [39]

    CWE-502: Deserialization of Untrusted Data

    “CWE-502: Deserialization of Untrusted Data.” [Online]. Available: https://cwe.mitre.org/data/definitions/502.html 12

  40. [40]

    CWE-352: Cross-Site Request Forgery (CSRF)

    “CWE-352: Cross-Site Request Forgery (CSRF).” [Online]. Available: https://cwe.mitre.org/data/definitions/352.html

  41. [41]

    CWE-863: Incorrect Authorization

    “CWE-863: Incorrect Authorization.” [Online]. Available: https: //cwe.mitre.org/data/definitions/863.html

  42. [42]

    Vul4j: A dataset of reproducible java vulnerabilities geared towards the study of program repair techniques,

    Q.-C. Bui, R. Scandariato, and N. E. D. Ferreyra, “Vul4j: A dataset of reproducible java vulnerabilities geared towards the study of program repair techniques,” in2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR), 2022, pp. 464–468

  43. [43]

    CWE-835: Loop with Unreachable Exit Condition (’Infinite Loop’)

    “CWE-835: Loop with Unreachable Exit Condition (’Infinite Loop’).” [Online]. Available: https://cwe.mitre.org/data/definitions/835.html

  44. [44]

    CWE-828: Signal Handler with Functionality that is not Asynchronous-Safe

    “ CWE-828: Signal Handler with Functionality that is not Asynchronous-Safe.” [Online]. Available: https://cwe.mitre.org/data /definitions/828.html

  45. [45]

    CWE-770: Allocation of Resources Without Limits or Throttling

    “ CWE-770: Allocation of Resources Without Limits or Throttling.” [Online]. Available: https://cwe.mitre.org/data/definitions/770.html

  46. [46]

    CWE-476: NULL Pointer Dereference

    “CWE-476: NULL Pointer Dereference.” [Online]. Available: https://cwe.mitre.org/data/definitions/476.html

  47. [47]

    CWE-416: Use After Free

    “CWE-416: Use After Free.” [Online]. Available: https://cwe.mitre. org/data/definitions/416.html

  48. [48]

    CWE-281: Improper Preservation of Permissions

    “CWE-281: Improper Preservation of Permissions.” [Online]. Available: https://cwe.mitre.org/data/definitions/281.html

  49. [49]

    CWE-96: Improper Neutralization of Directives in Statically Saved Code (’Static Code Injection’)

    “CWE-96: Improper Neutralization of Directives in Statically Saved Code (’Static Code Injection’).” [Online]. Available: https: //cwe.mitre.org/data/definitions/96.html

  50. [50]

    Program vulnerability repair via inductive inference,

    Y . Zhang, X. Gao, G. J. Duck, and A. Roychoudhury, “Program vulnerability repair via inductive inference,” inProceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2022. New York, NY , USA: Association for Computing Machinery, 2022, p. 691–702. [Online]. Available: https://doi.org/10.1145/3533767.3534387

  51. [51]

    Varfix: balancing edit expressiveness and search effectiveness in automated program repair,

    C.-P. Wong, P. Santiesteban, C. K ¨astner, and C. Le Goues, “Varfix: balancing edit expressiveness and search effectiveness in automated program repair,” inProceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2021. New York, NY , USA: Association for C...

  52. [52]

    An analysis of the search spaces for generate and validate patch generation systems,

    F. Long and M. Rinard, “An analysis of the search spaces for generate and validate patch generation systems,” inProceedings of the 38th International Conference on Software Engineering, ser. ICSE ’16. New York, NY , USA: Association for Computing Machinery, 2016, p. 702–713. [Online]. Available: https://doi.org/10 .1145/2884781.2884872

  53. [53]

    A survey of symbolic execution techniques,

    R. Baldoni, E. Coppa, D. C. D’elia, C. Demetrescu, and I. Finocchi, “A survey of symbolic execution techniques,” vol. 51, no. 3, May

  54. [54]

    Available: https://doi.org/10.1145/3182657

    [Online]. Available: https://doi.org/10.1145/3182657

  55. [55]

    relifix: automated repair of software regressions,

    S. H. Tan and A. Roychoudhury, “relifix: automated repair of software regressions,” inProceedings of the 37th International Conference on Software Engineering - Volume 1, ser. ICSE ’15. IEEE Press, 2015, p. 471–482

  56. [56]

    Automatic inference of code transforms for patch generation,

    F. Long, P. Amidon, and M. Rinard, “Automatic inference of code transforms for patch generation,” inProceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ser. ESEC/FSE 2017. New York, NY , USA: Association for Computing Machinery, 2017, p. 727–739. [Online]. Available: https://doi.org/10.1145/3106237.3106253

  57. [57]

    Tbar: revisiting template-based automated program repair,

    K. Liu, A. Koyuncu, D. Kim, and T. F. Bissyand ´e, “Tbar: revisiting template-based automated program repair,” inProceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2019. New York, NY , USA: Association for Computing Machinery, 2019, p. 31–42. [Online]. Available: https://doi.org/10.1145/3293882.3330577

  58. [58]

    Dlfix: context-based code transformation learning for automated program repair,

    Y . Li, S. Wang, and T. N. Nguyen, “Dlfix: context-based code transformation learning for automated program repair,” in Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, ser. ICSE ’20. New York, NY , USA: Association for Computing Machinery, 2020, p. 602–614. [Online]. Available: https://doi.org/10.1145/3377811.3380345

  59. [59]

    Review4repair: Code review aided automatic program repairing,

    F. Huq, M. Hasan, M. M. A. Haque, S. Mahbub, A. Iqbal, and T. Ahmed, “Review4repair: Code review aided automatic program repairing,”Inf. Softw. Technol., vol. 143, no. C, Mar. 2022. [Online]. Available: https://doi.org/10.1016/j.infsof.2021.106765

  60. [60]

    How effective are neural networks for fixing security vulnerabilities,

    Y . Wu, N. Jiang, H. V . Pham, T. Lutellier, J. Davis, L. Tan, P. Babkin, and S. Shah, “How effective are neural networks for fixing security vulnerabilities,” inProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2023. New York, NY , USA: Association for Computing Machinery, 2023, p. 1282–1294. [Online...

  61. [61]

    Retrieval-augmented generation for knowledge- intensive nlp tasks,

    P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. Kuttler, M. Lewis, W.-t. Yih, T. Rocktaschel, S. Riedel, and D. Kiela, “Retrieval-augmented generation for knowledge- intensive nlp tasks,” inAdvances in Neural Information Processing Systems, vol. 33, 2020, pp. 9459–9474. [Online]. Available: https://proceedings.neurips.cc/paper/2020/...

  62. [62]

    Retrieval-augmented generation for large language models: A survey,

    Y . Gao, Y . Xiong, X. Gao, K. Jia, J. Pan, Y . Bi, Y . Dai, J. Sun, M. Wang, and H. Wang, “Retrieval-augmented generation for large language models: A survey,”arXiv preprint arXiv:2312.10997, 2023. [Online]. Available: https://arxiv.org/abs/2312.10997 Appendix This appendix provides background on Retrieval Aug- mented Generation, prompt templates and rep...

  63. [63]

    Retrieval Augmented Generation Retrieval Augmented Generation (RAG) incorporates external knowledge at inference time by combining the internal knowledge of an LLM with external sources such as vector databases, knowledge bases, and code reposito- ries [59], [60]. This is particularly beneficial for knowledge- intensive tasks where information changes dyn...

  64. [64]

    Performance of RA VEN on C based CWEs The results in table 7 further demonstrate the effective- ness of RA VEN on C-based vulnerabilities across memory- safety and access-control categories. Formemory-based vul- nerabilities(CWE-125), RA VEN successfully repairs17out of20instances, achieving a repair success rate of85.00%, which shows its ability to mitig...

  65. [65]

    Select the single candidate patch whose fix strategy is the best semantic referencefor repairing the vulnerable code

    Prompt Templates 14 Selector Agent Prompt Template: You are a < language > code vulnerability patch selection agent . Select the single candidate patch whose fix strategy is the best semantic referencefor repairing the vulnerable code . You are given : - a vulnerable code snippet - a CWE description - several candidate patches Your task is to choose the c...

  66. [66]

    Prioritize the underlying vulnerability cause over file names , function names , or CVE IDs

  67. [67]

    Prefer the candidate with the most similar mitigation strategy and security checks

  68. [68]

    Focus on the security logic introduced by the patch (forexample : bounds validation , length checks , null checks , state validation , safe iteration limits )

  69. [69]

    Do not rely on superficial textual overlap alone

  70. [70]

    If the CWE description and vulnerable code appear inconsistent , prioritize the actual vulnerable code pattern and the implied root cause in the code

  71. [71]

    If a candidate appears to already match logic present in the vulnerable code , treat it cautiously ; it may indicate dataset noise or a pre - patched snippet

  72. [72]

    Assume candidate patches may come from related or unrelated files ; file identity is not a deciding factor . Return exactly : Candidate x vulnerable_code : < vulnerable_code > cwe_description : < cwe_description > given_patches : Candidate 1 CVE : < candidate_1_cve_id > Filename : < candidate_1_filename > Patch diff : < candidate_1_patch_diff > --- Candid...

  73. [73]

    { r ef e re n c e_ g u id a n ce }

    Prioritize the concrete root cause visible in the target code . { r ef e re n c e_ g u id a n ce }

  74. [74]

    Apply the smallest targeted fix that prevents the unsafe behaviorwhilepreserving existing logic

  75. [75]

    Do not refactor , rename symbols , change formatting unnecessarily , or modify unrelated behavior

  76. [76]

    Do not introduce new helper functions , macros , types , or dependencies unless clearly implied by the provided code context

  77. [77]

    Reuse existing validation style , control flow , and error handling patterns when possible

  78. [78]

    Output rules : - Return only one```{ code_fence }```code block and no other text

    Ensure the result is syntactically valid and consistent with the surrounding code . Output rules : - Return only one```{ code_fence }```code block and no other text . - Inside the code block , provide only the minimal fixed code snippet ( s )forthe target file . - Do not output a unified diff unless explicitly requested . { r e f e r e n c e _ o u t p u t...