arxiv: 2504.16057 · v5 · submitted 2025-04-22 · 💻 cs.CR

Neuro-symbolic Static Analysis with LLM-generated Vulnerability Patterns

Pith reviewed 2026-05-22 17:52 UTC · model grok-4.3

classification 💻 cs.CR
keywords neuro-symbolic static analysis · LLM-generated vulnerability patterns · automated pattern refinement · trace-driven validation · software vulnerability detection · multi-language static analysis

The pith

A neuro-symbolic framework uses LLMs to generate vulnerability patterns that match expert static analysis performance after only hours of automated refinement.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MoCQ to combine large language models with symbolic static analysis for automatically producing vulnerability detection patterns. This matters because expert-written patterns require weeks of manual labor while the new method completes the task in hours and surfaces additional patterns and vulnerabilities that humans missed. A sympathetic reader would see this as a way to scale precise code scanning across languages without expanding the pool of security specialists. The approach extracts a domain-specific language for patterns and then applies an iterative loop that uses execution traces to validate and correct LLM outputs. If the loop works reliably, static analysis tools could incorporate far more detection rules than are currently feasible by hand.

Core claim

MoCQ extracts domain-specific languages for expressing vulnerability patterns and employs an iterative refinement loop with trace-driven symbolic validation that provides precise feedback for pattern correction. Evaluated on 12 vulnerability types across C/C++, Java, PHP, and JavaScript, the system reaches detection performance comparable to expert-developed patterns, requires only hours instead of weeks, uncovers 46 new patterns missed by experts, and finds 25 previously unknown vulnerabilities in real-world applications.

What carries the argument

The iterative refinement loop with trace-driven symbolic validation, which feeds execution traces back to the LLM to correct and improve generated patterns until they pass symbolic checks.
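The shape of such a loop can be sketched in a few lines. This is a minimal illustration, not MoCQ's implementation: `refine_pattern`, `ValidationResult`, the iteration budget, and the toy generator and validator are all hypothetical names introduced here, and the pattern strings are made-up placeholders rather than real query syntax. In a real system the generator would call an LLM and the validator would run a static analyzer against traces.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ValidationResult:
    ok: bool
    feedback: str  # hypothetical: description of what the validator rejected


def refine_pattern(
    generate: Callable[[Optional[str], Optional[str]], str],
    validate: Callable[[str], ValidationResult],
    max_iterations: int = 5,
) -> Optional[str]:
    """Generate a pattern, validate it, and feed any failure back to the generator."""
    pattern: Optional[str] = None
    feedback: Optional[str] = None
    for _ in range(max_iterations):
        pattern = generate(pattern, feedback)
        result = validate(pattern)
        if result.ok:
            return pattern
        feedback = result.feedback
    return None  # no pattern passed validation within the budget


# Toy stand-ins so the sketch runs without an LLM or a static analyzer.
def toy_generate(previous: Optional[str], feedback: Optional[str]) -> str:
    if previous is None:
        return "call(eval)"
    return previous + ".reachableBy(source)"


def toy_validate(pattern: str) -> ValidationResult:
    if "reachableBy" in pattern:
        return ValidationResult(True, "")
    return ValidationResult(False, "pattern matches sinks with no tainted input")


final_pattern = refine_pattern(toy_generate, toy_validate)
print(final_pattern)
```

Returning `None` when the budget runs out keeps non-converging patterns out of the final rule set, which is one plausible design choice among several.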

If this is right

  • Static analysis coverage can expand to new vulnerability classes without proportional growth in expert authoring time.
  • Detection rules become feasible for languages or frameworks that currently lack dedicated manual patterns.
  • Security teams could periodically regenerate and refresh pattern sets to track evolving codebases and threat models.
  • The same neuro-symbolic loop could be applied to generate patterns for other analysis goals such as performance or correctness bugs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the validation loop generalizes cleanly, similar methods might accelerate development of static checkers for domain-specific languages or new programming paradigms.
  • Widespread use could lower the cost of maintaining high-quality static analyzers and allow smaller teams to produce enterprise-grade security tools.
  • A practical next test would measure how often the loop converges within a fixed number of iterations on previously unseen vulnerability categories.
  • The method might reveal systematic blind spots in current manual pattern libraries by surfacing the 46 additional patterns it discovered.

Load-bearing premise

The iterative refinement loop supplies precise enough feedback from symbolic validation to fix LLM-generated patterns without systematically overlooking subtle vulnerabilities or creating false positives that would degrade real-world detection rates.

What would settle it

Run both MoCQ-generated patterns and expert patterns on the same large open-source codebase containing known ground-truth vulnerabilities and count whether MoCQ misses any that experts catch or reports substantially more false positives on clean code.
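The bookkeeping for that comparison is simple set arithmetic. The sketch below assumes findings can be reduced to comparable identifiers (for example file and line); the function name `compare_to_ground_truth` and all identifiers in the sample data are invented for illustration and do not come from the paper.

```python
from typing import Dict, Set


def compare_to_ground_truth(reported: Set[str], truth: Set[str]) -> Dict[str, float]:
    """Count true/false positives and false negatives, then derive precision and recall."""
    tp = len(reported & truth)
    fp = len(reported - truth)
    fn = len(truth - reported)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Made-up finding identifiers, for illustration only.
truth = {"v1", "v2", "v3", "v4"}
expert = {"v1", "v2", "v3", "x9"}
generated = {"v1", "v2", "v4", "x7", "x8"}

expert_scores = compare_to_ground_truth(expert, truth)
generated_scores = compare_to_ground_truth(generated, truth)

# Known vulnerabilities the expert patterns catch but the generated ones miss.
missed_by_generated_only = (expert & truth) - generated

print(expert_scores)
print(generated_scores)
print(missed_by_generated_only)
```

The last set is the quantity the test hinges on: a non-empty result means the generated patterns lose coverage that experts already had, regardless of how the aggregate scores compare.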

Figures

Figures reproduced from arXiv: 2504.16057 by Changhua Luo, Jianjia Yu, Josef Sarfati Korich, Junfeng Yang, Penghui Li, Songchen Yao, Yinzhi Cao.

Figure 1: Example real-world query in Joern, simplified for clarity.
Figure 2: The workflow of MoCQ.
Figure 3: Simplified BNF grammar for Joern, extracted by …

Original abstract

In this work, we present MoCQ, a neuro-symbolic static analysis framework that leverages large language models (LLMs) to automatically generate vulnerability detection patterns. This approach combines the precision and scalability of pattern-based static analysis with the semantic understanding and automation capabilities of LLMs. MoCQ extracts the domain-specific languages for expressing vulnerability patterns and employs an iterative refinement loop with trace-driven symbolic validation that provides precise feedback for pattern correction. We evaluated MoCQ on 12 vulnerability types across four languages (C/C++, Java, PHP, JavaScript). MoCQ achieves detection performance comparable to expert-developed patterns while requiring only hours of generation versus weeks of manual effort. Notably, MoCQ uncovered 46 new vulnerability patterns that security experts had missed and discovered 25 previously unknown vulnerabilities in real-world applications. MoCQ also outperforms prior approaches with stronger analysis capabilities and broader applicability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces MoCQ, a neuro-symbolic static analysis framework that uses LLMs to automatically generate vulnerability detection patterns. It extracts domain-specific languages for patterns and employs an iterative refinement loop with trace-driven symbolic validation to correct patterns. Evaluated on 12 vulnerability types across C/C++, Java, PHP, and JavaScript, the paper claims detection performance comparable to expert-developed patterns (but in hours rather than weeks), the discovery of 46 new patterns missed by experts, and 25 previously unknown vulnerabilities in real-world applications, while also outperforming prior approaches.

Significance. If the quantitative claims are substantiated, the work would be significant for the field of software security and static analysis. It offers a practical method to scale vulnerability pattern creation beyond manual expert effort, leveraging LLM semantic capabilities alongside symbolic precision to potentially improve coverage and reduce development time for detection rules.

major comments (3)
  1. [Abstract] Abstract: The central claims of 'comparable performance' to expert patterns and the discovery of 46 new patterns plus 25 unknown vulnerabilities are stated without any quantitative metrics, precision/recall values, baselines, statistical significance, or description of independent verification procedures for the new findings.
  2. [Method and Evaluation] Iterative refinement loop (method and evaluation sections): The trace-driven symbolic validation is presented as supplying precise corrective feedback, but the manuscript does not address how the approach handles incomplete traces (e.g., missing data-dependent branches, aliasing, or external library calls), which risks accepting overly narrow or loose patterns and directly affects both the performance comparability claim and the validity of the newly reported vulnerabilities.
  3. [Evaluation] Real-world evaluation: The report of 25 previously unknown vulnerabilities requires explicit details on the tested applications, the exact verification process used to confirm they are true positives not caught by existing tools, and controls to rule out false positives introduced by the LLM-generated patterns.
minor comments (2)
  1. [Related Work] The paper would benefit from a dedicated related-work section that positions MoCQ against recent LLM-based code analysis and vulnerability detection efforts with explicit comparisons.
  2. [Method] Notation for the extracted domain-specific languages and the feedback signals in the refinement loop could be formalized with small examples or pseudocode for improved clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight opportunities to strengthen the presentation of quantitative results, clarify methodological limitations, and provide greater transparency in the real-world evaluation. We address each major comment below and indicate the revisions we will make to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claims of 'comparable performance' to expert patterns and the discovery of 46 new patterns plus 25 unknown vulnerabilities are stated without any quantitative metrics, precision/recall values, baselines, statistical significance, or description of independent verification procedures for the new findings.

    Authors: We agree that the abstract would be strengthened by including key quantitative metrics. The full paper reports per-vulnerability precision, recall, and F1 scores (Table 2), direct comparisons to expert-written patterns, and baseline results against prior neuro-symbolic and LLM-based approaches. The 25 new vulnerabilities were verified through manual expert review and cross-validation against existing tools. In the revision we will update the abstract to include average precision/recall across the 12 types, a brief statement on the expert comparison, and a note that new vulnerabilities were independently confirmed by security researchers. revision: yes

  2. Referee: [Method and Evaluation] Iterative refinement loop (method and evaluation sections): The trace-driven symbolic validation is presented as supplying precise corrective feedback, but the manuscript does not address how the approach handles incomplete traces (e.g., missing data-dependent branches, aliasing, or external library calls), which risks accepting overly narrow or loose patterns and directly affects both the performance comparability claim and the validity of the newly reported vulnerabilities.

    Authors: Section 3.3 explains that the iterative loop collects multiple concrete execution traces from test suites and uses symbolic validation to check pattern soundness across those traces. This design reduces the impact of any single incomplete trace. We acknowledge, however, that the manuscript lacks an explicit discussion of residual risks from aliasing, data-dependent branches, and external calls. We will add a dedicated limitations paragraph in the discussion section that describes these cases, explains how the multi-trace requirement and symbolic over-approximation provide safeguards, and notes that patterns are rejected if they fail on any available trace. This addition will not alter the reported performance numbers but will qualify the claims appropriately. revision: partial

  3. Referee: [Evaluation] Real-world evaluation: The report of 25 previously unknown vulnerabilities requires explicit details on the tested applications, the exact verification process used to confirm they are true positives not caught by existing tools, and controls to rule out false positives introduced by the LLM-generated patterns.

    Authors: Section 5.4 already lists the ten open-source applications evaluated (specific versions of projects in C/C++, Java, PHP, and JavaScript) and describes the verification workflow: running MoCQ patterns, comparing outputs against expert patterns and commercial tools, followed by manual inspection by two independent security researchers. To fully address the comment we will expand this section with a summary table that enumerates each application, the number of new vulnerabilities found per project, the step-by-step verification procedure (including false-positive controls on a held-out set of known-safe code), and confirmation that none were reported by existing tools. These additions will make the evidence for the 25 new vulnerabilities fully transparent. revision: yes
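The acceptance rule mentioned in the second response, where a pattern is rejected if it fails on any available trace, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it assumes each trace carries an expected verdict (vulnerable or safe), and `Trace`, `accept_pattern`, and the toy matcher are hypothetical names introduced here.

```python
from typing import Callable, Iterable, List, Tuple

Trace = List[str]  # hypothetical: ordered labels of executed operations


def accept_pattern(
    matches: Callable[[Trace], bool],
    labeled_traces: Iterable[Tuple[Trace, bool]],
) -> Tuple[bool, List[str]]:
    """Accept only if the pattern agrees with the expected verdict on every trace."""
    failures: List[str] = []
    for index, (trace, is_vulnerable) in enumerate(labeled_traces):
        if matches(trace) != is_vulnerable:
            kind = "missed vulnerable trace" if is_vulnerable else "flagged safe trace"
            failures.append(f"trace {index}: {kind}")
    return (not failures, failures)


def toy_matches(trace: Trace) -> bool:
    """Toy taint rule: a sink reached after a source with no sanitizer in between."""
    tainted = False
    for step in trace:
        if step == "source":
            tainted = True
        elif step == "sanitize":
            tainted = False
        elif step == "sink" and tainted:
            return True
    return False


traces = [
    (["source", "sink"], True),
    (["source", "sanitize", "sink"], False),
    (["sink"], False),
]

accepted, failures = accept_pattern(toy_matches, traces)
loose_accepted, loose_failures = accept_pattern(lambda t: "sink" in t, traces)
print(accepted, failures)
print(loose_accepted, loose_failures)
```

The failure list doubles as feedback for the next generation round. It also makes the referee's concern concrete: the rule can only reject a pattern for behavior that some available trace actually exercises.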

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external symbolic validation and empirical testing.

full rationale

The paper's core chain—LLM pattern generation followed by iterative trace-driven symbolic validation and real-world evaluation—does not reduce any claimed result to its inputs by construction. Performance comparability and discovery of new patterns/vulnerabilities are measured against independent expert baselines and external codebases rather than fitted parameters or self-referential definitions. No load-bearing self-citations, uniqueness theorems, or ansatzes from prior author work are invoked to force the outcomes. The approach is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that LLMs can be steered to produce correct patterns via symbolic feedback and that the resulting patterns generalize beyond the evaluated cases.

axioms (1)
  • domain assumption: LLMs supplied with appropriate prompts and symbolic feedback can generate vulnerability patterns whose precision and recall match those of expert-written rules.
    This premise is required for the performance-comparison claim in the abstract.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Do Skill Descriptions Tell the Truth? Detecting Undisclosed Security Behaviors in Code-Backed LLM Skills

    cs.CR 2026-05 conditional novelty 7.0

    SKILLSCOPE detects undisclosed security behaviors in LLM skill implementations via security property graphs and taxonomy-based consistency checking, identifying confirmed inconsistencies in 9.4% of 4,556 evaluated ski...

  2. Generating Complex Code Analyzers from Natural Language Questions

    cs.SE 2026-05 unverdicted novelty 7.0

    Merlin generates CodeQL queries from natural language questions via RAG-based iteration and a self-test technique using assistive queries, achieving 3.8x higher task accuracy and 31% less completion time in user studi...

  3. Less Is More: Measuring How LLM Involvement affects Chatbot Accuracy in Static Analysis

    cs.SE 2026-04 unverdicted novelty 6.0

    A structured JSON intermediate representation for LLM-generated static analysis queries outperforms both direct generation and agentic tool use, with gains of 15-25 percentage points on large models.

  4. BugScope: Learn to Find Bugs Like Human

    cs.SE 2025-07 conditional novelty 6.0

    BugScope structures LLM bug detection into three human-mirroring steps and distills guidelines from examples, reaching 0.87 F1 on 33 real bugs while outperforming Claude and Cursor tools and uncovering 184 new issues ...

  5. Neuro-Symbolic AI for Cybersecurity: State of the Art, Challenges, and Opportunities

    cs.CR 2025-09 unverdicted novelty 5.0

    A systematic review of neuro-symbolic AI in cybersecurity finds that deeper integration and causal reasoning improve performance across intrusion detection and vulnerability tasks, while identifying barriers and a res...
