QuiLL: An LLM-Based Vulnerability Assessment Framework for the Wild
1 Pith paper cites this work.
Abstract
Large Language Models (LLMs) have demonstrated exceptional progress in multiple domains of software engineering, including software vulnerability detection. Using LLMs to automate vulnerability detection in the wild is an important and relatively under-explored problem. In this paper, we propose QuiLL, the first comprehensive evaluation framework for real-world vulnerability detection. Our solution consists of an end-to-end pipeline that draws together cutting-edge LLM optimization techniques and strategies specifically catering to the complexities of real-world vulnerability detection. Our specific contributions include (i) diverse prompt designs for vulnerability detection and reasoning, (ii) a real-world vector data store constructed from the National Vulnerability Database to provide dynamic in-context learning, and (iii) a novel scoring metric that quantifies the accuracy and reasoning quality of model predictions. QuiLL enables researchers to easily and systematically benchmark and compare the vulnerability detection capabilities of various LLMs and to assess their readiness for deployment in actual code production pipelines.
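The abstract's second contribution, a vector store that supplies dynamic in-context learning, can be illustrated with a minimal retrieval sketch: embed the code under review, find the most similar stored entries, and place them in the prompt as worked examples. This is not QuiLL's implementation. The bag-of-tokens "embedding", the `Entry` record, the function names, and the placeholder entries (`EXAMPLE-1` and so on) are all hypothetical stand-ins; a real store would hold National Vulnerability Database records and use a learned code embedding.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Entry:
    """One record in a hypothetical vulnerability vector store."""
    entry_id: str      # placeholder identifier, not a real CVE
    code: str          # vulnerable code snippet
    description: str   # short description of the weakness
    vector: Counter    # embedding of `code`


def embed(text: str) -> Counter:
    """Toy embedding (identifier counts); a real system would use a neural encoder."""
    return Counter(re.findall(r"[A-Za-z_]\w*", text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(store: list[Entry], query: str, k: int) -> list[Entry]:
    """Return the k stored entries whose code is most similar to the query."""
    q = embed(query)
    return sorted(store, key=lambda e: cosine(q, e.vector), reverse=True)[:k]


def build_prompt(store: list[Entry], query: str, k: int = 2) -> str:
    """Assemble a few-shot prompt whose examples are chosen per query."""
    parts = [
        f"Example {e.entry_id}:\n{e.code}\nVulnerability: {e.description}\n"
        for e in retrieve(store, query, k)
    ]
    parts.append(f"Code under review:\n{query}\nIs this code vulnerable? Explain your reasoning.")
    return "\n".join(parts)


# Usage with made-up entries.
store = [
    Entry(i, c, d, embed(c))
    for i, c, d in [
        ("EXAMPLE-1", "strcpy(buf, input);", "unbounded copy into a fixed-size buffer"),
        ("EXAMPLE-2", "free(p); use(p);", "use after free"),
        ("EXAMPLE-3", "q = 'SELECT * FROM t WHERE id=' + user_id", "SQL injection"),
    ]
]
prompt = build_prompt(store, "char buf[8]; strcpy(buf, argv[1]);", k=1)
print(prompt)
```

Because the examples are selected per query rather than fixed in advance, a buffer-copy snippet pulls in the buffer-overflow example while a query-building snippet would pull in the injection example.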
Citing papers by field: cs.SE (1). By year: 2026 (1). By verdict: UNVERDICTED (1).
Representative citing papers:
Revisiting Vul-RAG: Reproducibility and Replicability of RAG-based Vulnerability Detection with Open-Weight Models
Reproducibility study of Vul-RAG confirms original findings in a fully local open-weights setting but identifies a persistent performance plateau at approximately 0.30 pairwise accuracy across diverse recent open-weight LLMs.
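The "pairwise accuracy" figure above is stricter than per-sample accuracy. A minimal sketch, assuming the Vul-RAG convention in which each test item is a pair of a vulnerable function and its patched version, and a pair counts as correct only when the vulnerable version is flagged and the patched version is not. The function name and the input encoding (`True` meaning "predicted vulnerable") are illustrative choices.

```python
def pairwise_accuracy(preds: list[tuple[bool, bool]]) -> float:
    """Fraction of pairs classified correctly as a whole.

    Each element is (prediction for the vulnerable version, prediction for
    the patched version), where True means "predicted vulnerable". A pair is
    correct only if the vulnerable version is flagged AND the patched one is not.
    """
    if not preds:
        raise ValueError("need at least one pair")
    correct = sum(1 for vuln_pred, patched_pred in preds if vuln_pred and not patched_pred)
    return correct / len(preds)


preds = [(True, False), (True, True), (False, False), (True, False), (False, True)]
print(pairwise_accuracy(preds))  # 0.4
```

Under this definition, a model that labels every function as vulnerable scores 0.0 pairwise accuracy even though it is right on half of the individual samples, which is why the metric exposes models that cannot distinguish a vulnerability from its fix.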