QuiLL: An LLM-Based Vulnerability Assessment Framework for the Wild

2025 · cs.CR · arXiv 2510.04056

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

Large Language Models (LLMs) have demonstrated exceptional progress in multiple domains of software engineering, including software vulnerability detection. Using LLMs to automate vulnerability detection in the wild is an important and relatively under-explored problem. In this paper, we propose QuiLL, the first comprehensive evaluation framework for real-world vulnerability detection. Our solution consists of an end-to-end pipeline that combines cutting-edge LLM optimization techniques with strategies that cater specifically to the complexities of real-world vulnerability detection. Our specific contributions include (i) diverse prompt designs for vulnerability detection and reasoning, (ii) a real-world vector data store constructed from the National Vulnerability Database to provide dynamic in-context learning, and (iii) a novel scoring metric that quantifies the accuracy and reasoning quality of model predictions. QuiLL enables researchers to easily and systematically benchmark and compare the vulnerability detection capabilities of various LLMs and to assess their readiness for deployment in actual code production pipelines.
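To make contribution (ii) concrete, the following is a minimal sketch of retrieval-based in-context learning, not QuiLL's actual implementation. Everything in it is an assumption made for illustration: the `NVD_ENTRIES` records and their `EXAMPLE-…` identifiers are invented placeholders, the bag-of-words `embed` function stands in for a learned embedding model, and the prompt wording in `build_prompt` is hypothetical.

```python
import math
from collections import Counter

# Hypothetical placeholder records; real NVD entries carry many more fields.
NVD_ENTRIES = [
    {"id": "EXAMPLE-0001",
     "text": "buffer overflow in strcpy when copying user input into fixed size stack buffer"},
    {"id": "EXAMPLE-0002",
     "text": "sql injection through unsanitized query string concatenation"},
    {"id": "EXAMPLE-0003",
     "text": "use after free of connection object in event handler"},
]


def embed(text):
    """Toy embedding (assumption): a term-count vector, not a learned model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, entries, k=1):
    """Return the k entries whose descriptions are most similar to the query."""
    q = embed(query)
    ranked = sorted(entries, key=lambda e: cosine(q, embed(e["text"])), reverse=True)
    return ranked[:k]


def build_prompt(code, entries, k=1):
    """Assemble a detection prompt with the retrieved entries as context."""
    context = "\n".join(f"- {e['id']}: {e['text']}" for e in retrieve(code, entries, k))
    return (
        "Known vulnerability descriptions:\n"
        f"{context}\n\n"
        "Is the following code vulnerable? Explain your reasoning.\n"
        f"{code}"
    )
```

Calling `build_prompt("strcpy copies user input into a fixed buffer", NVD_ENTRIES)` would place only the most similar placeholder entry in the prompt. The context is "dynamic" in the sense that it is selected per query rather than fixed in advance.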

representative citing papers

Revisiting Vul-RAG: Reproducibility and Replicability of RAG-based Vulnerability Detection with Open-Weight Models

cs.SE · 2026-06-03 · unverdicted · novelty 3.0

Reproducibility study of Vul-RAG confirms original findings in a fully local open-weights setting but identifies a persistent performance plateau at approximately 0.30 pairwise accuracy across diverse recent open-weight LLMs.

