A new queryable binary dataset combining cross-build diversity, temporal history, and CVE labels with linked metadata for vulnerability research.
CVEfixes: automated collection of vulnerabilities and their fixes from open-source software
9 Pith papers cite this work.
citing papers by year: 2026 (9)
representative citing papers
CodeQL detected 171 CVEs total, with 83 caught by a prior version before the fix; detections were often actionable within the vulnerable file but not stable across tool versions.
VulKey introduces hierarchical expert knowledge abstractions to guide LLMs in vulnerability repair, reporting 31.5% accuracy on PrimeVul (7.6% above best baseline) and strong results on Vul4J.
CrossCommitVuln-Bench shows that 87% of 15 multi-commit Python CVEs are invisible to per-commit static analysis, a detection rate of only 13%.
PromptAudit evaluates five prompting strategies across five LLMs on 1000 CVEs and finds chain-of-thought prompting yields the strongest overall performance while adaptive chain-of-thought and self-consistency reduce effective results.
Reconstructing 6946 syzbot bug-fix lifecycles reveals that accepted kernel patches are non-local and reviewer-constrained, enabling PatchAdvisor to improve automated repair quality over baselines via retrieval and diagnostic guidance.
RAVEN combines agentic RAG, iterative repair, and a cross-file Curator Agent to achieve 83.13% repair success on diverse real-world CVEs using local open-source LLMs.
A benchmarking experiment finds low rediscovery rates for three models on six Mythos-linked bug tasks, with only six target matches across 54 attempts under controlled prompting.
The authors present a registered report outlining their planned large-scale empirical study of vulnerability communication in pull requests by different account types.
citing papers explorer
PromptAudit: Auditing Prompt Sensitivity in LLM-Based Vulnerability Detection