The RealVuln benchmark finds that security-specialized scanners outperform general-purpose LLMs and rule-based static application security testing (SAST) tools on hand-labeled vulnerable Python code under F3 scoring; all artifacts are released.
Rival Labs
1 Pith paper cites this work.
RealVuln: Benchmarking Rule-Based, General-Purpose LLM, and Security-Specialized Scanners on Real-World Code
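"F3 scoring" most likely means the F-beta score with β = 3. That reading is an assumption, because the listing does not define the metric. F-beta treats recall as β times as important as precision, so with β = 3 a scanner is penalized far more for missed vulnerabilities (false negatives) than for false alarms. The sketch below assumes this standard definition. The `f_beta` helper and the two scanners' confusion counts are hypothetical and are not taken from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 3.0) -> float:
    """Standard F-beta from confusion counts (assumed meaning of "F3": beta = 3).

    beta > 1 weights recall more heavily than precision.
    """
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical scanner A: noisy but finds most bugs (P = 0.4, R = 0.8).
    # Hypothetical scanner B: precise but misses half the bugs (P ~ 0.83, R = 0.5).
    print(round(f_beta(8, 12, 2), 3), round(f_beta(5, 1, 5), 3))  # F3: 0.727 0.521
    print(round(f_beta(8, 12, 2, beta=1.0), 3), round(f_beta(5, 1, 5, beta=1.0), 3))  # F1: 0.533 0.625
```

In this toy comparison, F1 favors the precise scanner B, while F3 reverses the ranking in favor of A, which catches more of the vulnerabilities.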