RealVuln benchmark finds security-specialized scanners outperform general-purpose LLMs and rule-based SAST tools on hand-labeled vulnerable Python code under F3 scoring, with all artifacts released.
Teo, Yiling Lou, Yebo Feng, Chong Wang, and Dinil Mon Divakaran
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3representative citing papers
ReasonVul deploys three LLM agents with independent analysis and structured debate to achieve 40% PairAcc and 72.52% F1 on PrimeVul, outperforming baselines by 81% in PairAcc.
UBRI case study abstracts recurring blockchain design tensions into a framework linking academic research to deployment constraints and policy adaptation.
citing papers explorer
-
RealVuln: Benchmarking Rule-Based, General-Purpose LLM, and Security-Specialized Scanners on Real-World Code
RealVuln benchmark finds security-specialized scanners outperform general-purpose LLMs and rule-based SAST tools on hand-labeled vulnerable Python code under F3 scoring, with all artifacts released.
-
Three Heads Are Better Than One: A Multi-perspective Reasoning Framework for Enhanced Vulnerability Detection
ReasonVul deploys three LLM agents with independent analysis and structured debate to achieve 40% PairAcc and 72.52% F1 on PrimeVul, outperforming baselines by 81% in PairAcc.
-
Systematizing Blockchain Research Themes and Design Patterns: Insights from the University Blockchain Research Initiative (UBRI)
UBRI case study abstracts recurring blockchain design tensions into a framework linking academic research to deployment constraints and policy adaptation.