BioDefect is a new dataset for defect detection in bioinformatics software that improves average F1-scores by 29.61% to 38.04% over existing datasets when evaluated on nine language models.
In: Proceedings of the 17th International Conference on Mining Software Repositories
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
VulKey introduces hierarchical expert knowledge abstractions to guide LLMs in vulnerability repair, reporting 31.5% accuracy on PrimeVul (7.6% above best baseline) and strong results on Vul4J.
CrossCommitVuln-Bench shows that 87% of 15 multi-commit Python CVEs are invisible to per-commit static analysis, with only 13% detection rate.
LLM vulnerability detection in Gemma-2-2b relies on sparse safety-detector circuits in early layers rather than direct vulnerability signatures, identified via circuit tracing and ablation on 472 C/C++ samples.
PromptAudit evaluates five prompting strategies across five LLMs on 1000 CVEs and finds chain-of-thought prompting yields the strongest overall performance while adaptive chain-of-thought and self-consistency reduce effective results.
SAGE uses sparse autoencoders to boost vulnerability signals in LLMs, raising internal SNR 12.7x and delivering up to 318% MCC gains on vulnerability detection benchmarks.
UntrustVul identifies untrustworthy vulnerability predictions by marking lines that neither match historical vulnerability patterns nor influence vulnerable lines through dependencies, reporting AUC 70-88% and F1 82-94% on 115K predictions.
A PRISMA-guided review of 21 papers shows RL work on C/C++ vulnerabilities focuses on fuzzing rather than detection or localization, proposes a taxonomy, and flags the lack of CFG-based state representations for vulnerable node identification.
A benchmarking experiment finds low rediscovery rates for three models on six Mythos-linked bug tasks, with only six target matches across 54 attempts under controlled prompting.
HYDRA is a hybrid model that uses heuristics plus deep embeddings and a VAE to predict latent zero-day vulnerabilities in patched functions from Chrome, Android, and ImageMagick.
citing papers explorer
-
CrossCommitVuln-Bench: A Dataset of Multi-Commit Python Vulnerabilities Invisible to Per-Commit Static Analysis
CrossCommitVuln-Bench shows that 87% of 15 multi-commit Python CVEs are invisible to per-commit static analysis, with only 13% detection rate.