BioDefect is a new dataset for defect detection in bioinformatics software that improves average F1-scores by 29.61% to 38.04% over existing datasets when evaluated on nine language models.
Title resolution pending
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
VulKey reaches 31.5% repair accuracy on real C/C++ vulnerabilities by matching hierarchical expert patterns to guide LLM patch generation, beating prior baselines by 7.6%.
CrossCommitVuln-Bench shows that 87% of 15 multi-commit Python CVEs are invisible to per-commit static analysis, with only 13% detection rate.
SAGE uses sparse autoencoders to boost vulnerability signals in LLMs, raising internal SNR 12.7x and delivering up to 318% MCC gains on vulnerability detection benchmarks.
UntrustVul identifies untrustworthy vulnerability predictions by marking lines that neither match historical vulnerability patterns nor influence vulnerable lines through dependencies, reporting AUC 70-88% and F1 82-94% on 115K predictions.
A benchmarking experiment finds low rediscovery rates for three models on six Mythos-linked bug tasks, with only six target matches across 54 attempts under controlled prompting.
HYDRA is a hybrid model that uses heuristics plus deep embeddings and a VAE to predict latent zero-day vulnerabilities in patched functions from Chrome, Android, and ImageMagick.
citing papers explorer
-
BioDefect: The First Dataset for Defect Detection in Bioinformatics Software
BioDefect is a new dataset for defect detection in bioinformatics software that improves average F1-scores by 29.61% to 38.04% over existing datasets when evaluated on nine language models.
-
VulKey: Automated Vulnerability Repair Guided by Domain-Specific Repair Patterns
VulKey reaches 31.5% repair accuracy on real C/C++ vulnerabilities by matching hierarchical expert patterns to guide LLM patch generation, beating prior baselines by 7.6%.
-
CrossCommitVuln-Bench: A Dataset of Multi-Commit Python Vulnerabilities Invisible to Per-Commit Static Analysis
CrossCommitVuln-Bench shows that 87% of 15 multi-commit Python CVEs are invisible to per-commit static analysis, with only 13% detection rate.
-
SAGE: Signal-Amplified Guided Embeddings for LLM-based Vulnerability Detection
SAGE uses sparse autoencoders to boost vulnerability signals in LLMs, raising internal SNR 12.7x and delivering up to 318% MCC gains on vulnerability detection benchmarks.
-
UntrustVul: An Automated Approach for Identifying Untrustworthy Alerts in Vulnerability Detection Models
UntrustVul identifies untrustworthy vulnerability predictions by marking lines that neither match historical vulnerability patterns nor influence vulnerable lines through dependencies, reporting AUC 70-88% and F1 82-94% on 115K predictions.
-
Benchmarking Mythos-Linked Bug Rediscovery
A benchmarking experiment finds low rediscovery rates for three models on six Mythos-linked bug tasks, with only six target matches across 54 attempts under controlled prompting.
-
HYDRA: A Hybrid Heuristic-Guided Deep Representation Architecture for Predicting Latent Zero-Day Vulnerabilities in Patched Functions
HYDRA is a hybrid model that uses heuristics plus deep embeddings and a VAE to predict latent zero-day vulnerabilities in patched functions from Chrome, Android, and ImageMagick.