ExploitBench decomposes LLM exploitation into 16 oracle-verified capability flags and finds public frontier models trigger crashes but rarely reach arbitrary code execution on 41 V8 bugs.
ARVO: Atlas of Re- producible Vulnerabilities for Open Source Software, August 2024
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 7roles
background 1polarities
background 1representative citing papers
Veritas detects memory corruption vulnerabilities in stripped binaries by combining static value-flow slicing, dual-view LLM reasoning, and multi-agent runtime validation, reporting 90% recall, zero false positives on 623 exhaustive cases, and discovery of a real Apple CVE.
A queueing framework segments vulnerability data with Gaussian mixture models, fits arrival/service/resource parameters by KL-divergence minimization, and reports 91-96% accuracy in estimating organizational cyber resources from timestamps.
PAGENT integrates static and dynamic program analysis guidance with an LLM agent to improve automated proof-of-concept generation success by 132% over prior agentic methods.
PoC-Adapt improves automated PoC exploit generation reliability by 25% and lowers cost using semantic state validation and RL adaptive policies, verifying 12 PoCs from 80 recent CVE attempts at $0.42 each.
Reconstructing 6946 syzbot bug-fix lifecycles reveals that accepted kernel patches are non-local and reviewer-constrained, enabling PatchAdvisor to improve automated repair quality over baselines via retrieval and diagnostic guidance.
A queueing model of attack surfaces validated on supply-chain data shows AI automation can raise exploit rates and an RL policy cuts active vulnerabilities by over 90% without extra budget.
citing papers explorer
-
ExploitBench: A Capability Ladder Benchmark for LLM Cybersecurity Agents
ExploitBench decomposes LLM exploitation into 16 oracle-verified capability flags and finds public frontier models trigger crashes but rarely reach arbitrary code execution on 41 V8 bugs.
-
Veritas: A Semantically Grounded Agentic Framework for Memory Corruption Vulnerability Detection in Binaries
Veritas detects memory corruption vulnerabilities in stripped binaries by combining static value-flow slicing, dual-view LLM reasoning, and multi-agent runtime validation, reporting 90% recall, zero false positives on 623 exhaustive cases, and discovery of a real Apple CVE.
-
Organizational Security Resource Estimation via Vulnerability Queueing
A queueing framework segments vulnerability data with Gaussian mixture models, fits arrival/service/resource parameters by KL-divergence minimization, and reports 91-96% accuracy in estimating organizational cyber resources from timestamps.
-
Program Analysis Guided LLM Agent for Proof-of-Concept Generation
PAGENT integrates static and dynamic program analysis guidance with an LLM agent to improve automated proof-of-concept generation success by 132% over prior agentic methods.
-
PoC-Adapt: Semantic-Aware Automated Vulnerability Reproduction with LLM Multi-Agents and Reinforcement Learning-Driven Adaptive Policy
PoC-Adapt improves automated PoC exploit generation reliability by 25% and lowers cost using semantic state validation and RL adaptive policies, verifying 12 PoCs from 80 recent CVE attempts at $0.42 each.
-
Beyond Crash-to-Patch: Patch Evolution for Linux Kernel Repair
Reconstructing 6946 syzbot bug-fix lifecycles reveals that accepted kernel patches are non-local and reviewer-constrained, enabling PatchAdvisor to improve automated repair quality over baselines via retrieval and diagnostic guidance.
-
A Queueing-Theoretic Framework for Dynamic Attack Surfaces: Data-Integrated Risk Analysis and Adaptive Defense
A queueing model of attack surfaces validated on supply-chain data shows AI automation can raise exploit rates and an RL policy cuts active vulnerabilities by over 90% without extra budget.