PoCGen: Generating proof-of-concept exploits for vulnerabilities in Npm packages. CoRR, abs/2506.04962

URL: https://openreview · 2025 · cs.CR · arXiv 2506.04962

13 Pith papers cite this work. Polarity classification is still indexing.

abstract

Security vulnerabilities in software packages are a significant concern for developers and users alike. Patching these vulnerabilities in a timely manner is crucial to restoring the integrity and security of software systems. However, previous work has shown that vulnerability reports often lack proof-of-concept (PoC) exploits, which are essential for fixing the vulnerability, testing patches, and avoiding regressions. Creating a PoC exploit is challenging because vulnerability reports are informal and often incomplete, and because it requires a detailed understanding of how inputs passed to potentially vulnerable APIs may reach security-relevant sinks. In this paper, we present PoCGen, a novel approach to autonomously generate and validate PoC exploits for vulnerabilities in npm packages. The approach is the first to address this task by combining the complementary strengths of large language models (LLMs), e.g., to understand informal vulnerability reports, with static analysis, e.g., to identify taint paths, and dynamic analysis, e.g., to validate generated exploits. PoCGen successfully generates exploits for 77% of the vulnerabilities in the SecBench.js dataset. This success rate significantly outperforms a recent baseline (by 45 absolute percentage points), while imposing an average cost of only $0.02 per generated exploit. Moreover, PoCGen generates six successful exploits for recent real-world vulnerabilities, five of which are now included in their respective vulnerability reports.
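To make the abstract's notion of dynamically validating a generated exploit concrete, here is a minimal TypeScript sketch. It is not PoCGen's implementation: `vulnerableMerge`, `candidateExploit`, and `validatePrototypePollution` are hypothetical names, the vulnerable function is a toy stand-in for a package API, and prototype pollution is assumed as the vulnerability class purely for illustration.

```typescript
// Toy stand-in for a vulnerable package API (hypothetical, not a real npm package).
// A naive recursive merge that copies attacker-controlled keys, including "__proto__".
type Dict = Record<string, unknown>;

function isObject(value: unknown): value is Dict {
  return typeof value === "object" && value !== null;
}

function vulnerableMerge(target: Dict, source: Dict): Dict {
  for (const key of Object.keys(source)) {
    const src = source[key];
    const dst = target[key];
    if (isObject(src) && isObject(dst)) {
      // For key "__proto__", dst is Object.prototype: this is the security-relevant sink.
      vulnerableMerge(dst, src);
    } else {
      target[key] = src;
    }
  }
  return target;
}

// Candidate PoC: JSON.parse keeps "__proto__" as an own property of the parsed object.
function candidateExploit(): void {
  const payload = JSON.parse('{"__proto__": {"polluted": "yes"}}') as Dict;
  vulnerableMerge({}, payload);
}

// Dynamic validation oracle (an assumption of this sketch): the exploit counts as
// successful only if a fresh object observes the injected property afterwards.
function validatePrototypePollution(exploit: () => void): boolean {
  const proto = Object.prototype as unknown as Dict;
  delete proto["polluted"]; // start from a clean state
  try {
    exploit();
    const fresh: Dict = {};
    return fresh["polluted"] === "yes";
  } catch {
    return false; // a crashing candidate is not a validated exploit
  } finally {
    delete proto["polluted"]; // undo the side effect so later runs are independent
  }
}

const exploitWorks = validatePrototypePollution(candidateExploit);
const benignWorks = validatePrototypePollution(() => {
  vulnerableMerge({}, { a: 1 });
});
console.log(exploitWorks, benignWorks);
```

The oracle checks an observable side effect rather than trusting the candidate's own claim of success, so a benign call to the same API is rejected. A real pipeline would presumably need one such oracle per vulnerability class.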

citation-role summary

background 1 method 1

citation-polarity summary

background 1 use method 1

representative citing papers

SEC-bench Pro: Can Language Models Solve Long-Horizon Software Security Tasks?

cs.CR · 2026-05-26 · unverdicted · novelty 7.0

SEC-bench Pro benchmark with 183 real vulnerabilities shows frontier LLM coding agents achieve at most 38.8% success on SpiderMonkey and 32% on V8.

Taint-Style Vulnerability Detection and Confirmation for Node.js Packages Using LLM Agent Reasoning

cs.CR · 2026-04-22 · unverdicted · novelty 7.0

LLMVD.js uses LLM agents to confirm 84% of taint-style vulnerabilities on public benchmarks (vs. <22% for prior tools) and generates validated exploits for 36 of 260 new packages (vs. ≤2 for traditional tools).

Refploit: Facilitating Exploit Construction via Code-Agent Trajectory Repair

cs.SE · 2026-07-02 · unverdicted · novelty 6.0

Refploit repairs code-agent trajectories for Java exploit reproduction via differential validation and focused recovery constraints, achieving 80.2% success on 172 references with 64.3% relative improvement.

VeriPort: Automated and Verified Patch Backporting at Scale

cs.CR · 2026-06-21 · unverdicted · novelty 6.0

VeriPort is an end-to-end agentic system that backports vulnerability patches to all affected versions of a package at scale while producing verification evidence, achieving 95.3% success on 128 benchmark tasks and generating over 5,000 verified patches across 169 CVEs.

uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs

cs.CR · 2026-05-15 · unverdicted · novelty 6.0

uGen is the first retrieval-augmented multi-agent LLM framework for generating functionally correct microarchitectural attack PoCs, reporting up to 100% success on Spectre-v1 and 80% on Prime+Probe at low cost.

AnyPoC: Universal Proof-of-Concept Test Generation for Scalable LLM-Based Bug Detection

cs.SE · 2026-04-13 · conditional · novelty 6.0

AnyPoC introduces a multi-agent system for generating and validating PoC tests from LLM bug reports, producing 1.3x more valid PoCs, rejecting 9.8x more false positives, and discovering 122 new bugs across 12 major projects.

Program Analysis Guided LLM Agent for Proof-of-Concept Generation

cs.SE · 2026-04-08 · unverdicted · novelty 6.0

PAGENT integrates static and dynamic program analysis guidance with an LLM agent to improve automated proof-of-concept generation success by 132% over prior agentic methods.

PoC-Adapt: Semantic-Aware Automated Vulnerability Reproduction with LLM Multi-Agents and Reinforcement Learning-Driven Adaptive Policy

cs.CR · 2026-04-08 · unverdicted · novelty 6.0

PoC-Adapt improves automated PoC exploit generation reliability by 25% and lowers cost using semantic state validation and RL adaptive policies, verifying 12 PoCs from 80 recent CVE attempts at $0.42 each.

Triggering and Detecting Exploitable Library Vulnerability from the Client by Directed Greybox Fuzzing

cs.CR · 2026-04-05 · conditional · novelty 6.0

LiveFuzz extends directed greybox fuzzing with abstract path mapping and risk-based mutation to expose library vulnerabilities from client programs on a 61-case dataset, reaching more target paths and triggering three vulnerabilities no baseline found.

FORGE: Multi-Agent Graduated Exploitation and Detection Engineering

cs.CR · 2026-06-02 · unverdicted · novelty 5.0

FORGE deploys a fixed five-agent pipeline on 603 CVEs to achieve 67.8% L1+ exploitation success at $1.50 per CVE while generating detection rules whose grounding improves with deeper exploitation traces.

ContraFix: Skill-Enhanced Contrastive Runtime Analysis for Vulnerability Repair

cs.SE · 2026-05-17 · unverdicted · novelty 5.0 · 2 refs

ContraFix uses contrastive runtime analysis plus a dual-track skill base to reach 92% resolution on SEC-Bench and 73.8% on PatchEval while improving semantic correctness of patches.

V2E: Validating Smart Contract Vulnerabilities through Profit-driven Exploit Generation and Execution

cs.SE · 2026-04-15 · unverdicted · novelty 5.0

V2E automates PoC generation, triggerability and profitability validation, and iterative refinement using LLMs to confirm exploitable smart contract vulnerabilities, outperforming baselines on 264 labeled contracts.

A Multi-Agent Framework for Automated Exploit Generation with Constraint-Guided Comprehension and Reflection

cs.SE · 2026-04-06 · unverdicted · novelty 5.0

Vulnsage, a multi-agent framework, generates 34.64% more exploits than prior tools and verified 146 zero-day vulnerabilities in real-world open-source libraries.

citing papers explorer

Showing 13 of 13 citing papers.

SEC-bench Pro: Can Language Models Solve Long-Horizon Software Security Tasks?
cs.CR · 2026-05-26 · unverdicted · none · ref 13 · internal anchor

Taint-Style Vulnerability Detection and Confirmation for Node.js Packages Using LLM Agent Reasoning
cs.CR · 2026-04-22 · unverdicted · none · ref 48 · internal anchor

Refploit: Facilitating Exploit Construction via Code-Agent Trajectory Repair
cs.SE · 2026-07-02 · unverdicted · none · ref 25 · internal anchor

VeriPort: Automated and Verified Patch Backporting at Scale
cs.CR · 2026-06-21 · unverdicted · none · ref 35 · internal anchor

uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs
cs.CR · 2026-05-15 · unverdicted · none · ref 46 · internal anchor

AnyPoC: Universal Proof-of-Concept Test Generation for Scalable LLM-Based Bug Detection
cs.SE · 2026-04-13 · conditional · none · ref 52 · internal anchor

Program Analysis Guided LLM Agent for Proof-of-Concept Generation
cs.SE · 2026-04-08 · unverdicted · none · ref 38 · internal anchor

PoC-Adapt: Semantic-Aware Automated Vulnerability Reproduction with LLM Multi-Agents and Reinforcement Learning-Driven Adaptive Policy
cs.CR · 2026-04-08 · unverdicted · none · ref 12 · internal anchor

Triggering and Detecting Exploitable Library Vulnerability from the Client by Directed Greybox Fuzzing
cs.CR · 2026-04-05 · conditional · none · ref 60 · internal anchor

FORGE: Multi-Agent Graduated Exploitation and Detection Engineering
cs.CR · 2026-06-02 · unverdicted · none · ref 28 · internal anchor

ContraFix: Skill-Enhanced Contrastive Runtime Analysis for Vulnerability Repair
cs.SE · 2026-05-17 · unverdicted · none · ref 43 · 2 links · internal anchor

V2E: Validating Smart Contract Vulnerabilities through Profit-driven Exploit Generation and Execution
cs.SE · 2026-04-15 · unverdicted · none · ref 46 · internal anchor

A Multi-Agent Framework for Automated Exploit Generation with Constraint-Guided Comprehension and Reflection
cs.SE · 2026-04-06 · unverdicted · none · ref 47 · internal anchor
