Unleashing Mayhem on Binary Code
5 Pith papers cite this work, alongside 347 external citations. Polarity classification is still indexing.
citation summary
years: 2026 (5)
verdicts: UNVERDICTED (5)
roles: background (4)
polarities: background (4)
citing papers explorer
- Generating Proof-of-Vulnerability Tests to Help Enhance the Security of Complex Software
  PoVSmith automates PoV test generation for library vulnerabilities in apps via call paths and LLM feedback, correctly identifying 96% of entry points and producing effective attack tests in 55% of 33 evaluated Java pairs.
- Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches
  An agentic pipeline localizes the security-relevant function in 10 of 20 Ubuntu binary security updates and produces an accepted root-cause classification in 11 of 20, limited mainly by binary differencing coverage.
- CuLifter: Lifting GPU Binaries to Typed IR
  CuLifter recovers types from untyped GPU register files via constraint propagation to lift 99.98% of 24,437 functions across 919 cubins to valid LLVM IR.
- A Multi-Agent Framework for Automated Exploit Generation with Constraint-Guided Comprehension and Reflection
  Vulnsage, a multi-agent framework, generates 34.64% more exploits than prior tools and verified 146 zero-day vulnerabilities in real-world open-source libraries.
- Benchmarking Mythos-Linked Bug Rediscovery
  A benchmarking experiment finds low rediscovery rates for three models on six Mythos-linked bug tasks, with only six target matches across 54 attempts under controlled prompting.