PoVSmith automates PoV test generation for library vulnerabilities in apps via call paths and LLM feedback, correctly identifying 96% of entry points and producing effective attack tests in 55% of 33 evaluated Java pairs.
Unleashing Mayhem on Binary Code
7 Pith papers cite this work, alongside 347 external citations. Polarity classification is still indexing.
citation summary
years: 2026 (7)
verdicts: UNVERDICTED (7)
roles: background (4)
polarities: background (4)
representative citing papers
An agentic pipeline localizes the security-relevant function in 10 of 20 Ubuntu binary security updates and produces an accepted root-cause classification in 11 of 20, limited mainly by binary differencing coverage.
CuLifter recovers types from untyped GPU register files via constraint propagation to lift 99.98% of 24,437 functions across 919 cubins to valid LLVM IR.
AFSAT realizes FastFourierSAT as a production GPU solver for heterogeneous symmetric pseudo-Boolean SAT via JAX-compiled continuous local search, with tailored DFT for stability and near-linear multi-accelerator scaling.
Vulnsage, a multi-agent framework, generates 34.64% more exploits than prior tools and verified 146 zero-day vulnerabilities in real-world open-source libraries.
Empirical study of parallel continuous local search for SAT finds redundant constraints can slow convergence, CLS works as a hybrid sub-solver, and search stabilizes quickly due to saddle-dense objectives.
A benchmarking experiment finds low rediscovery rates for three models on six Mythos-linked bug tasks, with only six target matches across 54 attempts under controlled prompting.
citing papers explorer
- A Multi-Agent Framework for Automated Exploit Generation with Constraint-Guided Comprehension and Reflection
  Vulnsage, a multi-agent framework, generates 34.64% more exploits than prior tools and verified 146 zero-day vulnerabilities in real-world open-source libraries.
- Benchmarking Mythos-Linked Bug Rediscovery
  A benchmarking experiment finds low rediscovery rates for three models on six Mythos-linked bug tasks, with only six target matches across 54 attempts under controlled prompting.