LLMVD.js uses LLM agents to confirm 84% of taint-style vulnerabilities on public benchmarks (vs. <22% for prior tools) and generates validated exploits for 36 of 260 new packages (vs. ≤2 for traditional tools).
Title resolution pending
5 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 5representative citing papers
LLM agents reach only 35% average checkpoint completion on ten realistic CTF challenges in a new open benchmark with automated partial-credit scoring.
AutoRocq is an LLM agent that learns proofs on-the-fly by collaborating with the Rocq prover to verify programs on SV-COMP benchmarks and Linux kernel modules.
Prompt injection attacks on agentic AI coding editors like Cursor and GitHub Copilot reach up to 84% success in executing malicious commands by poisoning external development resources.
Survey of 860 developers reveals 22 desired AI systems for non-coding tasks with explicit constraints on authority, provenance, and quality signals, framed as bounded delegation where AI handles assembly work but not core craft.
citing papers explorer
-
Taint-Style Vulnerability Detection and Confirmation for Node.js Packages Using LLM Agent Reasoning
LLMVD.js uses LLM agents to confirm 84% of taint-style vulnerabilities on public benchmarks (vs. <22% for prior tools) and generates validated exploits for 36 of 260 new packages (vs. ≤2 for traditional tools).
-
Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture the Flag Challenges
LLM agents reach only 35% average checkpoint completion on ten realistic CTF challenges in a new open benchmark with automated partial-credit scoring.
-
Agentic Verification of Software Systems
AutoRocq is an LLM agent that learns proofs on-the-fly by collaborating with the Rocq prover to verify programs on SV-COMP benchmarks and Linux kernel modules.
-
"Your AI, My Shell": Demystifying Prompt Injection Attacks on Agentic AI Coding Editors
Prompt injection attacks on agentic AI coding editors like Cursor and GitHub Copilot reach up to 84% success in executing malicious commands by poisoning external development resources.
-
To Copilot and Beyond: 22 AI Systems Developers Want Built
Survey of 860 developers reveals 22 desired AI systems for non-coding tasks with explicit constraints on authority, provenance, and quality signals, framed as bounded delegation where AI handles assembly work but not core craft.