Title resolution pending

Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, Ramesh Karri · 2025

5 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Taint-Style Vulnerability Detection and Confirmation for Node.js Packages Using LLM Agent Reasoning

cs.CR · 2026-04-22 · unverdicted · novelty 7.0

LLMVD.js uses LLM agents to confirm 84% of taint-style vulnerabilities on public benchmarks (vs. <22% for prior tools) and generates validated exploits for 36 of 260 new packages (vs. ≤2 for traditional tools).

Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture the Flag Challenges

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

LLM agents reach only 35% average checkpoint completion on ten realistic CTF challenges in a new open benchmark with automated partial-credit scoring.

Agentic Verification of Software Systems

cs.SE · 2025-11-21 · unverdicted · novelty 6.0

AutoRocq is an LLM agent that learns proofs on-the-fly by collaborating with the Rocq prover to verify programs on SV-COMP benchmarks and Linux kernel modules.

"Your AI, My Shell": Demystifying Prompt Injection Attacks on Agentic AI Coding Editors

cs.CR · 2025-09-26 · unverdicted · novelty 6.0

Prompt injection attacks on agentic AI coding editors like Cursor and GitHub Copilot reach up to 84% success in executing malicious commands by poisoning external development resources.

To Copilot and Beyond: 22 AI Systems Developers Want Built

cs.SE · 2026-04-09 · unverdicted · novelty 5.0

Survey of 860 developers reveals 22 desired AI systems for non-coding tasks with explicit constraints on authority, provenance, and quality signals, framed as bounded delegation where AI handles assembly work but not core craft.

citing papers explorer

Showing 5 of 5 citing papers.

Taint-Style Vulnerability Detection and Confirmation for Node.js Packages Using LLM Agent Reasoning · cs.CR · 2026-04-22 · unverdicted · none · ref 42
Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture the Flag Challenges · cs.AI · 2026-04-21 · unverdicted · none · ref 14
Agentic Verification of Software Systems · cs.SE · 2025-11-21 · unverdicted · none · ref 40
"Your AI, My Shell": Demystifying Prompt Injection Attacks on Agentic AI Coding Editors · cs.CR · 2025-09-26 · unverdicted · none · ref 37
To Copilot and Beyond: 22 AI Systems Developers Want Built · cs.SE · 2026-04-09 · unverdicted · none · ref 42
