D-CIPHER: Dynamic collaborative intel- ligent multi-agent system with planner and heterogeneous executors for offensive security.CoRR, abs/2502.10931

URLhttps://arxiv · 2025 · arXiv 2502.10931

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

read on arXiv browse 9 citing papers

citation-role summary

background 2

citation-polarity summary

background 1 support 1

representative citing papers

Honeyval: A Comprehensive Evaluation Framework for LLM-powered HTTP Honeypots

cs.CR · 2026-05-28 · unverdicted · novelty 7.0

Honeyval evaluates LLM HTTP honeypots with AI attackers and shows they produce longer interactions, lower detection rates, and cost advantages over rule-based baselines.

Cybersecurity AI (CAI) Dataset

cs.CR · 2026-05-27 · unverdicted · novelty 7.0

CAI Dataset is presented as the largest described corpus of LLM-driven hacker trajectories, with the claim that operator data concentration in frontier-model providers creates a major security risk best addressed by on-premise specialized LLMs.

CyberEvolver: Structured Self-Evolution for Cybersecurity Agents On the Fly

cs.CR · 2026-05-25 · unverdicted · novelty 7.0

CyberEvolver introduces a four-layer self-evolving agent architecture with trace-to-diagnosis and population beam search that raises seed agent success rates by 13.6% on CTF, exploitation, and penetration tasks across four LLMs.

CrackMeBench: Binary Reverse Engineering for Agents

cs.SE · 2026-05-11 · accept · novelty 7.0

CrackMeBench introduces 20 deterministic binary validation tasks and reports GPT-5.5 solving 11/12 generated ones at pass@3 while Claude and Kimi lag, especially on harder tasks.

Dynamic Cyber Ranges

cs.CR · 2026-04-27 · unverdicted · novelty 7.0

Dynamic Cyber Ranges with LLM defender agents reduce attacker success to 0-55% and preserve evaluation headroom as models advance by using comparable capabilities on both sides.

CyberCertBench: Evaluating LLMs in Cybersecurity Certification Knowledge

cs.CR · 2026-04-22 · unverdicted · novelty 7.0

CyberCertBench shows frontier LLMs reach human-expert performance on general IT and networking security but drop on vendor-specific and formal standards questions such as IEC 62443, with a new framework for producing interpretable explanations.

uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs

cs.CR · 2026-05-15 · unverdicted · novelty 6.0

uGen is the first retrieval-augmented multi-agent LLM framework for generating functionally correct microarchitectural attack PoCs, reporting up to 100% success on Spectre-v1 and 80% on Prime+Probe at low cost.

Systematic Capability Benchmarking of Frontier Large Language Models for Offensive Cyber Tasks

cs.CR · 2026-04-18 · unverdicted · novelty 5.0

Claude 4.5 Opus reaches 59% solve rate on offensive cyber CTF tasks, with a Kali Linux environment adding 9.5 percentage points over Ubuntu while prompt engineering often hurts performance in equipped setups.

RAVEN: Retrieval-Augmented Vulnerability Exploration Network for Memory Corruption Analysis in User Code and Binary Programs

cs.CR · 2026-04-20

citing papers explorer

Showing 9 of 9 citing papers after filters.

Honeyval: A Comprehensive Evaluation Framework for LLM-powered HTTP Honeypots cs.CR · 2026-05-28 · unverdicted · none · ref 29
Honeyval evaluates LLM HTTP honeypots with AI attackers and shows they produce longer interactions, lower detection rates, and cost advantages over rule-based baselines.
Cybersecurity AI (CAI) Dataset cs.CR · 2026-05-27 · unverdicted · none · ref 23
CAI Dataset is presented as the largest described corpus of LLM-driven hacker trajectories, with the claim that operator data concentration in frontier-model providers creates a major security risk best addressed by on-premise specialized LLMs.
CyberEvolver: Structured Self-Evolution for Cybersecurity Agents On the Fly cs.CR · 2026-05-25 · unverdicted · none · ref 62
CyberEvolver introduces a four-layer self-evolving agent architecture with trace-to-diagnosis and population beam search that raises seed agent success rates by 13.6% on CTF, exploitation, and penetration tasks across four LLMs.
CrackMeBench: Binary Reverse Engineering for Agents cs.SE · 2026-05-11 · accept · none · ref 19
CrackMeBench introduces 20 deterministic binary validation tasks and reports GPT-5.5 solving 11/12 generated ones at pass@3 while Claude and Kimi lag, especially on harder tasks.
Dynamic Cyber Ranges cs.CR · 2026-04-27 · unverdicted · none · ref 51
Dynamic Cyber Ranges with LLM defender agents reduce attacker success to 0-55% and preserve evaluation headroom as models advance by using comparable capabilities on both sides.
CyberCertBench: Evaluating LLMs in Cybersecurity Certification Knowledge cs.CR · 2026-04-22 · unverdicted · none · ref 24
CyberCertBench shows frontier LLMs reach human-expert performance on general IT and networking security but drop on vendor-specific and formal standards questions such as IEC 62443, with a new framework for producing interpretable explanations.
uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs cs.CR · 2026-05-15 · unverdicted · none · ref 48
uGen is the first retrieval-augmented multi-agent LLM framework for generating functionally correct microarchitectural attack PoCs, reporting up to 100% success on Spectre-v1 and 80% on Prime+Probe at low cost.
Systematic Capability Benchmarking of Frontier Large Language Models for Offensive Cyber Tasks cs.CR · 2026-04-18 · unverdicted · none · ref 7
Claude 4.5 Opus reaches 59% solve rate on offensive cyber CTF tasks, with a Kali Linux environment adding 9.5 percentage points over Ubuntu while prompt engineering often hurts performance in equipped setups.
RAVEN: Retrieval-Augmented Vulnerability Exploration Network for Memory Corruption Analysis in User Code and Binary Programs cs.CR · 2026-04-20 · unreviewed · ref 24

D-CIPHER: Dynamic collaborative intel- ligent multi-agent system with planner and heterogeneous executors for offensive security.CoRR, abs/2502.10931

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer