Introduces AgentCyberRange, the first open multi-host cyber range benchmark, together with the Cage toolchain, and evaluates six frontier AI systems on web exploitation and post-exploitation tasks across 110 vulnerabilities.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CR (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
CyberEvolver introduces a four-layer self-evolving agent architecture with trace-to-diagnosis and population beam search that raises seed agent success rates by 13.6% on CTF, exploitation, and penetration tasks across four LLMs.
AgentFlow uses a typed graph DSL covering roles, prompts, tools, topology and protocol plus a runtime-signal feedback loop to optimize multi-agent harnesses, reaching 84.3% on TerminalBench-2 and discovering ten new zero-days in Chrome including two critical sandbox escapes.
citing papers explorer
- AgentCyberRange: Benchmarking Frontier AI Systems in Realistic Cyber Ranges
  Introduces AgentCyberRange, the first open multi-host cyber range benchmark, together with the Cage toolchain, and evaluates six frontier AI systems on web exploitation and post-exploitation tasks across 110 vulnerabilities.
- CyberEvolver: Structured Self-Evolution for Cybersecurity Agents On the Fly
  CyberEvolver introduces a four-layer self-evolving agent architecture with trace-to-diagnosis and population beam search that raises seed agent success rates by 13.6% on CTF, exploitation, and penetration tasks across four LLMs.
- Synthesizing Multi-Agent Harnesses for Vulnerability Discovery
  AgentFlow uses a typed graph DSL covering roles, prompts, tools, topology and protocol plus a runtime-signal feedback loop to optimize multi-agent harnesses, reaching 84.3% on TerminalBench-2 and discovering ten new zero-days in Chrome including two critical sandbox escapes.