and Sekar, V

· 2025 · arXiv 2501.16466

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

ExploitBench: A Capability Ladder Benchmark for LLM Cybersecurity Agents

cs.CR · 2026-05-13 · conditional · novelty 8.0

ExploitBench decomposes LLM exploitation into 16 oracle-verified capability flags and finds public frontier models trigger crashes but rarely reach arbitrary code execution on 41 V8 bugs.

Beyond Collection: Measuring the Detection Efficacy of Modern Security Logging Standards

cs.CR · 2026-05-07 · unverdicted · novelty 7.0

The SETC framework provides the first systematic comparison of the CIM, OCSF, and ECS logging standards by running 50 RCE exploits and measuring how well each standard captures attack indicators.

PocketAgents: A Manifest-Driven Library of Autonomous Defense Agents

cs.CR · 2026-05-20 · unverdicted · novelty 6.0

PocketAgents introduces a manifest-driven library for LLM-based autonomous defense agents, evaluated in 18 closed-loop trials against a DarkSide-inspired attack where 13 trials produced validated blocking actions.

The Procedural Semantics Gap in Structured CTI: A Measurement-Driven STIX Analysis for APT Emulation

cs.CR · 2025-12-12 · conditional · novelty 6.0

Structured CTI standards like ATT&CK describe adversary actions but lack the ordering, preconditions, and environmental details needed for direct multi-stage emulation, and a translation method can bridge this gap when assumptions are recorded.

Autonomous Adversary: Red-Teaming in the age of LLM

cs.CR · 2026-05-07 · unverdicted · novelty 5.0

Expert-defined action plans for LLM agents achieve higher task completion in lateral-movement scenarios than fully autonomous or self-scaffolded modes, but failures remain common due to brittle commands and state handling.

citing papers explorer

Showing 5 of 5 citing papers.

ExploitBench: A Capability Ladder Benchmark for LLM Cybersecurity Agents · cs.CR · 2026-05-13 · conditional · none · ref 8
Beyond Collection: Measuring the Detection Efficacy of Modern Security Logging Standards · cs.CR · 2026-05-07 · unverdicted · none · ref 30
PocketAgents: A Manifest-Driven Library of Autonomous Defense Agents · cs.CR · 2026-05-20 · unverdicted · none · ref 18
The Procedural Semantics Gap in Structured CTI: A Measurement-Driven STIX Analysis for APT Emulation · cs.CR · 2025-12-12 · conditional · none · ref 39
Autonomous Adversary: Red-Teaming in the age of LLM · cs.CR · 2026-05-07 · unverdicted · none · ref 2
