PentestGPT: Evaluating and harnessing large language models for automated penetration testing · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Systematic Capability Benchmarking of Frontier Large Language Models for Offensive Cyber Tasks

cs.CR · 2026-04-18 · unverdicted · novelty 5.0

Claude 4.5 Opus reaches 59% solve rate on offensive cyber CTF tasks, with a Kali Linux environment adding 9.5 percentage points over Ubuntu while prompt engineering often hurts performance in equipped setups.

citing papers explorer

Showing 1 of 1 citing paper.

Systematic Capability Benchmarking of Frontier Large Language Models for Offensive Cyber Tasks cs.CR · 2026-04-18 · unverdicted · none · ref 3
