Dhir, Sudhit Rao, Kaicheng Yu, Twm Stone, and Daniel Kang

Yuxuan Zhu, Antony Kellermann, Dylan Bowman, Philip Li, Akul Gupta, Adarsh Danda, Richard Fang, Conner Jensen, Eric Ihli, Jason Benn, Jet Geronimo, Avi Dhir, Sudhit Rao, Kaicheng Yu, Twm Stone, Daniel Kang · 2025 · arXiv 2503.17332

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios

cs.CR · 2026-05-08 · unverdicted · novelty 7.0

LLM agents exhibit persistent attack-selection biases as fixed traits independent of success rates, with a bias momentum effect that resists steering and yields no performance gain.

RealVuln: Benchmarking Rule-Based, General-Purpose LLM, and Security-Specialized Scanners on Real-World Code

cs.CR · 2026-04-15 · unverdicted · novelty 7.0

RealVuln benchmark finds security-specialized scanners outperform general-purpose LLMs and rule-based SAST tools on hand-labeled vulnerable Python code under F3 scoring, with all artifacts released.

From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

A practical evaluation protocol for AI pentesting agents that uses validated vulnerability discovery, LLM semantic matching, and bipartite scoring to assess performance in realistic, complex targets.

xOffense: An Autonomous Multi-Agent Framework for Penetration Testing with Domain-Adapted Large Language Models

cs.CR · 2025-09-16 · unverdicted · novelty 5.0

xOffense automates penetration testing via a fine-tuned Qwen3-32B LLM in a multi-agent setup with specialized agents for reconnaissance, vulnerability scanning, and exploitation, reporting 79.17% sub-task completion on AutoPenBench and AI-Pentest-Benchmark.

citing papers explorer

Showing 4 of 4 citing papers.

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios cs.CR · 2026-05-08 · unverdicted · none · ref 36
LLM agents exhibit persistent attack-selection biases as fixed traits independent of success rates, with a bias momentum effect that resists steering and yields no performance gain.
RealVuln: Benchmarking Rule-Based, General-Purpose LLM, and Security-Specialized Scanners on Real-World Code cs.CR · 2026-04-15 · unverdicted · none · ref 26
RealVuln benchmark finds security-specialized scanners outperform general-purpose LLMs and rule-based SAST tools on hand-labeled vulnerable Python code under F3 scoring, with all artifacts released.
From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World cs.AI · 2026-05-11 · unverdicted · none · ref 51
A practical evaluation protocol for AI pentesting agents that uses validated vulnerability discovery, LLM semantic matching, and bipartite scoring to assess performance in realistic, complex targets.
xOffense: An Autonomous Multi-Agent Framework for Penetration Testing with Domain-Adapted Large Language Models cs.CR · 2025-09-16 · unverdicted · none · ref 26
xOffense automates penetration testing via a fine-tuned Qwen3-32B LLM in a multi-agent setup with specialized agents for reconnaissance, vulnerability scanning, and exploitation, reporting 79.17% sub-task completion on AutoPenBench and AI-Pentest-Benchmark.

Dhir, Sudhit Rao, Kaicheng Yu, Twm Stone, and Daniel Kang

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer