Defensive refusal bias: How safety alignment fails cyber defenders

2026 · arXiv 2603.01246

1 Pith paper cites this work. Polarity classification is still being indexed.

Representative citing papers

Poster: ClawdGo: Endogenous Security Awareness Training for Autonomous AI Agents

cs.CR · 2026-04-27 · unverdicted · novelty 7.0 · ref 13

ClawdGo uses a self-play training loop with weakest-first scheduling and cross-session memory to raise AI agents' security awareness scores from 80.9 to 96.9 across 12 taxonomy dimensions.
