Active attacks: Red-teaming LLMs via adaptive environments

Taeyoung Yun, Pierre-Luc St-Charles, Jinkyoo Park, Yoshua Bengio, Minsu Kim · 2025 · arXiv 2509.21947

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Proteus: A Self-Evolving Red Team for Agent Skill Ecosystems

cs.CR · 2026-05-12 · unverdicted · novelty 7.0

Proteus demonstrates that adaptive red-teaming achieves 40-90% attack success after five rounds and bypasses even strong auditors at up to 41% joint success, revealing that static skill vetting underestimates residual risk.

Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance

cs.LG · 2026-05-01

citing papers explorer

Showing 2 of 2 citing papers.

Proteus: A Self-Evolving Red Team for Agent Skill Ecosystems cs.CR · 2026-05-12 · unverdicted · none · ref 38
Proteus demonstrates that adaptive red-teaming achieves 40-90% attack success after five rounds and bypasses even strong auditors at up to 41% joint success, revealing that static skill vetting underestimates residual risk.
Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance cs.LG · 2026-05-01 · unreviewed · ref 7

Active attacks: Red-teaming LLMs via adaptive environments

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer