2 Pith papers cite this work.
Citing papers by year: 2026 (2).
Citing papers
- AutoRISE: Agent-Driven Strategy Evolution for Red-Teaming Large Language Models
  AutoRISE uses an agent to evolve red-teaming attack strategies as editable, executable programs, achieving an average attack success rate 17 points higher than baselines across 11 models.
- Moltbook Moderation: Uncovering Hidden Intent Through Multi-Turn Dialogue
  BOT-MOD uncovers hidden agent intent in multi-agent environments such as Moltbook through guided multi-turn dialogue and Gibbs-based sampling over intent hypotheses.