EvoSynth evolves code-based jailbreak algorithms via multi-agent self-correction, reaching 85.5% ASR on Claude-Sonnet-4.5 and 95.9% average across targets with greater diversity.
A reinforced generation of adversarial examples for neural machine translation.arXiv preprint arXiv:1911.03677,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
citation-role summary
baseline 1
citation-polarity summary
fields
cs.CL 1years
2025 1verdicts
UNVERDICTED 1roles
baseline 1polarities
baseline 1representative citing papers
citing papers explorer
-
Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
EvoSynth evolves code-based jailbreak algorithms via multi-agent self-correction, reaching 85.5% ASR on Claude-Sonnet-4.5 and 95.9% average across targets with greater diversity.