Adaptive attackers using optimization techniques bypass 12 recent LLM defenses with >90% success, showing that prior robustness claims relied on weak evaluations.
This usually involves multiple smaller design choices such as the database, how candidates are stored, ranked, and sampled at each step
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2025 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections
Adaptive attackers using optimization techniques bypass 12 recent LLM defenses with >90% success, showing that prior robustness claims relied on weak evaluations.