(2024), hyperparameters that we need to set for Zhu’s AutoDAN are the iteration number T in each step, objective weights w_1 and w_2, the top-B parameter B, and the temperature τ
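To make the roles of these hyperparameters concrete, here is a minimal Python sketch of how a weighted score, a top-B cut, and temperature sampling could fit together. It is an illustration under stated assumptions, not the procedure from Zhu et al. (2024): the names `AutoDANConfig`, `top_b`, `sample_with_temperature`, `select_token`, and `run_step`, the default values, and the way w_1 and w_2 enter the scores are all hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class AutoDANConfig:
    # Field names and default values are hypothetical placeholders.
    T: int = 3         # iteration number in each step
    w1: float = 1.0    # objective weight w_1 (assumed: coarse scoring)
    w2: float = 1.0    # objective weight w_2 (assumed: fine scoring)
    B: int = 2         # top-B parameter
    tau: float = 1.0   # sampling temperature


def top_b(scores, b):
    """Return the indices of the b highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:b]


def sample_with_temperature(scores, tau, rng):
    """Sample an index from softmax(scores / tau)."""
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp((s - m) / tau) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def select_token(coarse, fine, readability, cfg, rng):
    """Toy two-stage selection: weighted top-B cut, then weighted sampling."""
    # Stage 1 (assumed): combine a coarse attack score with readability via w1.
    prelim = [cfg.w1 * a + r for a, r in zip(coarse, readability)]
    candidates = top_b(prelim, cfg.B)
    # Stage 2 (assumed): rescore the survivors via w2 and sample with tau.
    rescored = [cfg.w2 * fine[i] + readability[i] for i in candidates]
    return candidates[sample_with_temperature(rescored, cfg.tau, rng)]


def run_step(coarse, fine, readability, cfg, rng):
    """Repeat the selection T times and return the picked indices."""
    return [select_token(coarse, fine, readability, cfg, rng) for _ in range(cfg.T)]
```

In this sketch a small τ makes the choice nearly greedy, while a larger τ spreads probability over the B surviving candidates; a larger B keeps more candidates alive for the second stage.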

• Zhu’s AutoDAN: According to Algorithm 1, Algorithm 2 in Zhu et al. · 2024

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Understanding and Improving Continuous Adversarial Training for LLMs via In-context Learning Theory

cs.LG · 2026-04-14 · unverdicted · novelty 7.0

Continuous adversarial training in the embedding space produces a robust generalization bound for linear transformers that decreases with perturbation radius, tied to singular values of the embedding matrix, and motivates a new regularizer that improves real LLM jailbreak robustness-utility tradeoff

citing papers explorer

Showing 1 of 1 citing paper.

Understanding and Improving Continuous Adversarial Training for LLMs via In-context Learning Theory · cs.LG · 2026-04-14 · unverdicted · none · ref 40
