Impersonating complex misaligned personas via biographies and role-play bypasses safety in ChatGPT, Gemini, and Deepseek, succeeding on 38-40 out of 40 illicit questions across tested models.
Advances in Neural Information Processing Systems 36, 72044–72057 (2023)
1 Pith paper cites this work.
Fields: cs.CR (1). Years: 2023 (1). Verdicts: unverdicted (1).
Representative citing papers
Dr. Jekyll and Mr. Hyde: Two Faces of LLMs