A multi-turn intention-deception jailbreak achieves high success on GPT-5 and Claude models while exposing para-jailbreaking where models leak harmful information without direct refusal.
By studying how adversaries can ex- ploit benign-looking multi-turn interactions to bypass safe- guards, we hope to inform the design of more robust safety systems
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CR 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Jailbreaking Frontier Foundation Models Through Intention Deception
A multi-turn intention-deception jailbreak achieves high success on GPT-5 and Claude models while exposing para-jailbreaking where models leak harmful information without direct refusal.