Abstract operator framing with few-shot examples bypasses safety alignment in GPT-5.4 and other OpenAI models, reaching 24% success on HarmBench where direct harmful queries achieve 0%.
Proceedings of the 41st International Conference on Machine Learning
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- Involuntary In-Context Learning: Exploiting Few-Shot Pattern Completion to Bypass Safety Alignment in GPT-5.4
Abstract operator framing with few-shot examples bypasses safety alignment in GPT-5.4 and other OpenAI models, reaching 24% success on HarmBench where direct harmful queries achieve 0%.