Metis achieves 89.2% average attack success rate across 10 LLMs by reformulating jailbreaking as inference-time policy optimization with a self-evolving metacognitive loop, cutting token costs by 8.2x on average.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Metis: Learning to Jailbreak LLMs via Self-Evolving Metacognitive Policy Optimization
Metis achieves 89.2% average attack success rate across 10 LLMs by reformulating jailbreaking as inference-time policy optimization with a self-evolving metacognitive loop, cutting token costs by 8.2x on average.