Pith Number

pith:UQQL6U5V

pith:2026:UQQL6U5VZKK34XL6CPH7I6BZNJ
not attested · not anchored · not stored · refs pending

Metis: Learning to Jailbreak LLMs via Self-Evolving Metacognitive Policy Optimization

Chi Zhang, Huilin Zhou, Jian Zhao, Lan Zhang, Tianle Zhang, Xiuyuan Chen, Xuelong Li, YiLu Zhong, Yuchen Yuan, Zhen Liang

Metis reformulates jailbreaking as inference-time policy optimization in a POMDP, using a metacognitive loop to diagnose defenses and steer attacks.

arxiv:2605.10067 v3 · 2026-05-11 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{UQQL6U5VZKK34XL6CPH7I6BZNJ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Metis achieves the strongest average Attack Success Rate (ASR) among compared methods at 89.2%, maintaining high efficacy on resilient frontier models (e.g., 76.0% on O1 and 78.0% on GPT-5-chat) where traditional baselines exhibit substantial performance degradation.

C2 · weakest assumption

That the structured feedback extracted from target responses can reliably serve as a semantic gradient capable of steering the policy toward successful jailbreaks without the optimization collapsing into ineffective local patterns on advanced aligned models.

C3 · one-line summary

Metis achieves an 89.2% average attack success rate across 10 LLMs, including 76% on o1 and 78% on GPT-5-chat, while cutting token cost by 8.2x on average through metacognitive policy optimization in a POMDP.

Receipt and verification
First computed: 2026-05-22T01:04:05.886052Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

a420bf53b5ca95be5d7e13cff478396a558af6ddbcfdbcb40b3b67efff57bcef

Aliases

arxiv: 2605.10067 · arxiv_version: 2605.10067v3 · doi: 10.48550/arxiv.2605.10067 · pith_short_12: UQQL6U5VZKK3 · pith_short_16: UQQL6U5VZKK34XL6 · pith_short_8: UQQL6U5V
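The short aliases listed here appear to be plain prefixes of the full Pith Number. In this record, the full number also matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. The sketch below checks both relationships for this record only. That the builder derives every Pith Number this way is an assumption, and the helper names `base32_prefix` and `short_alias` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "a420bf53b5ca95be5d7e13cff478396a558af6ddbcfdbcb40b3b67efff57bcef"
PITH_NUMBER = "UQQL6U5VZKK34XL6CPH7I6BZNJ"


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest (RFC 4648 alphabet) and keep a prefix.

    Assumption: the 26-character length is inferred from this record,
    not taken from a published specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_alias(pith_number: str, length: int) -> str:
    """Return the first `length` characters of a Pith Number (hypothetical helper)."""
    return pith_number[:length]


derived = base32_prefix(CANONICAL_HASH)
aliases = {n: short_alias(PITH_NUMBER, n) for n in (8, 12, 16)}

print(derived == PITH_NUMBER)
print(aliases)
```

If the relationship holds in general, a client could recompute the identifier from the canonical hash instead of trusting the displayed value, but that should be confirmed against the schema before relying on it.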
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/UQQL6U5VZKK34XL6CPH7I6BZNJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a420bf53b5ca95be5d7e13cff478396a558af6ddbcfdbcb40b3b67efff57bcef
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "2682fa13fe16c3987ca46434221cf99691dfc0bb2c306059243a4274b1f4f7f0",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-11T06:45:00Z",
    "title_canon_sha256": "35b42fd74c524fb2ae56d482e309e89ca36e4a775c83acce36a2282667a89675"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.10067",
    "kind": "arxiv",
    "version": 3
  }
}
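The verification command above hashes a canonical serialization of this record: keys sorted, compact separators, non-ASCII characters left unescaped, UTF-8 encoded, then SHA-256. A minimal offline sketch of that step follows, assuming the record shown on this page is byte-for-byte what the API returns under `canonical_record`. The function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same json.dumps settings as the one-liner above."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "2682fa13fe16c3987ca46434221cf99691dfc0bb2c306059243a4274b1f4f7f0",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-11T06:45:00Z",
        "title_canon_sha256": "35b42fd74c524fb2ae56d482e309e89ca36e4a775c83acce36a2282667a89675",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.10067", "kind": "arxiv", "version": 3},
}

digest = canonical_sha256(record)
print(digest)  # compare against the canonical hash listed on this page
```

Because keys are sorted before hashing, two mirrors that hold the same record with different key orders should compute the same digest. List order inside the record still matters.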