Pith Number

pith:K5RN43FM

pith:2025:K5RN43FMGDESN5B5UASIRR6C3W
not attested · not anchored · not stored · refs resolved

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections

Abhradeep Thakurta, Andreas Terzis, Chawin Sitawarin, Florian Tramèr, Harsh Chaudhari, Ilia Shumailov, Jamie Hayes, Juliette Pluto, Kai Yuanqing Xiao, Michael Ilie, Milad Nasr, Nicholas Carlini, Sander V. Schulhoff, Shuang Song

Adaptive optimization methods bypass 12 recent defenses against LLM jailbreaks and prompt injections, with attack success rates above 90% for most of them.

arxiv:2510.09023 v1 · 2025-10-10 · cs.LG · cs.CR

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{K5RN43FMGDESN5B5UASIRR6C3W}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open · sign in to claim
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

By systematically tuning and scaling general optimization techniques—gradient descent, reinforcement learning, random search, and human-guided exploration—we bypass 12 recent defenses with attack success rate above 90% for most; importantly, the majority of defenses originally reported near-zero attack success rates.

C2 weakest assumption

That the adaptive optimization methods described fairly represent realistic attacker capabilities and were not over-optimized post-hoc against the specific defenses tested.

C3 one line summary

Adaptive attackers using general optimization techniques bypass 12 recent LLM defenses, with attack success rates above 90% for most, showing that prior robustness claims relied on weak evaluations.

References

12 extracted · 12 resolved · 3 Pith anchors

[1] AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents 2025 · doi:10.18653/v1/n19-1423
[2] Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection 2024 · doi:10.1109/sp61157.2025.00250
[3] Ignore Previous Prompt: Attack Techniques For Language Models 2025 · doi:10.18653/v1/2023.emnlp-main.302
[4] Similarly to prior works, we use this benchmark to evaluate the jailbreak defenses 2024
[5] We follow Chen et al 2023

Formal links

2 machine-checked theorem links

Cited by

27 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:46.920915Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

5762de6cac30c926f43da02488c7c2ddb885f3ec2002b4c0ef4b6e038b1bce74

Aliases

arxiv: 2510.09023 · arxiv_version: 2510.09023v1 · doi: 10.48550/arxiv.2510.09023 · pith_short_12: K5RN43FMGDES · pith_short_16: K5RN43FMGDESN5B5 · pith_short_8: K5RN43FM
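The identifiers on this page appear to be related in a simple way: Base32-encoding the raw bytes of the canonical hash (RFC 4648 alphabet) and keeping the first 26 characters reproduces the full Pith Number, and the short aliases are prefixes of it. This is inferred from the values in this record, not taken from a published specification, so treat the sketch below as an observation rather than the builder's actual method. The function name `pith_number_from_hash` is hypothetical.

```python
import base64

# Canonical hash as listed on this record.
CANONICAL_HASH = "5762de6cac30c926f43da02488c7c2ddb885f3ec2002b4c0ef4b6e038b1bce74"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep a fixed-length prefix.

    Assumption: this mapping is inferred from the values on this page;
    it is not a documented rule of the pith-number schema.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_number_from_hash(CANONICAL_HASH)
# The short aliases listed above match prefixes of the full number.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)     # K5RN43FMGDESN5B5UASIRR6C3W
print(aliases)
```

For this record the result matches the Pith Number and the `pith_short_8`, `pith_short_12`, and `pith_short_16` aliases shown above.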
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/K5RN43FMGDESN5B5UASIRR6C3W \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5762de6cac30c926f43da02488c7c2ddb885f3ec2002b4c0ef4b6e038b1bce74
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "5489c59026daf924dd2a56d570d092f4274c87bdfd22204cabcaa3d10fac6a9b",
    "cross_cats_sorted": [
      "cs.CR"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-10-10T05:51:04Z",
    "title_canon_sha256": "b174234e6403764ed3fc1a0c33e209a434fec54ea215d443cfe73f4c4e31d5f1"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2510.09023",
    "kind": "arxiv",
    "version": 1
  }
}
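The Python one-liner in the verification command above hashes a canonical serialization of the record: keys sorted, no whitespace between tokens, non-ASCII characters kept as-is, UTF-8 bytes, then SHA-256. A minimal standalone sketch of that step follows. The function name `canonical_sha256` and the two sample dictionaries are illustrative only; the samples omit the `metadata` block, so their digest is not expected to equal the canonical hash listed on this page.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same serialization as the one-liner above."""
    canonical = json.dumps(
        record,
        sort_keys=True,            # key order must not affect the digest
        separators=(",", ":"),     # no whitespace between tokens
        ensure_ascii=False,        # keep non-ASCII characters unescaped
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two illustrative records with the same content but different key order.
record_a = {
    "schema_version": "1.0",
    "source": {"id": "2510.09023", "kind": "arxiv", "version": 1},
}
record_b = {
    "source": {"version": 1, "kind": "arxiv", "id": "2510.09023"},
    "schema_version": "1.0",
}

digest_a = canonical_sha256(record_a)
digest_b = canonical_sha256(record_b)
print(digest_a == digest_b)  # True: serialization is order-independent
```

Sorting keys and fixing the separators is what lets a mirror recompute the same digest from the same record regardless of how the JSON was originally laid out.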