Pith Number

pith:K5RN43FM

pith:2025:K5RN43FMGDESN5B5UASIRR6C3W

not attested · not anchored · not stored · refs resolved

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections

Abhradeep Thakurta, Andreas Terzis, Chawin Sitawarin, Florian Tramèr, Harsh Chaudhari, Ilia Shumailov, Jamie Hayes, Juliette Pluto, Kai Yuanqing Xiao, Michael Ilie, Milad Nasr, Nicholas Carlini, Sander V. Schulhoff, Shuang Song

Adaptive optimization methods bypass 12 recent defenses against LLM jailbreaks and prompt injections, with attack success rates above 90% for most.

arxiv:2510.09023 v1 · 2025-10-10 · cs.LG · cs.CR


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{K5RN43FMGDESN5B5UASIRR6C3W}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
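The property described above (any mirror recomputes the same state from the same events) can be illustrated with a minimal sketch. This is not Pith's actual merge algorithm, which is not described on this page. The event fields `id`, `ts`, `field` and `value`, and the last-writer-wins rule, are all hypothetical assumptions chosen only to show how a total order over events makes the result independent of arrival order.

```python
from typing import Any


def merge_events(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a state that does not depend on input order.

    Assumptions (hypothetical, not Pith's schema): each event has a unique
    'id', an ISO-8601 UTC timestamp 'ts', and sets one 'field' to a 'value'.
    """
    # Drop exact duplicates so replicated events do not matter.
    unique = {event["id"]: event for event in events}
    # Impose a total order: timestamp first, id as a tie-breaker.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state: dict[str, Any] = {}
    for event in ordered:
        # Last writer wins under the total order above.
        state[event["field"]] = event["value"]
    return state


# Hypothetical events, for illustration only.
events = [
    {"id": "e1", "ts": "2026-05-17T23:38:46Z", "field": "status", "value": "a"},
    {"id": "e2", "ts": "2026-05-18T08:00:00Z", "field": "status", "value": "b"},
    {"id": "e3", "ts": "2026-05-18T08:00:00Z", "field": "note", "value": "c"},
]
print(merge_events(events) == merge_events(list(reversed(events))))
```

Sorting on a key that every mirror can compute locally is what removes the dependence on delivery order. A real implementation would also need to verify the event signatures before merging.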

Claims

C1 · strongest claim

By systematically tuning and scaling general optimization techniques—gradient descent, reinforcement learning, random search, and human-guided exploration—we bypass 12 recent defenses with attack success rate above 90% for most; importantly, the majority of defenses originally reported near-zero attack success rates.

C2 · weakest assumption

That the adaptive optimization methods described fairly represent realistic attacker capabilities and were not over-optimized post-hoc against the specific defenses tested.

C3 · one line summary

Adaptive attackers using optimization techniques bypass 12 recent LLM defenses, most with attack success rates above 90%, showing that prior robustness claims relied on weak evaluations.

References

12 extracted · 12 resolved · 3 Pith anchors

[1] AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents 2025 · doi:10.18653/v1/n19-1423

[2] Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection 2024 · doi:10.1109/sp61157.2025.00250

[3] Ignore Previous Prompt: Attack Techniques For Language Models 2025 · doi:10.18653/v1/2023.emnlp-main.302

[4] Similarly to prior works, we use this benchmark to evaluate the jailbreak defenses 2024

[5] We follow Chen et al 2023

Formal links

2 machine-checked theorem links

Cited by

27 papers in Pith

Security, Privacy, and Ethical Risks in OpenClaw

On-Policy Consistency Training Improves LLM Safety with Minimal Capability Degradation

Agent Security is a Systems Problem

Adaptive Probe-based Steering for Robust LLM Jailbreaking

Pramana: A Protocol-Layer Treatment of Claim Verification in Autonomous Agent Networks

Receipt and verification

First computed	2026-05-17T23:38:46.920915Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

5762de6cac30c926f43da02488c7c2ddb885f3ec2002b4c0ef4b6e038b1bce74

Aliases

arxiv: 2510.09023 · arxiv_version: 2510.09023v1 · doi: 10.48550/arxiv.2510.09023 · pith_short_12: K5RN43FMGDES · pith_short_16: K5RN43FMGDESN5B5 · pith_short_8: K5RN43FM
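The identifiers on this page appear to be related to the canonical hash in a simple way: the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the SHA-256 digest, and the short aliases are prefixes of it. This is inferred from the values shown here, not from a published specification, so treat the sketch below as an observation. The function name `pith_id_from_hash` is a hypothetical helper.

```python
import base64


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumption: this mirrors how the identifier on this page relates to
    its canonical hash; the official derivation is not documented here.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


CANONICAL_HASH = "5762de6cac30c926f43da02488c7c2ddb885f3ec2002b4c0ef4b6e038b1bce74"

full_id = pith_id_from_hash(CANONICAL_HASH)
print(full_id)  # K5RN43FMGDESN5B5UASIRR6C3W

# The short aliases are prefixes of the full identifier.
for n in (8, 12, 16):
    print(n, full_id[:n])
```

Because each base32 character carries 5 bits, the 8-, 12- and 16-character aliases identify the record by the first 40, 60 and 80 bits of the hash respectively.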

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/K5RN43FMGDESN5B5UASIRR6C3W \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5762de6cac30c926f43da02488c7c2ddb885f3ec2002b4c0ef4b6e038b1bce74
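The same check can be run offline. The sketch below applies the serialization used in the command above (sorted keys, compact separators, no ASCII escaping, UTF-8) to the record listed under "Canonical record JSON" on this page. The function name `canonical_sha256` is a hypothetical helper, and the record literal is transcribed by hand from this page, so compare the printed digest against the canonical hash yourself.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the canonical serialization from the command above."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Transcribed from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "5489c59026daf924dd2a56d570d092f4274c87bdfd22204cabcaa3d10fac6a9b",
        "cross_cats_sorted": ["cs.CR"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-10-10T05:51:04Z",
        "title_canon_sha256": "b174234e6403764ed3fc1a0c33e209a434fec54ea215d443cfe73f4c4e31d5f1",
    },
    "schema_version": "1.0",
    "source": {"id": "2510.09023", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed on this page
```

Sorting keys and fixing the separators is what makes the digest reproducible: two mirrors that parse the same JSON with different key orders or whitespace still hash identical bytes.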

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "5489c59026daf924dd2a56d570d092f4274c87bdfd22204cabcaa3d10fac6a9b",
    "cross_cats_sorted": [
      "cs.CR"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-10-10T05:51:04Z",
    "title_canon_sha256": "b174234e6403764ed3fc1a0c33e209a434fec54ea215d443cfe73f4c4e31d5f1"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2510.09023",
    "kind": "arxiv",
    "version": 1
  }
}