Pith Number

pith:JUXQCDAH

pith:2023:JUXQCDAH6DD6QOU5S5GRAHY5CV

not attested · not anchored · not stored · refs resolved

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

Michael Backes, Xinyue Shen, Yang Zhang, Yun Shen, Zeyuan Chen

Real-world jailbreak prompts collected from the wild achieve up to 0.95 attack success rates against major LLMs including GPT-4, with some persisting for over 240 days.

arxiv:2308.03825 v2 · 2023-08-07 · cs.CR · cs.LG

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{JUXQCDAH6DD6QOU5S5GRAHY5CV}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
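As a rough illustration of what a deterministic merge can look like, the sketch below folds events into a state in an order that does not depend on how a mirror received them. It is not Pith's actual algorithm: the event fields (`id`, `ts`, `field`, `value`), the last-writer-wins rule, and the function name `merge_events` are all hypothetical and chosen only for this example.

```python
from typing import Any, Dict, Iterable, List

Event = Dict[str, Any]


def merge_events(*event_sets: Iterable[Event]) -> Dict[str, Any]:
    """Fold events into a state that is independent of arrival order.

    Hypothetical event shape (an assumption, not the Pith schema):
    {"id": str, "ts": ISO-8601 str, "field": str, "value": Any}
    """
    # Deduplicate by event id so replicated copies count once.
    unique: Dict[str, Event] = {}
    for events in event_sets:
        for event in events:
            unique[event["id"]] = event

    # Sort by a total order: timestamp first, event id as tie-breaker.
    ordered: List[Event] = sorted(
        unique.values(), key=lambda e: (e["ts"], e["id"])
    )

    # Last writer wins per field (an assumed rule for this sketch).
    state: Dict[str, Any] = {}
    for event in ordered:
        state[event["field"]] = event["value"]
    return state


e1 = {"id": "a1", "ts": "2026-05-17T00:00:00Z", "field": "citations", "value": "open"}
e2 = {"id": "b2", "ts": "2026-05-18T00:00:00Z", "field": "citations", "value": "closed"}
e3 = {"id": "c3", "ts": "2026-05-17T00:00:00Z", "field": "replications", "value": "open"}

mirror_a = merge_events([e1, e2], [e3, e1])
mirror_b = merge_events([e3], [e2, e1])
```

Because the events are deduplicated and sorted by a total order before being applied, two mirrors holding the same events in different arrangements arrive at the same state.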

Claims

C1 · strongest claim

our experiments on six popular LLMs show that their safeguards cannot adequately defend jailbreak prompts in all scenarios. Particularly, we identify five highly effective jailbreak prompts that achieve 0.95 attack success rates on ChatGPT (GPT-3.5) and GPT-4

C2 · weakest assumption

The 1,405 collected prompts and the 107,250-question set across 13 scenarios are representative enough to support broad conclusions about the inadequacy of safeguards on all LLMs.

C3 · one line summary

Real-world jailbreak prompts collected from the wild achieve up to 0.95 attack success rates against major LLMs including GPT-4, with some persisting for over 240 days.

References

98 extracted · 98 resolved · 9 Pith anchors

[1] https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1146542/a_pro-innovation_approach_to_AI_regulation.pdf

[2] https://www.aiprm.com/

[3] https://huggingface.co/datasets/fka/awesome-chatgpt-prompts

[4] https://chat.openai.com/chat

[5] https://disboard.org/

Formal links

2 machine-checked theorem links

Cited by

26 papers in Pith

Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey

OpenAI o1 System Card

ACE: A Security Architecture for LLM-Integrated App Systems

Mobile GUI Agents under Real-world Threats: Are We There Yet?

GUARD: Guideline Upholding Test through Adaptive Role-play and Jailbreak Diagnostics for LLMs

Receipt and verification

First computed	2026-05-17T23:38:14.560748Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

4d2f010c07f0c7e83a9d974d101f1d15410a4d776d87aa93b28d1d8f8b213c7e

Aliases

arxiv: 2308.03825 · arxiv_version: 2308.03825v2 · doi: 10.48550/arxiv.2308.03825 · pith_short_12: JUXQCDAH6DD6 · pith_short_16: JUXQCDAH6DD6QOU5 · pith_short_8: JUXQCDAH
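The values on this page suggest how the identifiers relate: the short aliases are prefixes of the full Pith Number, and the full number matches the first 26 characters of the Base32 encoding of the canonical hash. This relationship is inferred from this one record, not taken from Pith documentation, so treat the sketch below as an observation. The helper name `base32_prefix` is hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "4d2f010c07f0c7e83a9d974d101f1d15410a4d776d87aa93b28d1d8f8b213c7e"
PITH_NUMBER = "JUXQCDAH6DD6QOU5S5GRAHY5CV"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Return the first `length` characters of the RFC 4648 Base32
    encoding of a hex digest, with '=' padding removed."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded.rstrip("=")[:length]


# Assumption: the identifier is a Base32 prefix of the canonical hash.
derived = base32_prefix(CANONICAL_HASH, len(PITH_NUMBER))

# The short aliases listed above are plain prefixes of the full identifier.
short_aliases = {f"pith_short_{n}": PITH_NUMBER[:n] for n in (8, 12, 16)}

print(derived == PITH_NUMBER)
print(short_aliases)
```

For this record, `derived` equals the full identifier and the three prefixes reproduce the `pith_short_8`, `pith_short_12`, and `pith_short_16` aliases.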

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/JUXQCDAH6DD6QOU5S5GRAHY5CV \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4d2f010c07f0c7e83a9d974d101f1d15410a4d776d87aa93b28d1d8f8b213c7e

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "2ec07f8bc40c9d26f66dc397dafce609b4911c07d8b55a00594e0c5e0e44747c",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2023-08-07T16:55:20Z",
    "title_canon_sha256": "c5717085b3c9718de16aa4cd67ced5c7a868ded0d8bb8e4113999ea62b24d0ef"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2308.03825",
    "kind": "arxiv",
    "version": 2
  }
}
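The verification command above can also be run offline against the record as printed here. The sketch below mirrors the same canonicalization (sorted keys, compact separators, UTF-8, no ASCII escaping) and compares the digest with the canonical hash. The function names are hypothetical, and whether the record as rendered on this page is byte-identical to the one the resolver serves is an assumption, so no result is claimed here.

```python
import hashlib
import json

EXPECTED = "4d2f010c07f0c7e83a9d974d101f1d15410a4d776d87aa93b28d1d8f8b213c7e"

# Canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "2ec07f8bc40c9d26f66dc397dafce609b4911c07d8b55a00594e0c5e0e44747c",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CR",
        "submitted_at": "2023-08-07T16:55:20Z",
        "title_canon_sha256": "c5717085b3c9718de16aa4cd67ced5c7a868ded0d8bb8e4113999ea62b24d0ef",
    },
    "schema_version": "1.0",
    "source": {"id": "2308.03825", "kind": "arxiv", "version": 2},
}


def canonical_bytes(obj: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace,
    matching the json.dumps options used in the shell one-liner."""
    text = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_sha256(obj: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


digest = canonical_sha256(record)
print("computed:", digest)
print("matches expected:", digest == EXPECTED)
```

Sorting keys before hashing is what makes the digest independent of the key order in which a mirror happens to store the record.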