Pith Number

pith:JUXQCDAH

pith:2023:JUXQCDAH6DD6QOU5S5GRAHY5CV
not attested · not anchored · not stored · refs resolved

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

Michael Backes, Xinyue Shen, Yang Zhang, Yun Shen, Zeyuan Chen

Real-world jailbreak prompts collected from the wild achieve up to 0.95 attack success rates against major LLMs including GPT-4, with some persisting for over 240 days.

arxiv:2308.03825 v2 · 2023-08-07 · cs.CR · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{JUXQCDAH6DD6QOU5S5GRAHY5CV}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

our experiments on six popular LLMs show that their safeguards cannot adequately defend jailbreak prompts in all scenarios. Particularly, we identify five highly effective jailbreak prompts that achieve 0.95 attack success rates on ChatGPT (GPT-3.5) and GPT-4

C2 weakest assumption

The 1,405 collected prompts and the 107,250-question set across 13 scenarios are representative enough to support broad conclusions about the inadequacy of safeguards on all LLMs.

C3 one line summary

Real-world jailbreak prompts collected from the wild achieve up to 0.95 attack success rates against major LLMs including GPT-4, with some persisting for over 240 days.

References

98 extracted · 98 resolved · 9 Pith anchors

[1] https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1146542/a_pro-innovation_approach_to_AI_regulation.pdf
[2] https://www.aiprm.com/
[3] https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
[4] https://chat.openai.com/chat
[5] https://disboard.org/

Formal links

2 machine-checked theorem links

Cited by

26 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:14.560748Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

4d2f010c07f0c7e83a9d974d101f1d15410a4d776d87aa93b28d1d8f8b213c7e

Aliases

arxiv: 2308.03825 · arxiv_version: 2308.03825v2 · doi: 10.48550/arxiv.2308.03825 · pith_short_12: JUXQCDAH6DD6 · pith_short_16: JUXQCDAH6DD6QOU5 · pith_short_8: JUXQCDAH
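The identifiers listed on this page appear to be related to one another. For this record, the 26-character Pith Number equals the first 26 Base32 (RFC 4648) characters of the canonical SHA-256 hash, and each pith_short_N alias is the N-character prefix of that number. This is an observation drawn from the values shown here, not a documented rule of the scheme, and the function names below are illustrative only.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "4d2f010c07f0c7e83a9d974d101f1d15410a4d776d87aa93b28d1d8f8b213c7e"
PITH_NUMBER = "JUXQCDAH6DD6QOU5S5GRAHY5CV"


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Return the first `length` RFC 4648 Base32 characters of a hex digest.

    Assumption: the 26-character length is inferred from this record only.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_alias(number: str, n: int) -> str:
    """Return the n-character prefix, as the pith_short_<n> aliases seem to be."""
    return number[:n]


derived = base32_prefix(CANONICAL_HASH)
print("derived number:", derived)
print("matches page:", derived == PITH_NUMBER)
for n in (8, 12, 16):
    print(f"pith_short_{n}: {short_alias(PITH_NUMBER, n)}")
```

If the relationship holds for other records, a short alias can be checked against a full number with a plain prefix comparison, though resolving a short alias to a unique record would still need the service.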
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/JUXQCDAH6DD6QOU5S5GRAHY5CV \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4d2f010c07f0c7e83a9d974d101f1d15410a4d776d87aa93b28d1d8f8b213c7e
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "2ec07f8bc40c9d26f66dc397dafce609b4911c07d8b55a00594e0c5e0e44747c",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2023-08-07T16:55:20Z",
    "title_canon_sha256": "c5717085b3c9718de16aa4cd67ced5c7a868ded0d8bb8e4113999ea62b24d0ef"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2308.03825",
    "kind": "arxiv",
    "version": 2
  }
}
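The shell pipeline above can also be reproduced offline from the canonical record shown here. The following is a minimal sketch that mirrors the same serialization settings as the one-liner (sorted keys, compact separators, ensure_ascii=False, UTF-8) and hashes the result with SHA-256. The names canonical_bytes and canonical_sha256 are hypothetical helpers, and whether the digest equals the published hash depends on the record above being exactly what the service hashes, so the comparison is printed rather than assumed.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "2ec07f8bc40c9d26f66dc397dafce609b4911c07d8b55a00594e0c5e0e44747c",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CR",
        "submitted_at": "2023-08-07T16:55:20Z",
        "title_canon_sha256": "c5717085b3c9718de16aa4cd67ced5c7a868ded0d8bb8e4113999ea62b24d0ef",
    },
    "schema_version": "1.0",
    "source": {"id": "2308.03825", "kind": "arxiv", "version": 2},
}

# Hash published on this page under "Canonical hash".
EXPECTED = "4d2f010c07f0c7e83a9d974d101f1d15410a4d776d87aa93b28d1d8f8b213c7e"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with the same json.dumps options as the one-liner above."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


digest = canonical_sha256(RECORD)
print("computed:", digest)
print("matches published hash:", digest == EXPECTED)
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace in the fetched JSON, which is what allows a mirror to recompute the same value.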