pith.
Pith Number

pith:URDEUTTP

pith:2023:URDEUTTPEZSNEKU74DMPXLITWV
Status: not attested · not anchored · not stored · refs resolved

GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

Jiahao Yu, Xingwei Lin, Xinyu Xing, Zheng Yu

Automated fuzzing of human-written jailbreak seeds produces templates that succeed against ChatGPT and Llama-2 at rates above 90 percent.

arxiv:2309.10253 v4 · 2023-09-19 · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{URDEUTTPEZSNEKU74DMPXLITWV}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
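This page does not spell out Pith's merge algorithm. As a minimal sketch of what "deterministic" buys a mirror, assuming hypothetical events with `ts`, `field`, and `value` keys whose signatures have already been checked, a merge can deduplicate events by a content hash and then apply them in one fixed total order, so every mirror reaches the same state regardless of the order in which it received the events:

```python
import hashlib
import json


def event_id(event: dict) -> str:
    # Hypothetical content address: SHA-256 of the event's canonical JSON.
    blob = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def merge(record: dict, events: list) -> dict:
    # Assumes signature verification already happened upstream.
    # 1. Deduplicate: identical events collapse to one content id.
    unique = {event_id(e): e for e in events}
    # 2. Total order: timestamp first, content id breaks ties.
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["ts"], kv[0]))
    # 3. Fold onto a copy of the record; the latest event wins per field.
    state = dict(record)
    for _, e in ordered:
        state[e["field"]] = e["value"]
    return state


base = {"author_claim": "open", "citations": 0}  # hypothetical record fields
events = [
    {"ts": 2, "field": "citations", "value": 45},
    {"ts": 1, "field": "author_claim", "value": "pending"},
    {"ts": 3, "field": "author_claim", "value": "claimed"},
]
a = merge(base, events)
b = merge(base, list(reversed(events)) + events)  # other order, with duplicates
print(a == b, a)  # → True {'author_claim': 'claimed', 'citations': 45}
```

Last-writer-wins per field is the simplest rule that makes the fold order-independent; a real event log might instead use field-specific rules (for example, append-only citation sets).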

Claims

C1 · strongest claim

GPTFuzz achieves over 90% attack success rates against ChatGPT and Llama-2 models, even with suboptimal initial seed templates.

C2 · weakest assumption

The judgment model reliably determines jailbreak success: it makes few false positives, which would inflate the reported success rates, and few false negatives, which would deflate them.
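The effect of judge errors on a reported attack success rate can be worked out directly. If the true rate is p, the judge flags real jailbreaks with probability TPR and flags non-jailbreaks with probability FPR, so the observed rate is TPR·p + FPR·(1 − p). The sketch below computes that and inverts it with the standard Rogan-Gladen correction. The rates used in the example are hypothetical, not measurements of the paper's judgment model.

```python
def observed_success_rate(true_rate: float, tpr: float, fpr: float) -> float:
    """Success rate a judge reports when the true jailbreak rate is true_rate.

    tpr: chance a real jailbreak is judged a success (sensitivity).
    fpr: chance a non-jailbreak response is judged a success.
    """
    for name, p in (("true_rate", true_rate), ("tpr", tpr), ("fpr", fpr)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {p}")
    return tpr * true_rate + fpr * (1.0 - true_rate)


def corrected_success_rate(observed: float, tpr: float, fpr: float) -> float:
    """Rogan-Gladen correction: invert the relation above, clipped to [0, 1]."""
    if tpr <= fpr:
        raise ValueError("judge must beat chance (tpr > fpr)")
    return min(1.0, max(0.0, (observed - fpr) / (tpr - fpr)))


# Hypothetical: a true 85% rate reads as about 90% under a lenient judge.
print(round(observed_success_rate(0.85, tpr=0.99, fpr=0.40), 4))  # 0.9015
print(round(corrected_success_rate(0.9015, tpr=0.99, fpr=0.40), 4))  # 0.85
```

The correction is only as good as the TPR and FPR estimates, which would themselves have to come from a labeled sample of target-model responses.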

C3 · one line summary

GPTFuzz is a black-box fuzzing framework that mutates seed jailbreak templates to automatically generate effective attacks, achieving over 90% success rates on models including ChatGPT and Llama-2.
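The mutate-query-judge cycle in this summary can be sketched as a small loop. This is not the authors' implementation: it assumes UCB1 seed selection (the bandit rule analyzed in reference [3]; the paper's own selection strategy may differ), a hypothetical `[INSERT PROMPT HERE]` slot in each template, and toy stand-ins for the LLM-driven mutators, the target model, and the judgment model, included only so the loop runs.

```python
import math
import random
from dataclasses import dataclass

PLACEHOLDER = "[INSERT PROMPT HERE]"  # assumed slot for the test question


@dataclass
class Seed:
    template: str
    visits: int = 0
    wins: int = 0


def ucb1_select(pool, total, c=math.sqrt(2)):
    """Unvisited seeds first; otherwise the seed with the best UCB1 score."""
    for seed in pool:
        if seed.visits == 0:
            return seed
    return max(pool, key=lambda s: s.wins / s.visits
               + c * math.sqrt(math.log(total) / s.visits))


def fuzz(seeds, mutators, target, judge, questions, budget, rng):
    """Select a seed, mutate it, query the target, judge; winners rejoin the pool."""
    pool = [Seed(t) for t in seeds]
    found = []
    for step in range(1, budget + 1):
        seed = ucb1_select(pool, step)
        candidate = rng.choice(mutators)(seed.template)
        hits = sum(judge(target(candidate.replace(PLACEHOLDER, q)))
                   for q in questions)
        seed.visits += 1
        if hits:
            seed.wins += 1
            pool.append(Seed(candidate))
            found.append((candidate, hits / len(questions)))
    return found


# Toy stand-ins (hypothetical), not real mutators, models, or judges.
def expand(t):
    return "You are an actor playing a villain. " + t


def shorten(t):
    return t.replace("Please answer: ", "")


def toy_target(prompt):
    return "Sure, here is..." if "actor" in prompt else "I'm sorry, I can't help."


def toy_judge(response):
    return response.startswith("Sure")


results = fuzz(["Please answer: " + PLACEHOLDER], [expand, shorten],
               toy_target, toy_judge, ["q1", "q2"], budget=20,
               rng=random.Random(0))
for template, rate in results[:3]:
    print(f"{rate:.0%}  {template}")
```

Feeding successful templates back into the pool is what lets a weak initial seed improve over time; the bandit score balances reusing seeds that have produced successes against trying seeds that have rarely been mutated.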

References

79 extracted · 79 resolved · 19 Pith anchors

[1] PaLM 2 Technical Report 2023 · arXiv:2305.10403
[2] Introducing Claude 2023
[3] Finite-time analysis of the multiarmed bandit problem 2002
[4] Efficient greybox fuzzing to detect memory errors 2022
[5] Spinning language models: Risks of propaganda-as-a-service and countermeasures 2022

Cited by

45 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:53.271145Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

a4464a4e6f2664d22a9fe0d8fbad13b5702b39cc81a492dc2df5e2984b365536

Aliases

arxiv: 2309.10253 · arxiv_version: 2309.10253v4 · doi: 10.48550/arxiv.2309.10253 · pith_short_12: URDEUTTPEZSN · pith_short_16: URDEUTTPEZSNEKU7 · pith_short_8: URDEUTTP
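The values on this page are consistent with the Pith Number being the RFC 4648 base32 encoding of the raw canonical SHA-256 digest, truncated to 26 characters, with the pith_short aliases as prefixes of it. This rule is inferred from this record alone, not from a published Pith specification, so treat the sketch below as a consistency check rather than a definition.

```python
import base64

# Canonical hash copied from this page.
CANONICAL_HASH = "a4464a4e6f2664d22a9fe0d8fbad13b5702b39cc81a492dc2df5e2984b365536"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Inferred rule: base32 (A-Z, 2-7) of the raw digest bytes, truncated."""
    digest = bytes.fromhex(hex_digest)
    return base64.b32encode(digest).decode("ascii")[:length]


full = pith_number_from_hash(CANONICAL_HASH)
print(full)  # URDEUTTPEZSNEKU74DMPXLITWV
for n in (8, 12, 16):
    print(f"pith_short_{n}:", full[:n])
```

Twenty-six base32 characters carry 130 bits, so the number commits to roughly the first half of the 256-bit hash.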
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/URDEUTTPEZSNEKU74DMPXLITWV \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a4464a4e6f2664d22a9fe0d8fbad13b5702b39cc81a492dc2df5e2984b365536
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "69e45c26e863bbee04523b9590fe0cf2cbca3f95364bb2b45816b96e13050533",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2023-09-19T02:19:48Z",
    "title_canon_sha256": "f270ad3b4049ff8714ccc1bdce3f02e3b264f619e828ae2c1dcfe6a3327c3c48"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2309.10253",
    "kind": "arxiv",
    "version": 4
  }
}
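The hash check can also be run offline, without curl or jq. This sketch re-serializes the canonical record shown above using the same settings as the python3 one-liner in the Agent API command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) and compares the result with the canonical hash. It assumes that the record displayed on this page is the complete `canonical_record` the API returns; if that assumption fails, the comparison prints `False`.

```python
import hashlib
import json

# Canonical record as printed on this page (assumed complete).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "69e45c26e863bbee04523b9590fe0cf2cbca3f95364bb2b45816b96e13050533",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2023-09-19T02:19:48Z",
        "title_canon_sha256": "f270ad3b4049ff8714ccc1bdce3f02e3b264f619e828ae2c1dcfe6a3327c3c48",
    },
    "schema_version": "1.0",
    "source": {"id": "2309.10253", "kind": "arxiv", "version": 4},
}

EXPECTED = "a4464a4e6f2664d22a9fe0d8fbad13b5702b39cc81a492dc2df5e2984b365536"


def canonical_sha256(record: dict) -> str:
    # Same canonical form as the one-liner: sorted keys at every level,
    # no insignificant whitespace, non-ASCII kept as UTF-8 bytes.
    text = json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)
print("matches canonical hash:", digest == EXPECTED)
```

Because keys are sorted during serialization, the digest does not depend on the key order in which the record was parsed or built.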