pith:24JPX53P
A StrongREJECT for Empty Jailbreaks
The StrongREJECT benchmark and evaluator match human judgments on jailbreak effectiveness more closely than prior methods and show that existing evaluations overstate success rates.
arxiv:2402.10260 v2 · 2024-02-15 · cs.LG · cs.CL · cs.CR
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{24JPX53PX6EE2BI7P5VBMAC4IY}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
The StrongREJECT evaluator achieves state-of-the-art agreement with human judgments of jailbreak effectiveness, and existing evaluation methods significantly overstate jailbreak effectiveness compared to human judgments and the StrongREJECT evaluator.
The results assume that the chosen dataset of forbidden prompts is representative of real-world harmful queries, and that the automated evaluator's scoring rules capture the full notion of 'useful harmful information' without introducing new biases.
StrongREJECT provides a standardized benchmark and evaluator for jailbreak attacks that aligns better with human judgments than prior methods and reveals that successful jailbreaks often reduce model capabilities.
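The claim that existing evaluators *overstate* jailbreak effectiveness is a claim about signed error against human labels. One way to quantify such a comparison is sketched below. It assumes each response has both a human score and an automated score on a shared [0, 1] scale. The choice of metrics (mean signed error as "bias", plus mean absolute error) and the function names are illustrative assumptions, not the paper's evaluation code.

```python
from statistics import fmean
from typing import Sequence


def _check(auto: Sequence[float], human: Sequence[float]) -> None:
    """Require paired, non-empty score lists (one automated and one human score per response)."""
    if len(auto) != len(human) or len(auto) == 0:
        raise ValueError("need equal-length, non-empty score lists")


def bias(auto: Sequence[float], human: Sequence[float]) -> float:
    """Mean signed error (automated minus human).

    A positive value means the automated evaluator rates jailbreaks as more
    effective than human raters do, i.e. it overstates effectiveness.
    """
    _check(auto, human)
    return fmean(a - h for a, h in zip(auto, human))


def mae(auto: Sequence[float], human: Sequence[float]) -> float:
    """Mean absolute error: disagreement with humans in either direction."""
    _check(auto, human)
    return fmean(abs(a - h) for a, h in zip(auto, human))
```

The two metrics are complementary. An evaluator can have near-zero `bias` and still disagree with humans on individual responses, and `mae` exposes that disagreement.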
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:46.519126Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
d712fbf76fbf884d051f7f6a16005c462a4b5c0178c08fb4ceb8a3814444ef34
Verify this Pith Number yourself
```shell
curl -sH 'Accept: application/ld+json' https://pith.science/pith/24JPX53PX6EE2BI7P5VBMAC4IY \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d712fbf76fbf884d051f7f6a16005c462a4b5c0178c08fb4ceb8a3814444ef34
```
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "3feb38ad9a6d4b8a115403d4d6c3460070d9069053358a27d297f587a84c0f97",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.CR"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-02-15T18:58:09Z",
    "title_canon_sha256": "991e809fe481e050656d5a79c357f8a24e4f0c2f9ac32ef723a66c8f72f1efd9"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2402.10260",
    "kind": "arxiv",
    "version": 2
  }
}
```
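The verification one-liner re-serializes the fetched `canonical_record` with sorted keys, compact `(',', ':')` separators and raw UTF-8 (`ensure_ascii=False`), then hashes those bytes with SHA-256. The sketch below applies the same canonicalization offline to the record as displayed on this page. It assumes the displayed JSON is the complete record served by the API. If that holds, the printed digest should equal the canonical hash; a mismatch would indicate that the displayed record differs from the served one. The names `CANONICAL_RECORD`, `canonical_bytes` and `canonical_hash` are illustrative only.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" shown on this page.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "3feb38ad9a6d4b8a115403d4d6c3460070d9069053358a27d297f587a84c0f97",
        "cross_cats_sorted": ["cs.CL", "cs.CR"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2024-02-15T18:58:09Z",
        "title_canon_sha256": "991e809fe481e050656d5a79c357f8a24e4f0c2f9ac32ef723a66c8f72f1efd9",
    },
    "schema_version": "1.0",
    "source": {"id": "2402.10260", "kind": "arxiv", "version": 2},
}

EXPECTED = "d712fbf76fbf884d051f7f6a16005c462a4b5c0178c08fb4ceb8a3814444ef34"


def canonical_bytes(record: dict) -> bytes:
    """Serialize exactly as the one-liner does: sorted keys, no spaces, raw UTF-8."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_hash(record: dict) -> str:
    """Hex-encoded SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(CANONICAL_RECORD)
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Because `sort_keys=True` fixes key order at every nesting level, two records that differ only in key order serialize to the same bytes and therefore hash to the same digest. List order (for example `cross_cats_sorted`) is still significant.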