pith:GBIXPOE6
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
The WMDP benchmark publicly measures hazardous knowledge in LLMs, and the RMU unlearning method reduces performance on it while preserving general capabilities.
arxiv:2403.03218 v7 · 2024-03-05 · cs.LG · cs.AI · cs.CL · cs.CY
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{GBIXPOE6A3GVA43FYXMVF2ZOWL}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs.
These claims rest on the assumption that WMDP questions serve as a reliable proxy for real-world hazardous capabilities, and that the unlearning effect generalizes without introducing new vulnerabilities or degrading performance on untested domains.
WMDP is a public benchmark measuring hazardous LLM knowledge across biosecurity, cybersecurity, and chemical security, paired with RMU unlearning that reduces WMDP performance without degrading general capabilities.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:50.345849Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
305177b89e06cd507365c5d952eb2eb2defd87a1022827c16353163f2402c6ee
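The identifier on this page and the canonical hash appear to be related. As an observation about this record only, and not a documented rule of the Pith scheme, the 26-character number matches the first 26 characters of the RFC 4648 base32 encoding of the hash bytes. A minimal sketch to check this, where `base32_prefix` is a hypothetical helper name:

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "305177b89e06cd507365c5d952eb2eb2defd87a1022827c16353163f2402c6ee"
PITH_NUMBER = "GBIXPOE6A3GVA43FYXMVF2ZOWL"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Return the first `length` characters of the base32 encoding of a hex digest."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


# Assumption: the identifier is a truncated base32 form of the hash.
# This is inferred from this one record, not taken from a specification.
matches = base32_prefix(CANONICAL_HASH, len(PITH_NUMBER)) == PITH_NUMBER
print(matches)
```

If the relationship holds in general, the short form `GBIXPOE6` would simply be the first 8 characters of the same encoding.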
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/GBIXPOE6A3GVA43FYXMVF2ZOWL \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 305177b89e06cd507365c5d952eb2eb2defd87a1022827c16353163f2402c6ee
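The Python step at the end of the pipeline above can be unpacked into a small function. This is a sketch of the same canonicalization (sorted keys, compact separators, non-ASCII kept as-is, UTF-8 bytes, SHA-256). The name `canonical_sha256` is hypothetical, and the sketch does not fetch the record or assert the hash shown on this page.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a JSON-compatible record using the serialization from the one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order does not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode()                   # UTF-8 bytes
    return hashlib.sha256(canonical).hexdigest()


# Two records with the same content but different key order hash identically.
a = {"schema_version": "1.0", "source": {"id": "2403.03218", "kind": "arxiv", "version": 7}}
b = {"source": {"version": 7, "kind": "arxiv", "id": "2403.03218"}, "schema_version": "1.0"}
print(canonical_sha256(a) == canonical_sha256(b))
```

Sorting keys and fixing the separators is what makes the digest independent of how the JSON happens to be formatted or ordered when it is served.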
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "18dd989ed4fd400b0ec8c83ee1c46ffa9c0899ceb189bfa75903904593d13afa",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL",
      "cs.CY"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-03-05T18:59:35Z",
    "title_canon_sha256": "e6fc6505fcb49572ec9067b46d99a5088ece716bfa6282c0f93405bafd7b62cb"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2403.03218",
    "kind": "arxiv",
    "version": 7
  }
}