pith:PNUMF4I5
Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack
BenchJack automatically uncovers reward-hacking exploits that let agents score near-perfect on popular benchmarks without completing tasks.
arxiv:2605.12673 v1 · 2026-05-12 · cs.AI · cs.CR
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{PNUMF4I5F4UBAAWJNAIM5YHKXE}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
BenchJack synthesizes reward-hacking exploits that achieve near-perfect scores on most of the benchmarks without solving a single task, surfacing 219 distinct flaws across the eight classes. Moreover, BenchJack's extended pipeline reduces the hackable-task ratio from near 100% to under 10% on four benchmarks without fatal design flaws, fully patching WebArena and OSWorld within three iterations.
Underlying assumption: the exploits BenchJack discovers with its own auditing agents are genuine, transferable reward hacks that would succeed on standard frontier models, rather than artifacts of the clairvoyant auditing setup or of specific model choices.
BenchJack audits 10 AI agent benchmarks, synthesizes exploits achieving near-perfect scores without task completion, surfaces 219 flaws, and reduces hackable-task ratios to under 10% on four benchmarks via iterative patching.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T03:09:50.151516Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
7b68c2f11d2f281002c96810cee0eab925709e41a21c18d9313fdf305699d6e9
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/PNUMF4I5F4UBAAWJNAIM5YHKXE \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7b68c2f11d2f281002c96810cee0eab925709e41a21c18d9313fdf305699d6e9
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1a83aa8499adde69f08fde59df96c5575c4fc34c55b4c84509cb5fc20e9e858c",
    "cross_cats_sorted": [
      "cs.CR"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-05-12T19:22:45Z",
    "title_canon_sha256": "94e7076a81333e03e389c3bb835a57a5466d36b95c4b2bbf9fba0687072f6530"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12673",
    "kind": "arxiv",
    "version": 1
  }
}
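The verification one-liner above can also be run offline against the record as listed here. The following is a minimal sketch, assuming the canonicalization is exactly what the one-liner shows (keys sorted, compact separators, `ensure_ascii=False`, UTF-8 bytes, then SHA-256). The helper names `canonical_bytes` and `record_hash` are hypothetical and not part of any Pith tooling. Whether the digest matches the published hash depends on this copy of the record being identical to the served one, so the script reports the comparison rather than assuming it.

```python
import hashlib
import json

# Record transcribed from the "Canonical record JSON" listing above.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "1a83aa8499adde69f08fde59df96c5575c4fc34c55b4c84509cb5fc20e9e858c",
        "cross_cats_sorted": ["cs.CR"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2026-05-12T19:22:45Z",
        "title_canon_sha256": "94e7076a81333e03e389c3bb835a57a5466d36b95c4b2bbf9fba0687072f6530",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.12673", "kind": "arxiv", "version": 1},
}

# Hash published on this page under "Canonical hash".
EXPECTED_HASH = "7b68c2f11d2f281002c96810cee0eab925709e41a21c18d9313fdf305699d6e9"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no whitespace, as in the one-liner."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode()  # UTF-8 by default


def record_hash(record: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = record_hash(CANONICAL_RECORD)
    print(digest)
    print("match" if digest == EXPECTED_HASH else "mismatch")
```

Because the keys are sorted and whitespace is stripped before hashing, the indentation and key order of the JSON listing do not affect the digest; only the keys and values do.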