pith:65VHDTLI
CUDABeaver: Benchmarking LLM-Based Automated CUDA Debugging
A benchmark for LLM CUDA debugging shows that models often pass tests by degenerating optimized code into slower versions.
arxiv:2605.08455 v2 · 2026-05-08 · cs.LG · cs.PL · cs.SE
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{65VHDTLINNMD6H475BMA32EYDI}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Protocol-aware evaluation gives a more faithful view of CUDA debugging ability: when the performance-loss tolerance is high, fixers appear much stronger, but even a slightly stricter performance requirement can sharply reduce measured success, shifting scores by up to 40 percentage points.
The 213 tasks drawn from LLM-generated failing workspaces are representative of real-world CUDA debugging needs, and the chosen performance-preservation metric correctly identifies degeneration without missing other failure modes.
CUDABeaver shows LLM CUDA debuggers often degenerate code for test-passing at the cost of speed, with protocol-aware metrics shifting success rates by up to 40 percentage points.
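The effect of the performance-loss tolerance on measured success can be illustrated with a minimal sketch. This is not the benchmark's actual metric or data: the `FixAttempt` type, the `is_success` rule (tests pass and runtime stays within `(1 + tolerance)` of a reference runtime), and the four toy attempts are all hypothetical and chosen only to show how a stricter tolerance can lower the score.

```python
from dataclasses import dataclass


@dataclass
class FixAttempt:
    """One hypothetical debugging attempt (illustrative, not benchmark data)."""
    passes_tests: bool
    reference_ms: float  # runtime of the intended optimized kernel
    fixed_ms: float      # runtime of the code produced by the fixer


def is_success(attempt: FixAttempt, tolerance: float) -> bool:
    """Assumed rule: tests pass AND slowdown stays within the tolerance."""
    if not attempt.passes_tests:
        return False
    return attempt.fixed_ms <= attempt.reference_ms * (1.0 + tolerance)


def success_rate(attempts: list[FixAttempt], tolerance: float) -> float:
    """Fraction of attempts counted as successful under the given tolerance."""
    return sum(is_success(a, tolerance) for a in attempts) / len(attempts)


# Toy attempts: one preserves speed, two pass tests but are slower,
# and one is fast but fails the tests.
attempts = [
    FixAttempt(passes_tests=True, reference_ms=10.0, fixed_ms=10.2),
    FixAttempt(passes_tests=True, reference_ms=10.0, fixed_ms=14.0),
    FixAttempt(passes_tests=True, reference_ms=10.0, fixed_ms=30.0),
    FixAttempt(passes_tests=False, reference_ms=10.0, fixed_ms=9.0),
]

loose = success_rate(attempts, tolerance=2.5)    # allows up to 3.5x the runtime
strict = success_rate(attempts, tolerance=0.05)  # allows at most 5% slowdown
print(loose, strict)  # 0.75 0.25
```

In this toy setting the same set of fixes scores 75% under the loose tolerance and 25% under the strict one, because the two slowed-down fixes stop counting once speed must be preserved.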
Receipt and verification
| First computed | 2026-05-27T01:05:56.585972Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
f76a71cd686b583f1f9fe8580de8981a3a2d7120868c5df9cacdbb10a0213113
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/65VHDTLINNMD6H475BMA32EYDI \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: f76a71cd686b583f1f9fe8580de8981a3a2d7120868c5df9cacdbb10a0213113
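The canonicalization step in the command above can also be written as a small standalone function. The sketch below mirrors the same `json.dumps` settings (sorted keys, compact separators, `ensure_ascii=False`) and hashes the UTF-8 bytes with SHA-256. The name `canonical_sha256` and the toy records are illustrative assumptions, not part of any Pith tooling.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a JSON-compatible dict using the same settings as the one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two toy records with identical content but different key order.
toy_a = {"source": {"kind": "arxiv", "id": "0000.00000"}, "schema_version": "1.0"}
toy_b = {"schema_version": "1.0", "source": {"id": "0000.00000", "kind": "arxiv"}}

digest_a = canonical_sha256(toy_a)
digest_b = canonical_sha256(toy_b)
print(digest_a == digest_b)  # True: canonical form ignores key order
```

To check a real record, pass the parsed `canonical_record` object to the function and compare the result with the published canonical hash.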
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "213e4bb5c58efe52714ec68081465d86dd99c520e80dfe3eb8cd3442bd239ba1",
    "cross_cats_sorted": [
      "cs.PL",
      "cs.SE"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-08T20:24:32Z",
    "title_canon_sha256": "5a70305896490d12842b97da5c332be1488b0430c6be110b5cec7415eafeed26"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.08455",
    "kind": "arxiv",
    "version": 2
  }
}