Pith Number

pith:65VHDTLI

pith:2026:65VHDTLINNMD6H475BMA32EYDI
not attested · not anchored · not stored · refs pending

CUDABeaver: Benchmarking LLM-Based Automated CUDA Debugging

Caiwen Ding, Haoyang Chen, Mattia Fazzini, Shiyang Li

A benchmark for LLM CUDA debugging shows that models often pass tests by degenerating optimized code into slower versions.

arxiv:2605.08455 v2 · 2026-05-08 · cs.LG · cs.PL · cs.SE

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{65VHDTLINNMD6H475BMA32EYDI}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
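
The page does not describe the merge algorithm itself, so the following is only a minimal sketch of the property claimed above: two mirrors that receive the same events in a different order end up with the same state. The event shape (an `id` field plus a payload) and the function names `merge_events` and `state_digest` are hypothetical and not part of the Pith bundle format.

```python
import hashlib
import json


def merge_events(*event_sets):
    """Order-independent merge: union events by id, then sort by id.

    Assumes (hypothetically) that every event carries a unique 'id' and that
    two copies of the same id are identical.
    """
    by_id = {}
    for events in event_sets:
        for event in events:
            by_id[event["id"]] = event  # duplicate copies collapse to one
    return [by_id[key] for key in sorted(by_id)]


def state_digest(events):
    """Hash the merged event list in a canonical JSON form."""
    payload = json.dumps(events, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Two mirrors holding overlapping events in different orders.
mirror_a = [{"id": "e2", "kind": "citation"}, {"id": "e1", "kind": "timestamp"}]
mirror_b = [{"id": "e1", "kind": "timestamp"}, {"id": "e3", "kind": "claim"}]

digest_ab = state_digest(merge_events(mirror_a, mirror_b))
digest_ba = state_digest(merge_events(mirror_b, mirror_a))
print(digest_ab == digest_ba)
```

Sorting by a stable key before hashing is what makes the result independent of arrival order; a real implementation would also need to verify the event signatures, which this sketch leaves out.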

Claims

C1 strongest claim

Protocol-aware evaluation gives a more faithful view of CUDA debugging ability: when the performance-loss tolerance is high, fixers appear much stronger, but even a slightly stricter performance requirement can sharply reduce measured success, shifting scores by up to 40 percentage points.
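
To make the effect of the performance-loss tolerance concrete, here is a minimal sketch of a tolerance-gated success metric. It is not the paper's protocol: the `FixResult` type, the `speed_ratio` definition (speed of the fixed kernel relative to the reference optimized kernel), and the toy numbers are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class FixResult:
    passes_tests: bool
    # Hypothetical metric: fixed-kernel speed / reference optimized speed.
    speed_ratio: float


def success_rate(results, max_perf_loss):
    """Fraction of fixes that pass tests AND lose at most max_perf_loss of speed."""
    threshold = 1.0 - max_perf_loss
    kept = [r for r in results if r.passes_tests and r.speed_ratio >= threshold]
    return len(kept) / len(results)


# Toy data, invented for this example (not results from CUDABeaver).
toy_results = [
    FixResult(True, 1.00), FixResult(True, 0.97), FixResult(True, 0.95),
    FixResult(True, 0.80), FixResult(True, 0.55), FixResult(True, 0.50),
    FixResult(True, 0.30), FixResult(False, 0.98), FixResult(True, 0.60),
    FixResult(False, 0.20),
]

for tolerance in (0.9, 0.5, 0.1):
    print(f"tolerance={tolerance:.1f} success={success_rate(toy_results, tolerance):.2f}")
```

With a loose tolerance nearly every test-passing fix counts as a success; tightening it discards fixes that pass only by falling back to slower code, which is the degeneration the claim describes.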

C2 weakest assumption

That the 213 tasks drawn from LLM-generated failing workspaces are representative of real-world CUDA debugging needs, and that the chosen performance-preservation metric correctly identifies degeneration without missing other failure modes.

C3 one-line summary

CUDABeaver shows that LLM CUDA debuggers often degenerate optimized code to pass tests at the cost of speed, and that protocol-aware metrics shift success rates by up to 40 percentage points.

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-27T01:05:56.585972Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

f76a71cd686b583f1f9fe8580de8981a3a2d7120868c5df9cacdbb10a0213113

Aliases

arxiv: 2605.08455 · arxiv_version: 2605.08455v2 · doi: 10.48550/arxiv.2605.08455 · pith_short_12: 65VHDTLINNMD · pith_short_16: 65VHDTLINNMD6H47 · pith_short_8: 65VHDTLI
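
The identifiers on this page are consistent with a simple derivation: the 26-character Pith Number matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and the short aliases are prefixes of it. This is an observation from the values shown here, not a documented specification, and the function name `pith_number_from_hash` is hypothetical.

```python
import base64

CANONICAL_HASH = "f76a71cd686b583f1f9fe8580de8981a3a2d7120868c5df9cacdbb10a0213113"


def pith_number_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of the digest and drop '=' padding.

    Assumption: inferred from the values on this page, not from a spec.
    """
    first_16_bytes = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(first_16_bytes).decode("ascii").rstrip("=")


pith_number = pith_number_from_hash(CANONICAL_HASH)
short_aliases = {n: pith_number[:n] for n in (8, 12, 16)}

print(pith_number)    # 65VHDTLINNMD6H475BMA32EYDI
print(short_aliases)
```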
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/65VHDTLINNMD6H475BMA32EYDI \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: f76a71cd686b583f1f9fe8580de8981a3a2d7120868c5df9cacdbb10a0213113
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "213e4bb5c58efe52714ec68081465d86dd99c520e80dfe3eb8cd3442bd239ba1",
    "cross_cats_sorted": [
      "cs.PL",
      "cs.SE"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-08T20:24:32Z",
    "title_canon_sha256": "5a70305896490d12842b97da5c332be1488b0430c6be110b5cec7415eafeed26"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.08455",
    "kind": "arxiv",
    "version": 2
  }
}
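
For readers who prefer a script to the shell one-liner, the same canonicalization can be written as a small Python function: serialize the record with sorted keys, compact separators, and `ensure_ascii=False`, then take the SHA-256 of the UTF-8 bytes. The function names `canonical_sha256` and `matches_published_hash` are illustrative, and the record literal is copied from the JSON above.

```python
import hashlib
import json

PUBLISHED_HASH = "f76a71cd686b583f1f9fe8580de8981a3a2d7120868c5df9cacdbb10a0213113"

canonical_record = {
    "metadata": {
        "abstract_canon_sha256": "213e4bb5c58efe52714ec68081465d86dd99c520e80dfe3eb8cd3442bd239ba1",
        "cross_cats_sorted": ["cs.PL", "cs.SE"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-08T20:24:32Z",
        "title_canon_sha256": "5a70305896490d12842b97da5c332be1488b0430c6be110b5cec7415eafeed26",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.08455", "kind": "arxiv", "version": 2},
}


def canonical_sha256(record: dict) -> str:
    """Hash the record in the canonical form used by the one-liner above."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def matches_published_hash(record: dict, expected: str = PUBLISHED_HASH) -> bool:
    """Compare the recomputed digest with the hash published on the page."""
    return canonical_sha256(record) == expected


print(canonical_sha256(canonical_record))
print(matches_published_hash(canonical_record))
```

Because keys are sorted before hashing, the digest does not depend on the order in which the JSON fields were written, only on their values.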