pith:BOO7GNKC
AgentKernelArena: Generalization-Aware Benchmarking of GPU Kernel Optimization Agents
A new benchmark shows AI agents reaching mean speedups of up to 6.89x on GPU kernels, but PyTorch-to-HIP optimizations suffer substantial correctness drops on unseen input shapes.
arxiv:2605.16819 v1 · 2026-05-16 · cs.CL · cs.AI · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{BOO7GNKCM4J2AI3ZU7JLUR6W22}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Across production agents including Cursor Agent, Claude Code, and Codex Agent, we find near-perfect compilation and high correctness rates on most task categories, with the strongest configurations achieving mean speedups of up to 6.89x on PyTorch-to-HIP, 6.69x on HIP-to-HIP, and 2.13x on Triton-to-Triton tasks. Our unseen-configuration evaluation shows that HIP-to-HIP and Triton-to-Triton optimizations largely transfer to unseen input shapes, while PyTorch-to-HIP exhibits substantial correctness drops.
The 196 tasks and the specific unseen-configuration protocol are representative enough of real production kernel optimization work that measured agent performance and generalization behavior will predict usefulness outside the benchmark.
AgentKernelArena is a new open benchmark that measures complete AI agent workflows on 196 GPU kernel tasks with correctness, performance, and generalization checks to unseen configurations.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:03:24.262668Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
0b9df335426713a02379a7d2ba47d6d6894882fef377e38610e6c97c2434d783
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/BOO7GNKCM4J2AI3ZU7JLUR6W22 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0b9df335426713a02379a7d2ba47d6d6894882fef377e38610e6c97c2434d783
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "0ea00e0b2a9e01ef7d6d0a28af106ee714e10a732d26acc9da59d33a1e9f3b8f",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-16T05:25:11Z",
    "title_canon_sha256": "533466e1b876effde3916a428972f65933d0fd9c47249c8e8403477bf3701b51"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.16819",
    "kind": "arxiv",
    "version": 1
  }
}
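The verification command above packs the canonicalization step into a one-line `python3 -c` call. Below is a minimal sketch of that same step as a standalone script, to make it easier to read. The function names `canonical_bytes` and `canonical_hash` are hypothetical (they do not come from Pith), and the sample record is a trimmed illustration, so its digest is not the published canonical hash.

```python
import hashlib
import json


def canonical_bytes(record):
    """Serialize a record the way the verification one-liner does:
    sorted keys, no whitespace between tokens, non-ASCII kept as-is, UTF-8."""
    text = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return text.encode("utf-8")


def canonical_hash(record):
    """Return the SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    # Trimmed, illustrative record (NOT the full canonical record above).
    a = {
        "schema_version": "1.0",
        "source": {"id": "2605.16819", "kind": "arxiv", "version": 1},
    }
    # Same content with a different key order.
    b = {
        "source": {"version": 1, "kind": "arxiv", "id": "2605.16819"},
        "schema_version": "1.0",
    }
    print(canonical_bytes(a).decode("utf-8"))
    # Key order does not affect the digest because keys are sorted first.
    print(canonical_hash(a) == canonical_hash(b))
```

To check the real record, one could load the output of the `jq -c '.canonical_record'` step with `json.loads` and pass it to `canonical_hash`, then compare the result with the canonical hash listed above.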