Pith Number

pith:BOO7GNKC

pith:2026:BOO7GNKCM4J2AI3ZU7JLUR6W22
not attested · not anchored · not stored · refs resolved

AgentKernelArena: Generalization-Aware Benchmarking of GPU Kernel Optimization Agents

Dong Li, Emad Barsoum, Hao Li, Ji Liu, Mehdi Rezagholizadeh, Sharareh Younesian, Sharon Zhou, Sina Rafati, Vikram Appia, Wenwen Ouyang, Yuchen Yang, Yue Liu, Zhenyu Gu, Ziqiong Liu

A new benchmark reveals AI agents deliver up to 6.89x speedups on GPU kernels but show major generalization failures when translating from PyTorch to HIP.

arxiv:2605.16819 v1 · 2026-05-16 · cs.CL · cs.AI · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{BOO7GNKCM4J2AI3ZU7JLUR6W22}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Across production agents including Cursor Agent, Claude Code, and Codex Agent, we find near-perfect compilation and high correctness rates on most task categories, with the strongest configurations achieving mean speedups of up to 6.89x on PyTorch-to-HIP, 6.69x on HIP-to-HIP, and 2.13x on Triton-to-Triton tasks. Our unseen-configuration evaluation shows that HIP-to-HIP and Triton-to-Triton optimizations largely transfer to unseen input shapes, while PyTorch-to-HIP exhibits substantial correctness drops.

C2 weakest assumption

The 196 tasks and the specific unseen-configuration protocol are representative enough of real production kernel optimization work that measured agent performance and generalization behavior will predict usefulness outside the benchmark.

C3 one-line summary

AgentKernelArena is a new open benchmark that measures complete AI agent workflows on 196 GPU kernel tasks with correctness, performance, and generalization checks to unseen configurations.

References

19 extracted · 19 resolved · 4 Pith anchors

[1] Anthropic. Claude code, 2026. URL https://www.anthropic.com/claude-code. Software product 2026
[2] Program Synthesis with Large Language Models 2021 · arXiv:2108.07732
[3] Evaluating Large Language Models Trained on Code 2021 · arXiv:2107.03374
[4] Cursor. Cursor agent, 2026. URL https://cursor.com/agents. Software product 2026
[5] AdaExplore: Failure-Driven Adaptation and Diversity-Preserving Search for Efficient Kernel Generation 2026 · arXiv:2604.16625
Receipt and verification
First computed: 2026-05-20T00:03:24.262668Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

0b9df335426713a02379a7d2ba47d6d6894882fef377e38610e6c97c2434d783

Aliases

arxiv: 2605.16819 · arxiv_version: 2605.16819v1 · doi: 10.48550/arxiv.2605.16819 · pith_short_12: BOO7GNKCM4J2 · pith_short_16: BOO7GNKCM4J2AI3Z · pith_short_8: BOO7GNKC
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/BOO7GNKCM4J2AI3ZU7JLUR6W22 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0b9df335426713a02379a7d2ba47d6d6894882fef377e38610e6c97c2434d783
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "0ea00e0b2a9e01ef7d6d0a28af106ee714e10a732d26acc9da59d33a1e9f3b8f",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-16T05:25:11Z",
    "title_canon_sha256": "533466e1b876effde3916a428972f65933d0fd9c47249c8e8403477bf3701b51"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.16819",
    "kind": "arxiv",
    "version": 1
  }
}
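
The verification one-liner above hashes the canonical record after re-serializing it with sorted keys, compact separators, and non-ASCII characters left unescaped. As a minimal offline sketch of that same step, the Python below applies the identical serialization to the record as transcribed from this page. The function name `canonical_sha256` and the `EXPECTED` constant are illustrative names, not part of any Pith API, and the comparison only succeeds if the record shown here is exactly what the server returns.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the one-liner above.

    Hypothetical helper name; mirrors json.dumps(..., sort_keys=True,
    separators=(',', ':'), ensure_ascii=False) followed by SHA-256.
    """
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(canonical).hexdigest()


# Canonical record transcribed from this page (assumed to be complete).
record = {
    "metadata": {
        "abstract_canon_sha256": "0ea00e0b2a9e01ef7d6d0a28af106ee714e10a732d26acc9da59d33a1e9f3b8f",
        "cross_cats_sorted": ["cs.AI", "cs.LG"],
        "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-05-16T05:25:11Z",
        "title_canon_sha256": "533466e1b876effde3916a428972f65933d0fd9c47249c8e8403477bf3701b51",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.16819", "kind": "arxiv", "version": 1},
}

# Canonical hash as listed on this page.
EXPECTED = "0b9df335426713a02379a7d2ba47d6d6894882fef377e38610e6c97c2434d783"

digest = canonical_sha256(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted before hashing, the order in which the record's fields are written does not change the digest. Only the field values and the serialization settings do.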