Towards Robust Agentic CUDA Kernel Benchmarking, Verification, and Optimization. arXiv preprint arXiv:2509.14279

Robert Tjarko Lange, Qi Sun, Aaditya Prasad, Maxence Faldor, Yujin Tang, David Ha · 2025 · arXiv 2509.14279

7 Pith papers cite this work.

representative citing papers

FastKernels: Benchmarking GPU Kernel Generation in Production

cs.LG · 2026-05-22 · conditional · novelty 8.0

FastKernels is a production-aligned benchmark covering 96.2% of Hugging Face Transformers; it shows that state-of-the-art kernel agents deliver at most a 0.94x aggregate speedup.

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

cs.LG · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

CODA re-expresses most non-attention Transformer computations as GEMM-plus-epilogue programs using a constrained set of composable primitives to keep intermediate results on-chip and cut global memory traffic.

AgentKernelArena: Generalization-Aware Benchmarking of GPU Kernel Optimization Agents

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

AgentKernelArena is a new open benchmark that measures complete AI agent workflows on 196 GPU kernel tasks, with checks for correctness, performance, and generalization to unseen configurations.

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

cs.LG · 2026-05-06 · conditional · novelty 7.0 · 2 refs

The KernelBenchX benchmark shows that task category explains nearly three times as much variance in LLM kernel correctness as method choice does, that iterative refinement boosts correctness but reduces performance, and that quantization remains unsolved.

Kernel Contracts: A Specification Language for ML Kernel Correctness Across Heterogeneous Silicon

cs.LG · 2026-04-23 · unverdicted · novelty 7.0

Kernel Contracts is a specification language that formalizes correctness requirements for ML kernels to ensure consistent results across heterogeneous silicon platforms.

Metal-Sci: A Scientific Compute Benchmark for Evolutionary LLM Kernel Search on Apple Silicon

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Metal-Sci is a benchmark and harness for LLM evolutionary optimization of Apple Silicon Metal kernels that uses held-out sizes to detect silent regressions missed by in-distribution scores.

KEET: Explaining Performance of GPU Kernels Using LLM Agents

cs.PF · 2026-05-06 · unverdicted · novelty 5.0

KEET uses LLM agents to generate data-grounded natural language explanations of performance issues in GPU kernels from Nsight Compute profiles and shows these improve downstream LLM-based optimization tasks.