
Towards robust agentic CUDA kernel benchmarking, verification, and optimization. arXiv preprint arXiv:2509.14279

7 Pith papers cite this work. Polarity classification is still indexing.


years: 2026 (7)

representative citing papers

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

cs.LG · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

CODA re-expresses most non-attention Transformer computations as GEMM-plus-epilogue programs using a constrained set of composable primitives to keep intermediate results on-chip and cut global memory traffic.

KEET: Explaining Performance of GPU Kernels Using LLM Agents

cs.PF · 2026-05-06 · unverdicted · novelty 5.0

KEET uses LLM agents to generate data-grounded natural language explanations of performance issues in GPU kernels from Nsight Compute profiles and shows these improve downstream LLM-based optimization tasks.
