AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization

2025 · cs.LG · arXiv 2511.15915

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI accelerators, eliminating the need for expert-provided hardware-specific optimization knowledge. AccelOpt explores the kernel optimization space through iterative generation, informed by an optimization memory that curates experiences and insights from previously encountered slow-fast kernel pairs. We build NKIBench, a new benchmark suite of AWS Trainium accelerator kernels with varying complexity extracted from real-world LLM workloads to evaluate the effectiveness of AccelOpt. Our evaluation confirms that AccelOpt's capability improves over time, boosting the average percentage of peak throughput from $49\%$ to $61\%$ on Trainium 1 and from $45\%$ to $59\%$ on Trainium 2 for NKIBench kernels. Moreover, AccelOpt is highly cost-effective: using open-source models, it matches the kernel improvements of Claude Sonnet 4 while being $26\times$ cheaper. The code is open-sourced at https://github.com/zhang677/AccelOpt.
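
To make the abstract's description more concrete, the following is a minimal sketch of iterative generation guided by a memory of slow-fast kernel pairs. It is not taken from the AccelOpt code base: `OptimizationMemory`, `optimize`, `toy_propose`, and `toy_measure` are hypothetical names, the kernel is a toy string of statements, and the LLM call and hardware profiler are replaced by simple stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class OptimizationMemory:
    """Hypothetical store of (slow kernel, fast kernel, speedup) experiences."""
    pairs: List[Tuple[str, str, float]] = field(default_factory=list)
    capacity: int = 8  # assumed limit; the real curation policy may differ

    def add(self, slow: str, fast: str, speedup: float) -> None:
        self.pairs.append((slow, fast, speedup))
        # Keep only the experiences with the largest measured speedup.
        self.pairs.sort(key=lambda p: p[2], reverse=True)
        del self.pairs[self.capacity:]


def optimize(
    kernel: str,
    propose: Callable[[str, OptimizationMemory, int], str],
    measure: Callable[[str], float],
    memory: OptimizationMemory,
    iterations: int = 3,
    samples: int = 4,
) -> Tuple[str, float]:
    """Iteratively sample candidates and record each improving slow-fast pair."""
    best, best_latency = kernel, measure(kernel)
    for _ in range(iterations):
        # In a real system, `propose` would prompt an LLM with `best` and `memory`.
        candidates = [propose(best, memory, i) for i in range(samples)]
        for cand in candidates:
            latency = measure(cand)
            if latency < best_latency:
                memory.add(best, cand, best_latency / latency)
                best, best_latency = cand, latency
    return best, best_latency


def toy_measure(kernel: str) -> float:
    """Stand-in profiler: latency equals the number of statements."""
    return float(len(kernel.split(";")))


def toy_propose(kernel: str, memory: OptimizationMemory, seed: int) -> str:
    """Stand-in generator: drop one redundant `nop` statement if present."""
    parts = kernel.split(";")
    if "nop" in parts:
        parts.remove("nop")
    return ";".join(parts)


memory = OptimizationMemory()
best, latency = optimize("nop;nop;mul;add", toy_propose, toy_measure, memory)
print(best, latency, len(memory.pairs))
```

In this toy run each improving step adds one slow-fast pair to the memory, so later proposals could condition on earlier wins. How AccelOpt actually curates and uses its memory is described in the paper and repository, not here.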

representative citing papers

From Human Guidance to Autonomy: Agent Skill System for End-to-End LLM Deployment on Spatial NPUs

cs.LG · 2026-05-27 · conditional · novelty 7.0

A two-stage agent skill system enables autonomous end-to-end deployment of eight decoder-only LLMs on AMD XDNA 2 NPU with numerical correctness in 0.5-4 hours each, generalizing from a human-guided Llama-3.2-1B reference.

Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages

cs.AI · 2026-05-27 · unverdicted · novelty 6.0

KLineage derives verified optimization skills from backward lineages of expert GPU kernels to guide LLM agents toward higher-quality and more efficient kernels than memory-based baselines.

