CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels

2026 · cs.LG · arXiv 2605.05023

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

Efficient CUDA implementations of attention mechanisms are critical to modern deep learning systems, yet supporting diverse and evolving attention variants remains challenging. Existing frameworks and compilers trade performance for flexibility, while expert-written kernels achieve high efficiency but are difficult to adapt. Recent work explores large language models (LLMs) for GPU kernel generation, but prior studies report unstable correctness and significant performance gaps for complex operators such as attention. We present CuBridge, an LLM-based framework that adapts expert-written attention kernels through a structured lift-transfer-lower workflow. CuBridge starts from expert-written CUDA attention kernels and lifts them into an executable intermediate representation that makes execution orchestration explicit while abstracting low-level CUDA syntax. Given a user-provided PyTorch specification, CuBridge generates and verifies a target IR program, then reconstructs optimized CUDA code via reference-guided lowering. Across diverse attention variants and GPU platforms, CuBridge consistently produces correct kernels and substantially outperforms general frameworks, compiler-based approaches, and prior LLM-based methods.
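To make the idea of a user-provided specification concrete, the sketch below shows the kind of small reference semantics such a specification could pin down for one attention variant (optional causal masking). It is an illustration only and does not come from the paper: the function name `reference_attention` and its arguments are hypothetical, and plain Python lists stand in for PyTorch tensors so the example runs without dependencies. CuBridge's actual input format and IR are not described in the abstract.

```python
import math


def reference_attention(q, k, v, causal=False):
    """Scaled dot-product attention for a single head on nested lists.

    Hypothetical reference semantics, not the paper's specification format.
    q: [n_q][d], k: [n_k][d], v: [n_k][d_v]. Returns [n_q][d_v].
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i, q_row in enumerate(q):
        # Score the query against every key; masked positions get -inf.
        scores = []
        for j, k_row in enumerate(k):
            if causal and j > i:
                scores.append(float("-inf"))
            else:
                scores.append(scale * sum(a * b for a, b in zip(q_row, k_row)))
        # Numerically stable softmax over the scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the value rows.
        out.append(
            [sum(w * v_row[c] for w, v_row in zip(weights, v)) for c in range(len(v[0]))]
        )
    return out
```

A generated kernel for this variant would be expected to match such a reference within numerical tolerance; the variant here is chosen only because causal masking is a widely used case.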

representative citing papers

Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages

cs.AI · 2026-05-27 · unverdicted · novelty 6.0

KLineage derives verified optimization skills from backward lineages of expert GPU kernels to guide LLM agents toward higher-quality and more efficient kernels than memory-based baselines.

citing papers explorer


Learning When to Optimize: Verified Optimization Skills from Expert GPU-Kernel Lineages

cs.AI · 2026-05-27 · unverdicted · none · ref 18 · internal anchor

KLineage derives verified optimization skills from backward lineages of expert GPU kernels to guide LLM agents toward higher-quality and more efficient kernels than memory-based baselines.
