DRTriton: Large-Scale Synthetic Data Driven Reinforcement Learning for Triton Kernel Generation

Ming Lin; Siqi Guo; Tianbao Yang

arxiv: 2603.21465 · v2 · pith:7H57ADSFnew · submitted 2026-03-23 · 💻 cs.CL · cs.LG

Siqi Guo, Ming Lin, Tianbao Yang

keywords: kernels, cuda, drtriton, llms, pytorch, triton, algorithm, challenging

Developing efficient CUDA kernels is a fundamental yet challenging task in the generative AI industry. Recent research leverages Large Language Models (LLMs) to automatically convert PyTorch reference implementations into CUDA kernels, significantly reducing engineering effort. However, state-of-the-art LLMs such as GPT-5.2 and Claude-Sonnet-4.5 still struggle with this task. To address this challenge, we propose DRTriton, a scalable learning framework for training LLMs to convert PyTorch programs into highly optimized Triton kernels, which are then compiled to CUDA kernels at runtime. DRTriton consists of three key components: (i) CSP-DAG, a data synthesis algorithm that guarantees full coverage and unbiased uniform sampling over the operator space with controlled difficulty; (ii) a curriculum RL framework with decoupled rewards that jointly optimizes conversion success rate and execution speed; and (iii) a test-time search algorithm that further improves the execution speed of the generated Triton kernels. After a warmup stage of SFT on a limited set of PyTorch-Triton pairs curated using existing LLMs, DRTriton is trained by RL on synthesized PyTorch programs and generalizes effectively to real-world CUDA kernels that are challenging even for human experts. Experimental results show that DRTriton-7B achieves a speedup over PyTorch on 92% of KernelBench Level 2 tasks, compared to 23% for GPT-5.2 and 19% for Claude-Sonnet-4.5.
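The abstract does not spell out how the test-time search works. As a rough illustration of the general idea only (keep the fastest candidate that still matches the reference), the sketch below uses a generic best-of-N selection in plain Python. It is not the paper's algorithm: the function names `select_fastest_correct` and `_time_once`, the equality-based correctness check, and the wall-clock timing are all assumptions made for this example, and ordinary Python callables stand in for generated Triton kernels.

```python
import time
from typing import Any, Callable, Optional, Sequence, Tuple


def _time_once(fn: Callable[[Any], Any], inputs: Sequence[Any]) -> float:
    """Wall-clock time for one pass of fn over all inputs (hypothetical helper)."""
    start = time.perf_counter()
    for x in inputs:
        fn(x)
    return time.perf_counter() - start


def select_fastest_correct(
    reference: Callable[[Any], Any],
    candidates: Sequence[Callable[[Any], Any]],
    inputs: Sequence[Any],
    repeats: int = 5,
) -> Optional[Tuple[int, float]]:
    """Generic best-of-N selection (an assumption, not the paper's method).

    Returns (index, best_seconds) for the fastest candidate whose outputs
    equal the reference outputs on every input, or None if none is correct.
    """
    expected = [reference(x) for x in inputs]
    best: Optional[Tuple[int, float]] = None
    for idx, fn in enumerate(candidates):
        try:
            outputs = [fn(x) for x in inputs]
        except Exception:
            continue  # a candidate that crashes counts as incorrect
        if outputs != expected:
            continue  # correctness is checked before speed is measured
        # Take the minimum over several runs to reduce timing noise.
        elapsed = min(_time_once(fn, inputs) for _ in range(repeats))
        if best is None or elapsed < best[1]:
            best = (idx, elapsed)
    return best
```

Checking correctness before timing keeps the two criteria separate, so a fast but wrong candidate can never be selected. A real setup would compare tensors with a numerical tolerance and time kernels on the GPU rather than with `time.perf_counter`.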

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

daVinci-kernel: Co-Evolving Skill Selection, Summarization, and Utilization via RL for GPU Kernel Optimization
cs.LG 2026-06 unverdicted novelty 7.0

daVinci-kernel is a multi-agent RL system that co-evolves skill selection, policy generation, and summarization via a shared LLM and REINFORCE to optimize GPU kernels, reporting higher KernelBench scores than prior RL models.

SpecGen: Accelerating Agentic Kernel Optimization with Speculative Generation
cs.DC 2026-06 unverdicted novelty 6.0

SpecGen introduces speculative generation to fork non-reasoning kernel candidates during LLM reasoning traces, enabling early termination and parallel profiling to reduce end-to-end optimization time on H200 GPUs.