Characterizing and Detecting CUDA Program Bugs

Cong Liu; Husheng Zhou; Lingming Zhang; Mingyuan Wu; Yuqun Zhang

arxiv: 1905.01833 · v3 · pith:JU4YE6KInew · submitted 2019-05-06 · 💻 cs.SE

Mingyuan Wu, Husheng Zhou, Lingming Zhang, Cong Liu, Yuqun Zhang

classification 💻 cs.SE

keywords bugs, cuda, synchronization, detect, program, simulee, been, computing

While CUDA has become a major parallel computing platform and programming model for general-purpose GPU computing, CUDA-induced bug patterns have not yet been well explored. In this paper, we conduct the first empirical study to reveal important categories of CUDA program bug patterns based on 319 bugs identified within 5 popular CUDA projects on GitHub. Our findings demonstrate that CUDA-specific characteristics may cause program bugs, such as synchronization bugs, that are rather difficult to detect. To efficiently detect such synchronization bugs, we establish the first lightweight general CUDA bug detection framework, namely Simulee, which simulates CUDA program execution by interpreting the corresponding LLVM bytecode and collecting memory-access information to automatically detect CUDA synchronization bugs. To evaluate the effectiveness and efficiency of Simulee, we conduct a set of experiments. The experimental results suggest that Simulee can detect 20 of the 27 studied synchronization bugs and also detects 26 previously unknown synchronization bugs, 10 of which have been confirmed by the developers.
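To make the idea of detecting synchronization bugs from collected memory-access information concrete, here is a minimal Python sketch. It is not Simulee's algorithm, which the abstract does not spell out; it is a toy model under stated assumptions. The names `Access`, `find_races`, and `trace_neighbor_read` are hypothetical, and a "barrier interval" here simply means the number of block-wide barriers a thread has passed before an access. Two accesses conflict when they hit the same shared-memory address in the same interval, come from different threads, and at least one is a write.

```python
from collections import defaultdict
from typing import NamedTuple


class Access(NamedTuple):
    """One recorded shared-memory access (hypothetical trace format)."""
    thread: int     # thread id within the block
    interval: int   # barriers passed by this thread before the access
    addr: int       # shared-memory index touched
    is_write: bool  # True for a store, False for a load


def find_races(accesses):
    """Return sorted (interval, addr) pairs that have conflicting accesses."""
    groups = defaultdict(list)
    for a in accesses:
        groups[(a.interval, a.addr)].append(a)

    races = []
    for key, group in groups.items():
        writers = {a.thread for a in group if a.is_write}
        threads = {a.thread for a in group}
        # A conflict needs at least one writer plus some other thread
        # touching the same address within the same barrier interval.
        if writers and len(threads) > 1:
            races.append(key)
    return sorted(races)


def trace_neighbor_read(num_threads, with_barrier):
    """Toy trace: each thread writes slot tid, then reads its neighbor's slot.

    With a barrier between the two steps, the read lands in interval 1;
    without one, both accesses share interval 0.
    """
    accesses = []
    for tid in range(num_threads):
        accesses.append(Access(tid, 0, tid, True))
        read_interval = 1 if with_barrier else 0
        neighbor = (tid + 1) % num_threads
        accesses.append(Access(tid, read_interval, neighbor, False))
    return accesses


if __name__ == "__main__":
    print("without barrier:", find_races(trace_neighbor_read(4, False)))
    print("with barrier:   ", find_races(trace_neighbor_read(4, True)))
```

In this model the trace without a barrier reports a conflict on every slot, while the trace with a barrier reports none, which mirrors the effect of a missing versus present block-level synchronization call between a write phase and a read phase.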

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Gerrymandering the Warp: Non-Control-Data Attacks on CUDA Collective Decision
cs.CR 2026-06 unverdicted novelty 7.0

The paper defines Collective Semantic Corruption (CSC) as attacks corrupting participation metadata in CUDA collectives, reports 102/102 mismatch cases in an evaluation suite, and proposes Collective Integrity Contrac...

CUDABeaver: Benchmarking LLM-Based Automated CUDA Debugging
cs.LG 2026-05 unverdicted novelty 7.0

CUDABeaver shows LLM CUDA debuggers often degenerate code for test-passing at the cost of speed, with protocol-aware metrics shifting success rates by up to 40 percentage points.

CUDABeaver: Benchmarking LLM-Based Automated CUDA Debugging
cs.LG 2026-05 unverdicted novelty 6.0

CUDABEAVER benchmark and pass@k(M,C,A) metric show LLM CUDA debugging success drops by up to 40 percentage points under strict performance requirements.