Warp-tiled CUDA kernel for depthwise convolution delivers 3.26x runtime reduction versus naive baseline and 1.29x end-to-end training speedup using counter-free analysis in cloud settings.
The ASHRAE great energy predictor III competition: Overview and results
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.DC 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
CUDA Kernel Optimization and Counter-Free Performance Analysis for Depthwise Convolution in Cloud Environments
Warp-tiled CUDA kernel for depthwise convolution delivers 3.26x runtime reduction versus naive baseline and 1.29x end-to-end training speedup using counter-free analysis in cloud settings.