Minimizing GPU kernel launch overhead in deep learning inference on mobile GPUs

· 2021

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Taming Asynchronous CPU-GPU Coupling for Frequency-aware Latency Estimation on Mobile Edge

cs.AR · 2026-04-11 · unverdicted · novelty 6.0

FLAME models layer-wise overlapping parallelism and asynchronous CPU-GPU pipeline bubbles to estimate inference latency across frequencies with sparse profiling and low error for DNNs and SLMs.

citing papers explorer

Showing 1 of 1 citing paper.

Taming Asynchronous CPU-GPU Coupling for Frequency-aware Latency Estimation on Mobile Edge

cs.AR · 2026-04-11 · unverdicted · none · ref 16

FLAME models layer-wise overlapping parallelism and asynchronous CPU-GPU pipeline bubbles to estimate inference latency across frequencies with sparse profiling and low error for DNNs and SLMs.
