Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs
1 Pith paper cites this work.
Fields: cs.AR (1); years: 2026 (1); verdicts: UNVERDICTED (1).

Representative citing paper:
Taming Asynchronous CPU-GPU Coupling for Frequency-aware Latency Estimation on Mobile Edge
FLAME models layer-wise overlapping parallelism and asynchronous CPU-GPU pipeline bubbles to estimate inference latency across frequencies with sparse profiling and low error for DNNs and SLMs.