nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Taming Asynchronous CPU-GPU Coupling for Frequency-aware Latency Estimation on Mobile Edge
FLAME models layer-wise overlapping parallelism and asynchronous CPU-GPU pipeline bubbles to estimate inference latency across frequencies with sparse profiling and low error for DNNs and SLMs.