MLC-LLM: Universal LLM deployment engine with ML compilation
1 Pith paper cites this work.
Fields: cs.AR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
Phase Matters: Characterizing Heterogeneous Vision-Language Inference on a Mobile SoC
Hardware-in-the-loop tests on the Snapdragon 8 Elite show phase-dependent NPU speedups (1.64x for prefill, 1.18x for decode) and a 2.52x energy reduction for FastVLM-0.5B, plus a graph rewrite that enables previously unsupported encoders.