Singularity: Planet-scale, Preemptive and Elastic Scheduling of AI Workloads
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.DC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
FlexNPU: Transparent NPU Virtualization for Dynamic LLM Prefill-Decode Co-location
FlexNPU is a transparent virtualization system for Ascend NPUs that supports dynamic co-location of prefill and decode in LLM serving. It reports throughput gains and large time-to-first-token (TTFT) reductions versus static baselines.