Real-Time, Work-Conserving GPU Scheduling for Concurrent DNN Inference
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.DC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
EdgeServing: Deadline-Aware Multi-DNN Serving at the Edge
EdgeServing schedules multi-DNN inference on edge GPUs via time-division sharing and early exits, using a stability score to minimize system-wide SLO violations and P95 latency.