Shallow-deep networks: Understanding and mitigating network overthinking
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- EdgeServing: Deadline-Aware Multi-DNN Serving at the Edge
  EdgeServing schedules multi-DNN inference on edge GPUs via time-division sharing and early exits, using a stability score to minimize system-wide SLO violations and P95 latency.
- A Comparative Study of CNN Optimization Methods for Edge AI: Exploring the Role of Early Exits
  Combining pruning, quantization, and early exits in CNNs reduces inference latency and memory on real edge devices with minimal accuracy loss.