Harchol-Balter, Performance Modeling and Design of Computer Systems: Queueing Theory in Action
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.DC (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
A risk-aware edge server selection method that combines latency mean and uncertainty, a Normal approximation plus a Cantelli bound on SLO violation risk, and hysteresis-based percentile scoring reduces deadline misses by 5 points and switching by 88% in testbed experiments.
citing papers explorer
- Collaborative Processing for Multi-Tenant Inference on Memory-Constrained Edge TPUs
SwapLess reduces mean inference latency on Edge TPUs by up to 63.8% in single-tenant and 77.4% in multi-tenant settings by adjusting model partition points and CPU allocation online, guided by a queueing model that accounts for swapping costs.
- Risk-Aware and Stable Edge Server Selection Under Network Latency SLOs
A risk-aware edge server selection method that combines latency mean and uncertainty, a Normal approximation plus a Cantelli bound on SLO violation risk, and hysteresis-based percentile scoring reduces deadline misses by 5 points and switching by 88% in testbed experiments.