GDEV-AI: A Generalized Evaluation of Deep Learning Inference Scaling and Architectural Saturation
1 Pith paper cites this work.
Fields: cs.PF (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
DriftSched: Adaptive QoS-Aware Scheduling under Runtime Token Drift for Multi-Tenant GPU Inference
DriftSched adds online calibration to correct token budget estimates in multi-tenant GPU inference and reports that SJF reduces median latency by ~42% versus FIFO.