2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints
  A queueing model derives stability conditions for LLM inference services under combined compute and KV cache memory limits, with experimental validation showing typical deviations under 10%.
- Berry-Esseen bounds for multivariate martingale difference sequences in the Kolmogorov distance
  New Berry-Esseen bounds for multivariate martingale difference sequences achieve n^{-1/4} rate and polylog(d) dimension dependence in Kolmogorov distance.