BalanceRoute uses a piecewise-linear F-score (with optional short lookahead) for sticky request routing in LLM serving, reducing DP imbalance and raising end-to-end throughput versus vLLM baselines on production and Azure traces.
arXiv preprint arXiv:2011.06327
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.DC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Tackling the Data-Parallel Load Balancing Bottleneck in LLM Serving: Practical Online Routing at Scale