Joint resource allocation and routing for multi-model LLM serving can produce up to 87% variation in achievable output quality across setups on the same GPU cluster.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.NI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
RouterWise: Joint Resource Allocation and Routing for Latency-Aware Multi-Model LLM Serving
Joint resource allocation and routing for multi-model LLM serving can produce up to 87% variation in achievable output quality across setups on the same GPU cluster.