Importance of a search strategy in neural dialogue modelling. arXiv preprint arXiv:1811.00907, 2, 2018.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.DC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
Scaling LLM Inference Beyond Amdahl's Limits via Eliminating Non-Scalable Overheads
Albireo overlaps non-scalable overheads with compute in tensor-parallel LLM inference to raise the empirical optimal TP degree, delivering up to 1.9x throughput and 48% lower latency versus vLLM.