Preserve: Prefetching model weights and KV-cache in distributed LLM serving. arXiv preprint arXiv:2501.08192, 2025

Ahmet Caner Yüzügüler, Jiawei Zhuang, Lukas Cavigelli · 2025 · arXiv 2501.08192

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

Scaling LLM Inference Beyond Amdahl's Limits via Eliminating Non-Scalable Overheads

cs.DC · 2026-06-01 · unverdicted · novelty 6.0

Albireo overlaps non-scalable overheads with compute in tensor-parallel LLM inference to raise the empirical optimal TP degree, delivering up to 1.9x throughput and 48% lower latency versus vLLM.

citing papers explorer


Scaling LLM Inference Beyond Amdahl's Limits via Eliminating Non-Scalable Overheads · cs.DC · 2026-06-01 · unverdicted · none · ref 53
