Taming throughput-latency tradeoff in LLM inference with Sarathi. 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Animesh Agrawal et al. · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism · cs.DC · 2026-05-06 · unverdicted · none · ref 1
Nitsum dynamically adapts tensor parallelism and GPU splits in LLM serving to raise SLO-compliant goodput by up to 5.3 times over prior systems.