Accelerating LLM inference throughput via asynchronous KV cache prefetching

· 2026

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Understanding Inference Scaling for LLMs: Bottlenecks, Trade-offs, and Performance Principles

cs.DC · 2026-05-19 · unverdicted · novelty 5.0

Reasoning workloads shift LLM inference to a capacity-bound regime where KV-cache fragmentation limits data parallelism, tensor parallelism unlocks memory at the 32B scale, and MoE models require hybrid strategies to avoid routing latency.

citing papers explorer

Showing 1 of 1 citing paper.

Understanding Inference Scaling for LLMs: Bottlenecks, Trade-offs, and Performance Principles

cs.DC · 2026-05-19 · unverdicted · none · ref 18

Reasoning workloads shift LLM inference to a capacity-bound regime where KV-cache fragmentation limits data parallelism, tensor parallelism unlocks memory at the 32B scale, and MoE models require hybrid strategies to avoid routing latency.
