ObjectCache enables KV cache storage in object storage via layerwise retrieval and custom scheduling, adding 5.6% latency for 64K contexts over local DRAM on a 100 Gbps RoCE cluster.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.DC 2verdicts
UNVERDICTED 2representative citing papers
Amoeba adaptively adjusts tensor parallelism at runtime for LLM inference services to handle mixed short and long context requests, delivering 1.75x-6.57x throughput gains over prior solutions in real-world trace evaluations.
citing papers explorer
-
ObjectCache: Layerwise Object-Storage Retrieval for KV Cache Reuse
ObjectCache enables KV cache storage in object storage via layerwise retrieval and custom scheduling, adding 5.6% latency for 64K contexts over local DRAM on a 100 Gbps RoCE cluster.
-
Amoeba: Runtime Tensor Parallel Transformation for LLM Inference Services
Amoeba adaptively adjusts tensor parallelism at runtime for LLM inference services to handle mixed short and long context requests, delivering 1.75x-6.57x throughput gains over prior solutions in real-world trace evaluations.