CacheGen: KV cache compression and streaming for fast large language model serving. Proceedings of the ACM SIGCOMM 2024 Conference, pages 38–56, 2024

Yuhan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, Yuyang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, et al. · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

cs.AI · 2026-05-20 · unverdicted · novelty 5.0

Temporal semantic caching and MCP workflow optimizations deliver 30.6x median speedup on cache hits and 1.67x overall speedup with 40% latency reduction on the AssetOpsBench industrial agent benchmark.

citing papers explorer

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines · cs.AI · 2026-05-20 · unverdicted · none · ref 9
Temporal semantic caching and MCP workflow optimizations deliver 30.6x median speedup on cache hits and 1.67x overall speedup with 40% latency reduction on the AssetOpsBench industrial agent benchmark.
