CacheBlend: Fast large language model serving for RAG with cached knowledge fusion

Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang · 2025

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

cs.AI · 2026-05-20 · unverdicted · novelty 5.0

Temporal semantic caching and MCP workflow optimizations deliver 30.6x median speedup on cache hits and 1.67x overall speedup with 40% latency reduction on the AssetOpsBench industrial agent benchmark.

citing papers explorer

Showing 1 of 1 citing paper.

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines cs.AI · 2026-05-20 · unverdicted · none · ref 7
Temporal semantic caching and MCP workflow optimizations deliver 30.6x median speedup on cache hits and 1.67x overall speedup with 40% latency reduction on the AssetOpsBench industrial agent benchmark.

CacheBlend: Fast large language model serving for RAG with cached knowledge fusion

fields

years

verdicts

representative citing papers

citing papers explorer