Efficient Serving for Dynamic Agent Workflows with Prediction-based KV-Cache Management

2026 · cs.LG · arXiv 2605.06472

1 Pith paper cites this work.

abstract

LLM-based workflows compose specialized agents to execute complex tasks, and these agents usually share substantial context, which allows KV-Cache reuse to save computation. Existing approaches either manage the KV-Cache at the agent level, and thus fail to exploit reuse opportunities within workflows, or manage it at the workflow level but assume that each workflow calls a static sequence of agents. However, practical workflows are typically dynamic: the sequence of invoked agents, and thus the cache reuse opportunities it induces, depends on the context of each task. To serve such dynamic workflows efficiently, we build a system dubbed PBKV (Prediction-Based KV-Cache Management). For each workflow, PBKV predicts the agent invocations over several future steps by fusing guidance from historical workflows with the context of the target workflow. Based on these predictions, PBKV estimates the reuse potential of cache entries and keeps the high-potential entries in GPU memory. To remain robust to prediction errors, PBKV uses the predictions conservatively during both cache eviction and prefetching. Experiments on three workflow benchmarks show that PBKV achieves up to 1.85× speedup over LRU on dynamic workflows, and up to 1.26× speedup over the SOTA baseline KVFlow on the static workflow.
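The abstract does not specify PBKV's predictor or eviction rule, so the following is only a minimal sketch of the general idea under stated assumptions. Historical guidance is modeled as a first-order Markov chain over agent names fitted to past traces, the target workflow's context as a hypothetical `context_prior` dictionary, and the two are fused by a linear blend with weight `alpha`. Reuse potential is the approximate probability that an agent is invoked within the next `horizon` steps. "Conservative" use is interpreted here, as an assumption, as threshold gating: the prediction picks the eviction victim only among entries it rates below `evict_threshold` (otherwise the cache falls back to LRU), and prefetching loads only entries above `min_prob` into free slots. All names (`fit_transitions`, `predict_next`, `reuse_potential`, `PredictiveKVCache`, `prefetch`) are hypothetical and do not come from the paper.

```python
from collections import Counter, OrderedDict, defaultdict


def fit_transitions(histories):
    """Estimate P(next agent | current agent) from past traces (first-order Markov model, an assumption)."""
    counts = defaultdict(Counter)
    for trace in histories:
        for cur, nxt in zip(trace, trace[1:]):
            counts[cur][nxt] += 1
    return {cur: {a: n / sum(c.values()) for a, n in c.items()} for cur, c in counts.items()}


def predict_next(trans, current, context_prior, alpha=0.5):
    """Fuse historical transitions with a context-derived prior via a linear blend.

    `context_prior` is a hypothetical {agent: weight} hint derived from the target
    workflow's context; the blend weight `alpha` is likewise an assumption.
    """
    hist = trans.get(current, {})
    agents = set(hist) | set(context_prior)
    mixed = {a: alpha * hist.get(a, 0.0) + (1 - alpha) * context_prior.get(a, 0.0) for a in agents}
    total = sum(mixed.values())
    return {a: p / total for a, p in mixed.items()} if total > 0 else {}


def reuse_potential(trans, current, context_prior, horizon=3, alpha=0.5):
    """Approximate P(agent invoked at least once within the next `horizon` steps).

    Per-step marginals come from propagating the fused predictor; combining them
    as if the steps were independent is a simplifying assumption.
    """
    dist = {current: 1.0}
    miss = defaultdict(lambda: 1.0)  # running P(agent not yet invoked)
    for _ in range(horizon):
        nxt = defaultdict(float)
        for a, p in dist.items():
            for b, q in predict_next(trans, a, context_prior, alpha).items():
                nxt[b] += p * q
        for b, p in nxt.items():
            miss[b] *= 1.0 - p
        dist = nxt
    return {a: 1.0 - m for a, m in miss.items()}


class PredictiveKVCache:
    """Hypothetical GPU-resident cache keyed by agent (e.g. its shared prompt prefix)."""

    def __init__(self, capacity, evict_threshold=0.2):
        self.capacity = capacity  # assumed >= 1
        self.evict_threshold = evict_threshold
        self.entries = OrderedDict()  # iteration order = recency, least recent first

    def access(self, agent, kv, potential):
        """Return True on a hit; on a miss, insert `kv`, evicting one entry if full."""
        if agent in self.entries:
            self.entries.move_to_end(agent)
            return True
        if len(self.entries) >= self.capacity:
            self._evict(potential)
        self.entries[agent] = kv
        return False

    def _evict(self, potential):
        # Conservative eviction: act on the prediction only for entries it rates as
        # unlikely to be reused; if there are none, fall back to plain LRU.
        cold = [a for a in self.entries if potential.get(a, 0.0) < self.evict_threshold]
        if cold:
            victim = min(cold, key=lambda a: potential.get(a, 0.0))  # ties -> least recent
        else:
            victim = next(iter(self.entries))
        del self.entries[victim]

    def prefetch(self, potential, load, min_prob=0.8):
        # Conservative prefetching: load only high-confidence entries, and only into free slots.
        for agent, p in sorted(potential.items(), key=lambda item: -item[1]):
            if p < min_prob or len(self.entries) >= self.capacity:
                break
            if agent not in self.entries:
                self.entries[agent] = load(agent)


if __name__ == "__main__":
    traces = [["planner", "coder", "tester", "coder"],
              ["planner", "searcher", "writer"],
              ["planner", "coder", "tester"]]
    trans = fit_transitions(traces)
    pot = reuse_potential(trans, "planner", {"coder": 1.0}, horizon=2)
    print({a: round(p, 3) for a, p in sorted(pot.items())})
    # -> {'coder': 0.917, 'searcher': 0.167, 'tester': 0.417, 'writer': 0.083}
```

With this gating, a badly wrong prediction can at worst push the cache back toward LRU behavior rather than evict an entry the predictor was unsure about; a real system would also need to account for entry sizes and partial prefix matches, which this sketch ignores.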

representative citing papers

MiCU: End-to-End Smart Home Command Understanding with Large Language Model

cs.CL · 2026-05-31 · unverdicted · novelty 4.0

MiCU is a domain-adapted LLM for smart-home command understanding that reports 20% average accuracy gains over baselines and is deployed in the Xiaomi Home app.
