Dualpath: Breaking the storage bandwidth bottleneck in agentic llm inference

Yongtong Wu, Shaoyuan Chen, Yinmin Zhong, Rilin Huang, Yixuan Tan, Wentao Zhang, Liyue Zhang, Shangyan Zhou, Yuxuan Liu, Shunfeng Zhou, Mingxing Zhang, Xin Jin, Panpan Huang · 2026 · arXiv 2602.21548

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 1 method 1

citation-polarity summary

background 1 use method 1

representative citing papers

Hive: A Multi-Agent Infrastructure for Algorithm- and Task-Level Scaling

cs.AI · 2026-04-19 · unverdicted · novelty 6.0

Hive is a multi-agent infrastructure with a logits cache for reducing cross-path redundancy in sampling and agent-aware scheduling for better compute and KV-cache allocation, shown to deliver 1.11x-1.76x speedups and 33%-51% lower hotspot miss rates.

ForkKV: Scaling Multi-LoRA Agent Serving via Copy-on-Write Disaggregated KV Cache

cs.DC · 2026-04-07 · unverdicted · novelty 6.0

ForkKV uses copy-on-write disaggregated KV cache with DualRadixTree and ResidualAttention kernels to deliver up to 3x throughput over prior multi-LoRA serving systems with negligible quality loss.

KernelFlume: Elastic Core-Attention Scaling for Agentic Long-Context Decoding

cs.DC · 2026-06-28 · unverdicted · novelty 5.0

KernelFlume presents a disaggregated decode architecture that separates core attention from projection/FFN paths to enable elastic scaling of attention nodes, reporting up to 61% lower cost per million tokens versus full-instance scaling on H100 hardware for Llama-3.1-8B under dynamic long-context w

A Token/KV-Cache Communication Media Selection and Resource Allocation Strategy for Multi-Agent Collaboration

eess.SP · 2026-05-25 · unverdicted · novelty 5.0

A joint media selection and resource allocation algorithm (JMSRA) adaptively chooses token or KV-cache transmission and bandwidth allocation to reduce E2E latency compared to fixed baselines in wireless multi-agent systems.

Efficient Serving for Dynamic Agent Workflows with Prediction-based KV-Cache Management

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

PBKV predicts agent invocations in dynamic LLM workflows to manage KV-cache reuse, delivering up to 1.85x speedup over LRU and 1.26x over KVFlow.

citing papers explorer

Showing 5 of 5 citing papers.

Hive: A Multi-Agent Infrastructure for Algorithm- and Task-Level Scaling cs.AI · 2026-04-19 · unverdicted · none · ref 40
Hive is a multi-agent infrastructure with a logits cache for reducing cross-path redundancy in sampling and agent-aware scheduling for better compute and KV-cache allocation, shown to deliver 1.11x-1.76x speedups and 33%-51% lower hotspot miss rates.
ForkKV: Scaling Multi-LoRA Agent Serving via Copy-on-Write Disaggregated KV Cache cs.DC · 2026-04-07 · unverdicted · none · ref 63
ForkKV uses copy-on-write disaggregated KV cache with DualRadixTree and ResidualAttention kernels to deliver up to 3x throughput over prior multi-LoRA serving systems with negligible quality loss.
KernelFlume: Elastic Core-Attention Scaling for Agentic Long-Context Decoding cs.DC · 2026-06-28 · unverdicted · none · ref 37
KernelFlume presents a disaggregated decode architecture that separates core attention from projection/FFN paths to enable elastic scaling of attention nodes, reporting up to 61% lower cost per million tokens versus full-instance scaling on H100 hardware for Llama-3.1-8B under dynamic long-context w
A Token/KV-Cache Communication Media Selection and Resource Allocation Strategy for Multi-Agent Collaboration eess.SP · 2026-05-25 · unverdicted · none · ref 27
A joint media selection and resource allocation algorithm (JMSRA) adaptively chooses token or KV-cache transmission and bandwidth allocation to reduce E2E latency compared to fixed baselines in wireless multi-agent systems.
Efficient Serving for Dynamic Agent Workflows with Prediction-based KV-Cache Management cs.LG · 2026-05-07 · unverdicted · none · ref 7
PBKV predicts agent invocations in dynamic LLM workflows to manage KV-cache reuse, delivering up to 1.85x speedup over LRU and 1.26x over KVFlow.

Dualpath: Breaking the storage bandwidth bottleneck in agentic llm inference

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer