Title resolution pending

Chaofan Lin, Zhenhua Han, Chengruidong Zhang, Yuqing Yang, Fan Yang, Chen Chen, Lili Qiu · 2024

3 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

ForkKV: Scaling Multi-LoRA Agent Serving via Copy-on-Write Disaggregated KV Cache

cs.DC · 2026-04-07 · unverdicted · novelty 6.0

ForkKV uses copy-on-write disaggregated KV cache with DualRadixTree and ResidualAttention kernels to deliver up to 3x throughput over prior multi-LoRA serving systems with negligible quality loss.

TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing

cs.DC · 2026-04-03 · unverdicted · novelty 6.0

TokenDance scales multi-agent LLM serving to 2.7x more concurrent agents by collective KV cache reuse and block-sparse diff encoding that achieves 11-17x compression.

DualScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS

cs.DC · 2026-02-21 · unverdicted · novelty 5.0

DualScale reduces energy by up to 39% in prefill and 48% in decode for disaggregated LLM serving while meeting TTFT and TPOT SLOs on a 16x H100 cluster.

citing papers explorer

ForkKV: Scaling Multi-LoRA Agent Serving via Copy-on-Write Disaggregated KV Cache

cs.DC · 2026-04-07 · unverdicted · none · ref 32

TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing

cs.DC · 2026-04-03 · unverdicted · none · ref 22

DualScale: Energy-Efficient Disaggregated LLM Serving via Phase-Aware Placement and DVFS

cs.DC · 2026-02-21 · unverdicted · none · ref 24
