KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems

Ye, H · 2025 · arXiv 2510.12872

5 Pith papers cite this work. Polarity classification is still being indexed.



citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

LACO: Adaptive Latent Communication for Collaborative Driving

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

LACO introduces Iterative Latent Deliberation, Cross-Horizon Saliency Attribution, and Structured Semantic Knowledge Distillation to enable low-latency latent communication in collaborative driving while preserving performance in CARLA simulations.

QKVShare: Quantized KV-Cache Handoff for Multi-Agent On-Device LLMs

cs.AI · 2026-05-05 · unverdicted · novelty 6.0

QKVShare enables efficient quantized KV-cache handoff for on-device multi-agent LLMs, cutting TTFT versus re-prefill across tested contexts while adaptive quantization stays competitive with uniform baselines on GSM8K.

PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference

cs.LG · 2026-04-27 · conditional · novelty 6.0

A single shared asymmetrically compressed KV cache pool enables up to 15 concurrent LLM agents with 2.91x compression, 97.7% memory reduction, and only +0.57% perplexity increase on Llama-3-8B.

TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing

cs.DC · 2026-04-03 · unverdicted · novelty 6.0

TokenDance scales multi-agent LLM serving to 2.7x more concurrent agents by collective KV cache reuse and block-sparse diff encoding that achieves 11-17x compression.

Networking-Aware Energy Efficiency in Agentic AI Inference: A Survey

eess.SY · 2026-04-09 · unverdicted · novelty 4.0

The paper surveys energy efficiency strategies for Agentic AI inference by proposing a new accounting framework and taxonomy that spans model simplification, computation control, input optimization, and cross-layer co-design with wireless networks.

citing papers explorer

Showing 5 of 5 citing papers.

LACO: Adaptive Latent Communication for Collaborative Driving · cs.AI · 2026-05-21 · unverdicted · none · ref 41
QKVShare: Quantized KV-Cache Handoff for Multi-Agent On-Device LLMs · cs.AI · 2026-05-05 · unverdicted · none · ref 1
PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference · cs.LG · 2026-04-27 · conditional · none · ref 8
TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing · cs.DC · 2026-04-03 · unverdicted · none · ref 41
Networking-Aware Energy Efficiency in Agentic AI Inference: A Survey · eess.SY · 2026-04-09 · unverdicted · none · ref 125
