Orthogonal Backfill compression for latent KV caches in multi-agent LLMs reduces communication by 79.8-89.4% while achieving comparable or superior performance to full relay on 7 of 9 benchmarks.
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads, 2024
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- When Less Latent Leads to Better Relay: Information-Preserving Compression for Latent Multi-Agent LLM Collaboration