Reducing transformer key-value cache size with cross-layer attention
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: eess.SP (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Graph-Guided Adaptive Channel Elimination for KV Cache Compression
GRACE reframes KV cache channel pruning as graph optimization to find a near-optimal subset, achieving 60% compression with negligible degradation and outperforming prior methods.