GRACE reframes KV cache channel pruning as graph optimization to find a near-optimal subset, achieving 60% compression with negligible degradation and outperforming prior methods.
Smoothquant: Accurate and efficient post-training quantization for large language models,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
eess.SP 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Graph-Guided Adaptive Channel Elimination for KV Cache Compression
GRACE reframes KV cache channel pruning as graph optimization to find a near-optimal subset, achieving 60% compression with negligible degradation and outperforming prior methods.