HACK++ is a head-aware KV cache compression framework for VAR models that decouples current-scale attention from historical cache under adaptive per-head budgets to achieve near-lossless generation at 30% attention and 10% cache budgets.
Collaborative decod- ing makes visual auto-regressive modeling efficient,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
HACK++: Towards More Effective Head-Aware Key-Value Compression for Efficient Visual Autoregressive Modeling
HACK++ is a head-aware KV cache compression framework for VAR models that decouples current-scale attention from historical cache under adaptive per-head budgets to achieve near-lossless generation at 30% attention and 10% cache budgets.