HACK++ is a head-aware KV cache compression framework for VAR models that decouples current-scale attention from historical cache under adaptive per-head budgets to achieve near-lossless generation at 30% attention and 10% cache budgets.
Sparvar: Exploring sparsity in visual autoregressive modeling for training-free acceleration
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Representative citing papers
HeatKV doubles KV-cache compression ratios over prior methods for VAR models by creating static head-specific pruning schedules from attention rankings on a calibration set, while preserving image quality on Infinity-2B.
HeatKV: Head-tuned KV-cache Compression for Visual Autoregressive Modeling