QAPruner introduces a hybrid sensitivity metric that combines group-wise quantization error simulation and outlier intensity with semantic scores to prune visual tokens, yielding 2.24% higher accuracy than naive baselines at 12.5% token retention on LLaVA models while surpassing dense low-bit models
In: NIPS (2022)
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1roles
background 1polarities
background 1representative citing papers
citing papers explorer
-
QAPruner: Quantization-Aware Vision Token Pruning for Multimodal Large Language Models
QAPruner introduces a hybrid sensitivity metric that combines group-wise quantization error simulation and outlier intensity with semantic scores to prune visual tokens, yielding 2.24% higher accuracy than naive baselines at 12.5% token retention on LLaVA models while surpassing dense low-bit models