Per-head quantization assigns different scales to each head, capturing head-wise distribution differences and providing a balanced trade-off between accuracy and overhead

Per-tensor quantization applies a single scale to the entire input tensor, offering the simplest implementation, the lowest scale storage cost · arXiv 0120.0240
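The two granularities quoted above can be compared with a small sketch. It assumes symmetric absmax scaling to a signed 8-bit range, which the excerpt does not specify. All names (`absmax_scale`, `per_tensor_scales`, `per_head_scales`, `max_abs_error`) and the toy data are hypothetical and are not taken from the cited paper.

```python
from typing import List

QMAX = 127  # assumption: symmetric signed 8-bit range


def absmax_scale(values: List[float]) -> float:
    """Scale that maps the largest magnitude in `values` to QMAX."""
    peak = max(abs(v) for v in values)
    return peak / QMAX if peak > 0 else 1.0


def quantize(values: List[float], scale: float) -> List[int]:
    """Round to the nearest integer code and clamp to [-QMAX, QMAX]."""
    return [max(-QMAX, min(QMAX, round(v / scale))) for v in values]


def dequantize(codes: List[int], scale: float) -> List[float]:
    return [c * scale for c in codes]


def per_tensor_scales(heads: List[List[float]]) -> List[float]:
    """One scale computed over the whole tensor, reused for every head."""
    flat = [v for head in heads for v in head]
    return [absmax_scale(flat)] * len(heads)


def per_head_scales(heads: List[List[float]]) -> List[float]:
    """One scale per head, computed from that head's own values."""
    return [absmax_scale(head) for head in heads]


def max_abs_error(heads: List[List[float]], scales: List[float]) -> List[float]:
    """Worst reconstruction error in each head after a quantize/dequantize round trip."""
    errors = []
    for head, scale in zip(heads, scales):
        recon = dequantize(quantize(head, scale), scale)
        errors.append(max(abs(a - b) for a, b in zip(head, recon)))
    return errors


# Toy data (hypothetical): head 0 has a much smaller range than head 1.
heads = [[0.01, -0.02, 0.015], [5.0, -12.7, 8.0]]

tensor_scales = per_tensor_scales(heads)
head_scales = per_head_scales(heads)

print("per-tensor errors:", max_abs_error(heads, tensor_scales))
print("per-head errors:  ", max_abs_error(heads, head_scales))
```

In this toy case the shared per-tensor scale is set by the large-range head, so the small-range head collapses to zero codes, while per-head scaling keeps its values distinguishable. The price is storing one scale per head instead of a single scale for the tensor.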

1 Pith paper cites this work. Polarity classification is still indexing.



representative citing papers

QFlash: Bridging Quantization and Memory Efficiency in Vision Transformer Attention

cs.LG · 2026-04-28 · unverdicted · novelty 7.0

QFlash implements end-to-end integer FlashAttention with integer-only softmax, delivering up to 8.69x speedup and 18.8% energy savings on ViT models while preserving accuracy under per-tensor quantization.

citing papers explorer

Showing 1 of 1 citing paper.

QFlash: Bridging Quantization and Memory Efficiency in Vision Transformer Attention
cs.LG · 2026-04-28 · unverdicted · none · ref 26

