Int-flashattention: Enabling flash attention for int8 quantization. arXiv preprint arXiv:2409.16997

Chen et al. · 2024 · arXiv 2409.16997

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

QFlash: Bridging Quantization and Memory Efficiency in Vision Transformer Attention

cs.LG · 2026-04-28 · unverdicted · novelty 7.0

QFlash implements end-to-end integer FlashAttention with integer-only softmax, delivering up to 8.69x speedup and 18.8% energy savings on ViT models while preserving accuracy under per-tensor quantization.

citing papers explorer

Showing 1 of 1 citing paper.

QFlash: Bridging Quantization and Memory Efficiency in Vision Transformer Attention · cs.LG · 2026-04-28 · unverdicted · none · ref 3
