FlashAttention-2 achieves roughly 2x speedup over FlashAttention by parallelizing attention across thread blocks and distributing work within blocks, reaching 50-73% of theoretical peak FLOPs/s on A100 GPUs.
Reformer: The efficient transformer
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2023 1verdicts
ACCEPT 1representative citing papers
citing papers explorer
-
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
FlashAttention-2 achieves roughly 2x speedup over FlashAttention by parallelizing attention across thread blocks and distributing work within blocks, reaching 50-73% of theoretical peak FLOPs/s on A100 GPUs.