Scatterbrain: Unifying sparse and low-rank attention

Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré · 2021

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

cs.LG · 2024-07-11 · accept · novelty 7.0

FlashAttention-3 achieves 1.5-2x speedup on H100 GPUs for attention, reaching 740 TFLOPs/s (75% utilization) in FP16 and near 1.2 PFLOPs/s in FP8 while cutting numerical error by 2.6x versus baseline FP8 attention.

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

cs.LG · 2022-05-27 · accept · novelty 7.0

FlashAttention reduces GPU high-bandwidth memory accesses in self-attention via tiling, delivering exact attention with lower IO complexity, 2-3x wall-clock speedups on models like GPT-2, and the ability to train on sequences up to 64K long.

citing papers explorer

Showing 2 of 2 citing papers.

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision cs.LG · 2024-07-11 · accept · none · ref 10
FlashAttention-3 achieves 1.5-2x speedup on H100 GPUs for attention, reaching 740 TFLOPs/s (75% utilization) in FP16 and near 1.2 PFLOPs/s in FP8 while cutting numerical error by 2.6x versus baseline FP8 attention.
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness cs.LG · 2022-05-27 · accept · none · ref 9
FlashAttention reduces GPU high-bandwidth memory accesses in self-attention via tiling, delivering exact attention with lower IO complexity, 2-3x wall-clock speedups on models like GPT-2, and the ability to train on sequences up to 64K long.

Scatterbrain: Unifying sparse and low-rank attention

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer