Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, and Christopher Ré · 2022

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Nearly Optimal Attention Coresets

cs.DS · 2026-05-07 · unverdicted · novelty 8.0

ε-coresets for attention exist of size O(√d e^{ρ+o(ρ)}/ε) for unit-norm keys/values and queries of norm ≤ρ, nearly matching the Ω(√d e^ρ/ε) lower bound.

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

cs.LG · 2023-07-17 · accept · novelty 6.0

FlashAttention-2 achieves roughly 2x speedup over FlashAttention by parallelizing attention across thread blocks and distributing work within blocks, reaching 50-73% of theoretical peak FLOPs/s on A100 GPUs.

citing papers explorer

Showing 2 of 2 citing papers.

Nearly Optimal Attention Coresets · cs.DS · 2026-05-07 · unverdicted · none · ref 17
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning · cs.LG · 2023-07-17 · accept · none · ref 5
