FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, Tri Dao · 2024

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

DualKV: Shared-Prompt Flash Attention for Efficient RL Training with Large Rollouts and Long Contexts

cs.LG · 2026-05-14 · conditional · novelty 8.0

DualKV is a new FlashAttention variant that shares prompt KV across multiple rollouts in RL training, delivering 1.63-3.82x speedups on 8B-30B models while remaining mathematically identical to standard attention.

Why Attend to Everything? Focus is the Key

cs.CL · 2026-03-12 · conditional · novelty 6.0

Focus learns a few centroids to gate long-range token attention, producing sparse attention that matches or beats full attention quality with up to 8.6x speedup at million-token lengths.
