D† = 1 denotes the attention operator or block without the recurrence.

The latency (ms) is tested on 262K sequences of tokens · 2048

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

RAT+: Train Dense, Infer Sparse -- Recurrence Augmented Attention for Dilated Inference

cs.LG · 2026-02-20 · unverdicted · novelty 6.0 · 2 refs

RAT+ pretrains a dense recurrent-augmented attention model once and enables flexible switching to dilated or hybrid sparse attention at inference after short adaptation, with small accuracy loss at high dilation factors.

citing papers explorer

Showing 1 of 1 citing paper.

RAT+: Train Dense, Infer Sparse -- Recurrence Augmented Attention for Dilated Inference

cs.LG · 2026-02-20 · unverdicted · none · ref 41 · 2 links
