For evaluation, we report perplexity on 2.5B tokens from the Book-3 dataset (Gao et al., 2020)

configuration · 2020

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Test-Time Training with KV Binding Is Secretly Linear Attention

cs.LG · 2026-02-24 · conditional · novelty 8.0

Test-time training with KV binding reduces to learned linear attention.

Test-Time Training with KV Binding Is Secretly Linear Attention cs.LG · 2026-02-24 · conditional · none · ref 23