pith. sign in

Just read twice: closing the recall gap for recurrent language models, 2024 b

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

citation-role summary

dataset 1

citation-polarity summary

fields

cs.CL 2 cs.LG 2

verdicts

UNVERDICTED 4

roles

dataset 1

polarities

use dataset 1

representative citing papers

Transformers with Selective Access to Early Representations

cs.LG · 2026-05-05 · unverdicted · novelty 7.0 · 2 refs

SATFormer uses a context-dependent gate for selective reuse of early Transformer representations, improving validation loss and zero-shot accuracy especially on retrieval benchmarks.

Gated Delta Networks: Improving Mamba2 with Delta Rule

cs.CL · 2024-12-09 · unverdicted · novelty 5.0

Gated DeltaNet integrates gating and delta rules into linear transformers, outperforming Mamba2 and DeltaNet on language modeling, reasoning, retrieval, and long-context tasks.

citing papers explorer

Showing 4 of 4 citing papers.