The LAMBADA dataset: Word prediction requiring a broad discourse context

Paperno, D · 2016

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Position-Agnostic Pre-Projection for Transformer Attention: Nonlinear Feature Construction and Content Skip Before Q/K/V

cs.CL · 2026-04-12 · unverdicted · novelty 6.0

A position-agnostic nonlinear pre-projection MLP plus content skip connection in transformer attention improves LAMBADA accuracy by 40.6% and reduces perplexity by 39% on 160M-scale models.
