The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations

URL http://arxiv · 2015 · cs.CL · arXiv 1511.02301

7 Pith papers cite this work.

abstract

We introduce a new test of how well language models capture meaning in children's books. Unlike standard language modelling benchmarks, it distinguishes the task of predicting syntactic function words from that of predicting lower-frequency words, which carry greater semantic content. We compare a range of state-of-the-art models, each with a different way of encoding what has been previously read. We show that models which store explicit representations of long-term contexts outperform state-of-the-art neural language models at predicting semantic content words, although this advantage is not observed for syntactic function words. Interestingly, we find that the amount of text encoded in a single memory representation strongly influences performance: there is a sweet spot, not too big and not too small, between single words and full sentences that allows the most meaningful information in a text to be effectively retained and recalled. Further, the attention over such window-based memories can be trained effectively through self-supervision. We then assess the generality of this principle by applying it to the CNN QA benchmark, which involves identifying named entities in paraphrased summaries of news articles, and achieve state-of-the-art performance.
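The abstract's two mechanisms, window-based memories and self-supervised attention over them, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every choice below is an assumption made for illustration: one memory per occurrence of a candidate word, holding the b-word window centred on it (b odd); bag-of-words embeddings from a lookup table; dot-product scoring against a query; hard selection of the most-attended memory at prediction time; and, since no gold memory is labelled, taking the highest-scoring memory centred on the correct answer as the training target. The function names (`window_memories`, `attend`, `predict`, `self_supervised_loss`) are hypothetical.

```python
import numpy as np


def window_memories(context, candidates, b=5):
    """One memory per occurrence of a candidate in the context: the b-word
    window centred on that occurrence (b assumed odd), padded at the edges."""
    half = (b - 1) // 2
    padded = ["<pad>"] * half + list(context) + ["<pad>"] * half
    memories = []
    for i, tok in enumerate(context):
        if tok in candidates:
            # context[i] sits at padded[i + half], so padded[i:i + b] is centred on it
            memories.append((tok, padded[i:i + b]))
    return memories


def embed(words, table, dim):
    """Bag-of-words encoding: sum of word vectors; unknown words contribute zero."""
    vec = np.zeros(dim)
    for w in words:
        vec = vec + table.get(w, np.zeros(dim))
    return vec


def attend(query, memories, table, dim):
    """Dot-product score between the query and each memory, plus the softmax."""
    q = embed(query, table, dim)
    scores = np.array([embed(window, table, dim) @ q for _, window in memories])
    exp = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return scores, exp / exp.sum()


def predict(memories, probs):
    """Hard selection: answer with the centre word of the most-attended memory."""
    return memories[int(np.argmax(probs))][0]


def self_supervised_loss(memories, scores, probs, answer):
    """Self-supervision (assumed form): among memories centred on the correct
    answer (assumed to occur in the context), take the highest-scoring one as
    the target and return its negative log-probability."""
    correct = [k for k, (centre, _) in enumerate(memories) if centre == answer]
    target = max(correct, key=lambda k: scores[k])
    return -np.log(probs[target])
```

With b = 3, the context `the cat sat on the mat while the dog slept` and the candidates {cat, mat, dog} give three memories: `the cat sat`, `the mat while`, and `the dog slept`. Varying b is the knob the abstract's "sweet spot" refers to: b = 1 stores single words, while very large windows approach whole sentences.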

representative citing papers

Reformer: The Efficient Transformer

cs.LG · 2020-01-13 · accept · novelty 8.0

Reformer matches standard Transformer accuracy on long sequences while using far less memory and running faster via LSH attention and reversible residual layers.

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

cs.CL · 2017-05-09 · accept · novelty 8.0

TriviaQA is a new large-scale dataset for reading comprehension that features complex compositional questions, high lexical variability, and cross-sentence reasoning requirements, where current baselines reach only 40% while humans reach 80%.

Language Models as Knowledge Bases?

cs.CL · 2019-09-03 · accept · novelty 7.0

BERT stores relational knowledge extractable via cloze queries without fine-tuning and matches supervised baselines on open-domain QA tasks.

Compressive Transformers for Long-Range Sequence Modelling

cs.LG · 2019-11-13 · unverdicted · novelty 6.0

Compressive Transformer sets new records on WikiText-103 (17.1 ppl) and Enwik8 (0.97 bpc) via memory compression and introduces the PG-19 long-range language benchmark.

LLM-PRISM: Characterizing Silent Data Corruption from Permanent GPU Faults in LLM Training

cs.AR · 2026-04-12 · unverdicted · novelty 6.0

LLMs resist low-frequency permanent GPU faults but certain datapaths and precision formats trigger catastrophic training divergence even at moderate fault rates.

EQuANt (Enhanced Question Answer Network)

cs.CL · 2019-06-24 · unverdicted · novelty 4.0

EQuANt extends QANet to SQuAD 2, achieving nearly twice the performance of a lightweight QANet baseline while also improving SQuAD 1.1 results via multi-task learning.

Machine Reading Comprehension: a Literature Review

cs.CL · 2019-06-30 · unverdicted · novelty 1.0

A 2019 survey of machine reading comprehension corpora and methods.

citing papers explorer


Reformer: The Efficient Transformer cs.LG · 2020-01-13 · accept · none · ref 9
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension cs.CL · 2017-05-09 · accept · none · ref 12
Language Models as Knowledge Bases? cs.CL · 2019-09-03 · accept · none · ref 220 · internal anchor
Compressive Transformers for Long-Range Sequence Modelling cs.LG · 2019-11-13 · unverdicted · none · ref 125 · internal anchor
LLM-PRISM: Characterizing Silent Data Corruption from Permanent GPU Faults in LLM Training cs.AR · 2026-04-12 · unverdicted · none · ref 31
EQuANt (Enhanced Question Answer Network) cs.CL · 2019-06-24 · unverdicted · none · ref 3 · internal anchor
Machine Reading Comprehension: a Literature Review cs.CL · 2019-06-30 · unverdicted · none · ref 17 · internal anchor