
Improving language models by retrieving from trillions of tokens

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican · 2021 · cs.CL · arXiv 2112.04426

21 Pith papers cite this work. Polarity classification is still indexing.

abstract

We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen BERT retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
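As a rough illustration of the retrieval step the abstract describes, the Python sketch below splits text into fixed-size chunks and, for each chunk, looks up its nearest neighbours in a chunk database together with each neighbour's continuation. It is not RETRO's implementation. The hashed bag-of-tokens `embed` function is a hypothetical stand-in for the frozen BERT retriever, the 4-token chunk size is chosen only for readability (RETRO itself uses 64-token chunks), and brute-force L2 search replaces the approximate nearest-neighbour index a 2 trillion token database would require. The differentiable encoder and chunked cross-attention are omitted entirely.

```python
import math
import zlib

CHUNK_SIZE = 4  # illustrative only; RETRO itself uses 64-token chunks
EMBED_DIM = 64  # hypothetical embedding width for the stand-in encoder


def split_into_chunks(tokens, size=CHUNK_SIZE):
    """Split a token list into consecutive chunks of `size` tokens (the last may be shorter)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def embed(chunk, dim=EMBED_DIM):
    """Hypothetical stand-in for the frozen BERT retriever: a hashed bag-of-tokens count vector."""
    vec = [0.0] * dim
    for tok in chunk:
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    return vec


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_database(corpus_tokens):
    """Chunk the corpus and store (key embedding, neighbour chunk, continuation chunk) entries."""
    chunks = split_into_chunks(corpus_tokens)
    db = []
    for i, chunk in enumerate(chunks):
        continuation = chunks[i + 1] if i + 1 < len(chunks) else []
        db.append((embed(chunk), chunk, continuation))
    return db


def retrieve(db, query_chunk, k=2):
    """Brute-force k-nearest-neighbour search by L2 distance; returns (neighbour, continuation) pairs."""
    q = embed(query_chunk)
    ranked = sorted(db, key=lambda entry: l2(entry[0], q))
    return [(chunk, cont) for _, chunk, cont in ranked[:k]]


def retrieve_for_sequence(db, tokens, k=2):
    """Retrieve neighbours for every chunk of the input sequence.

    To stay causal, neighbours of chunk u should only inform predictions of tokens
    after chunk u; that wiring (encoder and chunked cross-attention) is omitted here.
    """
    return [retrieve(db, chunk, k) for chunk in split_into_chunks(tokens)]


if __name__ == "__main__":
    corpus = "the cat sat on the mat and then the dog ran into the park to play".split()
    db = build_database(corpus)
    for neighbours in retrieve_for_sequence(db, "a dog ran into the yard".split(), k=1):
        print(neighbours)
```

Returning each neighbour together with its continuation mirrors the idea that the retrieved text should help predict what comes next, not just echo the query; the swap from a learned embedding to token hashing only keeps the example dependency-free.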

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

A Unified Model and Document Representation for On-Device Retrieval-Augmented Generation

cs.IR · 2026-04-15 · unverdicted · novelty 7.0

A single model unifies retrieval and context compression for on-device RAG via shared representations, matching traditional RAG performance at 1/10 context size with no extra storage.

A Generalist Agent

cs.AI · 2022-05-12 · accept · novelty 7.0

Gato is a multi-modal, multi-task, multi-embodiment generalist policy using one transformer network to handle text, vision, games, and robotics tasks.

OPT: Open Pre-trained Transformer Language Models

cs.CL · 2022-05-02 · unverdicted · novelty 7.0

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning

cs.AI · 2026-05-18 · unverdicted · novelty 6.0

SD-Search derives step-level supervision for search queries in reasoning agents via on-policy hindsight self-distillation using the policy as both student and teacher.

When AI reviews science: Can we trust the referee?

cs.AI · 2026-04-26 · unverdicted · novelty 6.0

AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.

RAG-GNN: Integrating Retrieved Knowledge with Graph Neural Networks for Precision Medicine

q-bio.MN · 2026-01-31 · unverdicted · novelty 6.0

RAG-GNN augments GNNs with retrieved literature knowledge via gated fusion to improve functional clustering of 379 proteins in cancer signaling networks, raising silhouette score by 0.093.

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

cs.CL · 2024-01-31 · unverdicted · novelty 6.0

RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

cs.CL · 2023-05-03 · conditional · novelty 6.0

Distilling step-by-step uses LLM-generated rationales as additional supervision in a multi-task framework so that 770M-parameter models outperform 540B-parameter models on NLP benchmarks with only 80% of the data.

REPLUG: Retrieval-Augmented Black-Box Language Models

cs.CL · 2023-01-30 · conditional · novelty 6.0

REPLUG improves frozen black-box LMs by prepending LM-supervised retrieved documents, delivering 6.3% better language modeling on GPT-3 and 5.1% better five-shot MMLU on Codex.

Atlas: Few-shot Learning with Retrieval Augmented Language Models

cs.CL · 2022-08-05 · unverdicted · novelty 6.0 · 2 refs

Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

Emergent Abilities of Large Language Models

cs.CL · 2022-06-15 · unverdicted · novelty 6.0

Emergent abilities are capabilities that are present in large language models but absent in smaller ones, and that cannot be predicted by extrapolating the performance of smaller models.

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

cs.CL · 2022-04-12 · unverdicted · novelty 6.0

RLHF alignment training on language models boosts NLP performance, supports skill specialization, enables weekly online updates with fresh human data, and shows a linear relation between RL reward and sqrt(KL divergence from initialization).

PaLM: Scaling Language Modeling with Pathways

cs.CL · 2022-04-05 · accept · novelty 6.0

PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming average human performance on BIG-bench.

LaMDA: Language Models for Dialog Applications

cs.CL · 2022-01-20 · unverdicted · novelty 6.0

LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.

Small Language Models are the Future of Agentic AI

cs.AI · 2025-06-02 · unverdicted · novelty 5.0

Small language models are sufficiently capable, more suitable, and far more economical than large models for the repetitive tasks that dominate agentic AI systems.

Galactica: A Large Language Model for Science

cs.CL · 2022-11-16 · unverdicted · novelty 5.0 · 2 refs

Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.

Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering

cs.CL · 2026-04-27 · unverdicted · novelty 4.0

Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.

KnowPilot: Your Knowledge-Driven Copilot for Domain Tasks

cs.SE · 2026-04-19 · unverdicted · novelty 4.0

KnowPilot integrates knowledge retrieval and memory systems into generative agents to achieve better results on domain-specific tasks such as text generation.

Less LLM, More Documents: Searching for Improved RAG

cs.IR · 2025-10-03 · unverdicted · novelty 4.0

Corpus scaling in RAG frequently matches the accuracy gains from larger LLMs on open-domain QA tasks, with mid-sized models benefiting most due to better passage coverage.

Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

cs.AI · 2026-05-21
