pith. machine review for the scientific record.
Pith Number

pith:VVRFZRCW

pith:2021:VVRFZRCWXHIAKSEPM3RDDG3WQR
not attested · not anchored · not stored · refs resolved

Improving language models by retrieving from trillions of tokens

Aidan Clark, Albin Cassirer, Andy Brock, Arthur Mensch, Aurelia Guy, Bogdan Damoc, Chris Jones, Diego de las Casas, Eliza Rutherford, Erich Elsen, Geoffrey Irving, George van den Driessche, Jack W. Rae, Jacob Menick, Jean-Baptiste Lespiau, Jordan Hoffmann, Karen Simonyan, Katie Millican, Laurent Sifre, Loren Maggiore, Michela Paganini, Oriol Vinyals, Roman Ring, Saffron Huang, Sebastian Borgeaud, Simon Osindero, Tom Hennigan, Trevor Cai

Retrieval from a 2 trillion token database lets language models match GPT-3 performance with 25 times fewer parameters.

arxiv:2112.04426 v3 · 2021-12-08 · cs.CL · cs.LG

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
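This page does not describe the merge algorithm itself. As a purely hypothetical sketch of what a deterministic merge can look like, the Python below replays events on top of the canonical record in a fixed total order (timestamp, then event id), so two mirrors that receive the same events in different orders end with the same state. The `Event` fields, the ordering key, de-duplication by id, and last-writer-wins are all assumptions, and signature verification is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    """Hypothetical signed event: timestamp, unique id, and one field update."""
    timestamp: str  # ISO 8601 UTC, all in the same format so string order = time order
    event_id: str   # e.g. a hash of the signed payload; also breaks timestamp ties
    field: str
    value: str


def merge(canonical: Dict[str, str], events: Iterable[Event]) -> Dict[str, str]:
    """Hypothetical deterministic merge: replay events in (timestamp, event_id)
    order on a copy of the canonical record. Arrival order does not matter."""
    state = dict(canonical)                    # never mutate the canonical record
    unique = {e.event_id: e for e in events}   # drop duplicate deliveries by id
    for e in sorted(unique.values(), key=lambda ev: (ev.timestamp, ev.event_id)):
        state[e.field] = e.value               # last writer in the total order wins
    return state


base = {"title": "t"}
evs = [
    Event("2026-01-02T00:00:00Z", "b", "status", "claimed"),
    Event("2026-01-01T00:00:00Z", "a", "status", "open"),
    Event("2026-01-02T00:00:00Z", "c", "citations", "3"),
]
print(merge(base, evs))  # {'title': 't', 'status': 'claimed', 'citations': '3'}
```

Comparing ISO 8601 timestamps as strings is only safe when every event uses one format and time zone; the event id keeps the order total when timestamps collide.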

Claims

C1 · strongest claim

With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters.

C2 · weakest assumption

That nearest-neighbor retrieval based on local similarity with preceding tokens supplies sufficiently relevant and non-redundant information to improve next-token prediction at scale.

C3 · one-line summary

RETRO matches GPT-3 and Jurassic-1 performance on the Pile benchmark using 25 times fewer parameters by conditioning on retrieved chunks from a 2-trillion-token database.
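The retrieval idea in C2 and C3 can be illustrated with a toy sketch: split the input into fixed-size chunks, embed each chunk, and fetch its nearest neighbours from a database of chunks by L2 distance. This is not RETRO's implementation. The bag-of-tokens `embed` function is a hypothetical stand-in for a frozen neural encoder, the search is brute force rather than an approximate nearest-neighbour index, and the chunk size is a toy value.

```python
import math
from collections import Counter
from typing import List, Sequence

CHUNK_SIZE = 4  # toy value; real systems use much longer chunks


def chunk(tokens: Sequence[str], size: int = CHUNK_SIZE) -> List[List[str]]:
    """Split a token sequence into consecutive fixed-size chunks; the last may be shorter."""
    return [list(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def embed(tokens: Sequence[str]) -> Counter:
    """Hypothetical stand-in for a frozen encoder: a bag-of-tokens count vector."""
    return Counter(tokens)


def l2_distance(a: Counter, b: Counter) -> float:
    """Euclidean distance between sparse count vectors (missing keys count as 0)."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in set(a) | set(b)))


def retrieve(query: Sequence[str], database: List[List[str]], k: int) -> List[List[str]]:
    """Brute-force k-nearest-neighbour search over database chunks."""
    q = embed(query)
    return sorted(database, key=lambda c: l2_distance(q, embed(c)))[:k]


def neighbours_per_chunk(tokens: Sequence[str], database: List[List[str]],
                         k: int = 2) -> List[List[List[str]]]:
    """Retrieve neighbours for each input chunk from that chunk's own tokens.
    In a RETRO-style model, chunk i's neighbours condition only the prediction
    of chunk i + 1, so retrieval never uses tokens not yet seen."""
    return [retrieve(c, database, k) for c in chunk(tokens)]


database = [
    ["the", "cat", "sat", "on"],
    ["a", "dog", "ran", "home"],
    ["the", "cat", "ate", "fish"],
]
tokens = ["the", "cat", "sat", "down", "a", "dog", "ran", "far"]
for i, nbrs in enumerate(neighbours_per_chunk(tokens, database, k=1)):
    print(i, nbrs)
# 0 [['the', 'cat', 'sat', 'on']]
# 1 [['a', 'dog', 'ran', 'home']]
```

Each chunk retrieves on its own local content, which is exactly the assumption C2 flags: if the nearest chunks are near-duplicates or off-topic, the extra context adds little.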

References

115 extracted · 115 resolved · 8 Pith anchors

[1] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. Deep learning with differential privacy. In ACM SIGSAC Conference on Computer and Communications Security, 2016.
[3] A. Baevski and M. Auli. Adaptive input representations for neural language modeling. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ByxZX20qFQ
[5] E. M. Bender, T. Gebru, A. McMillan-Major, and S. Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In ACM Conference on Fairness, Accountability, and Transparency, 2021.
[6] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003. URL https://jmlr.csail.mit.edu/papers/v3/blei03a.html
[7] J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang. JAX: composable transformations of Python+Num…, 2018.

Formal links

2 machine-checked theorem links

Cited by

20 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:14.036066Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

ad625cc456b9d005488f66e2319b76847985dc5137ffa8a42f1d904bb69eb402

Aliases

arxiv: 2112.04426 · arxiv_version: 2112.04426v3 · doi: 10.48550/arxiv.2112.04426 · pith_short_12: VVRFZRCWXHIA · pith_short_16: VVRFZRCWXHIAKSEP · pith_short_8: VVRFZRCW
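The identifiers above appear to be derived from the canonical hash: the 26-character Pith Number equals the first 26 characters of the RFC 4648 base32 encoding of the SHA-256 digest, and the `pith_short_*` aliases are its 8-, 12- and 16-character prefixes. This relationship is inferred from the values on this page, not from a published Pith specification, so the sketch below is a consistency check rather than a reference implementation; `pith_number` and `aliases` are illustrative names.

```python
import base64

# Hypothetical reconstruction, inferred from the hash and aliases on this page.
def pith_number(canonical_hash_hex: str, length: int = 26) -> str:
    """Leading characters of the base32 (A-Z, 2-7) encoding of the 32-byte digest."""
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


def aliases(canonical_hash_hex: str) -> dict:
    """Full Pith Number plus the short prefix aliases listed under Aliases."""
    full = pith_number(canonical_hash_hex)
    return {
        "pith": full,
        "pith_short_8": full[:8],
        "pith_short_12": full[:12],
        "pith_short_16": full[:16],
    }


HASH = "ad625cc456b9d005488f66e2319b76847985dc5137ffa8a42f1d904bb69eb402"
print(pith_number(HASH))  # VVRFZRCWXHIAKSEPM3RDDG3WQR
```

Because 26 base32 characters carry 130 bits, the identifier keeps roughly the first half of the digest; the short aliases trade collision resistance for length.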
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/VVRFZRCWXHIAKSEPM3RDDG3WQR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: ad625cc456b9d005488f66e2319b76847985dc5137ffa8a42f1d904bb69eb402
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ade89e09005f15a41d8858000ecff09dbd9391c8d6d3e9684a62c81a75e28f91",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2021-12-08T17:32:34Z",
    "title_canon_sha256": "0662414c808cd1a0033652368760a3b092198299791993413f188906e62b7270"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2112.04426",
    "kind": "arxiv",
    "version": 3
  }
}
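For use inside other Python code, the canonicalization performed by the `curl | jq | python3` pipeline can be wrapped in a small function: keys sorted at every nesting level, no whitespace between tokens, non-ASCII characters emitted as UTF-8, then SHA-256 over the resulting bytes. The name `canonical_hash` is illustrative; the record literal is copied from the canonical record JSON shown above.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the same canonical JSON form the one-liner builds."""
    canon = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()


record = {
    "metadata": {
        "abstract_canon_sha256": "ade89e09005f15a41d8858000ecff09dbd9391c8d6d3e9684a62c81a75e28f91",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2021-12-08T17:32:34Z",
        "title_canon_sha256": "0662414c808cd1a0033652368760a3b092198299791993413f188906e62b7270",
    },
    "schema_version": "1.0",
    "source": {"id": "2112.04426", "kind": "arxiv", "version": 3},
}
print(canonical_hash(record))
```

Compare the printed digest with the value listed under Canonical hash. Since `sort_keys` applies recursively, key order in the input never changes the digest.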