{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2021:VVRFZRCWXHIAKSEPM3RDDG3WQR","short_pith_number":"pith:VVRFZRCW","schema_version":"1.0","canonical_sha256":"ad625cc456b9d005488f66e2319b76847985dc5137ffa8a42f1d904bb69eb402","source":{"kind":"arxiv","id":"2112.04426","version":3},"attestation_state":"computed","paper":{"title":"Improving language models by retrieving from trillions of tokens","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Retrieval from a 2 trillion token database lets language models match GPT-3 performance with 25 times fewer parameters.","cross_cats":["cs.LG"],"primary_cat":"cs.CL","authors_text":"Aidan Clark, Albin Cassirer, Andy Brock, Arthur Mensch, Aurelia Guy, Bogdan Damoc, Chris Jones, Diego de las Casas, Eliza Rutherford, Erich Elsen, Geoffrey Irving, George van den Driessche, Jack W. Rae, Jacob Menick, Jean-Baptiste Lespiau, Jordan Hoffmann, Karen Simonyan, Katie Millican, Laurent Sifre, Loren Maggiore, Michela Paganini, Oriol Vinyals, Roman Ring, Saffron Huang, Sebastian Borgeaud, Simon Osindero, Tom Hennigan, Trevor Cai","submitted_at":"2021-12-08T17:32:34Z","abstract_excerpt":"We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25$\\times$ fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an o"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2112.04426","kind":"arxiv","version":3},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CL","submitted_at":"2021-12-08T17:32:34Z","cross_cats_sorted":["cs.LG"],"title_canon_sha256":"0662414c808cd1a0033652368760a3b092198299791993413f188906e62b7270","abstract_canon_sha256":"ade89e09005f15a41d8858000ecff09dbd9391c8d6d3e9684a62c81a75e28f91"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:14.036733Z","signature_b64":"TFJ9nriGEvalInB3XZEJi7GXRUQzuR1mL9l427S1oDW+jTL+x/NSz+U1c+DvvBZSvEEQfXdc1x0WwA58gwInAQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"ad625cc456b9d005488f66e2319b76847985dc5137ffa8a42f1d904bb69eb402","last_reissued_at":"2026-05-17T23:38:14.036066Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:14.036066Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Improving language models by retrieving from trillions of tokens","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Retrieval from a 2 trillion token database lets language models match GPT-3 performance with 25 times fewer parameters.","cross_cats":["cs.LG"],"primary_cat":"cs.CL","authors_text":"Aidan Clark, Albin Cassirer, Andy Brock, Arthur Mensch, Aurelia Guy, Bogdan Damoc, Chris Jones, Diego de las Casas, Eliza Rutherford, Erich Elsen, Geoffrey Irving, George van den Driessche, Jack W. Rae, Jacob Menick, Jean-Baptiste Lespiau, Jordan Hoffmann, Karen Simonyan, Katie Millican, Laurent Sifre, Loren Maggiore, Michela Paganini, Oriol Vinyals, Roman Ring, Saffron Huang, Sebastian Borgeaud, Simon Osindero, Tom Hennigan, Trevor Cai","submitted_at":"2021-12-08T17:32:34Z","abstract_excerpt":"We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25$\\times$ fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an o"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That nearest-neighbor retrieval based on local similarity with preceding tokens supplies sufficiently relevant and non-redundant information to improve next-token prediction at scale.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"RETRO matches GPT-3 and Jurassic-1 performance on the Pile benchmark using 25 times fewer parameters by conditioning on retrieved chunks from a 2-trillion-token database.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Retrieval from a 2 trillion token database lets language models match GPT-3 performance with 25 times fewer parameters.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"9ce38e4199f9da8e8672722e3267f603abfb28767d58094123200af4a60bc16b"},"source":{"id":"2112.04426","kind":"arxiv","version":3},"verdict":{"id":"c2b40f6c-01fa-43a8-8dc8-c5829ccd16a7","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T12:50:10.584771Z","strongest_claim":"With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters.","one_line_summary":"RETRO matches GPT-3 and Jurassic-1 performance on the Pile benchmark using 25 times fewer parameters by conditioning on retrieved chunks from a 2-trillion-token database.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That nearest-neighbor retrieval based on local similarity with preceding tokens supplies sufficiently relevant and non-redundant information to improve next-token prediction at scale.","pith_extraction_headline":"Retrieval from a 2 trillion token database lets language models match GPT-3 performance with 25 times fewer parameters."},"references":{"count":115,"sample":[{"doi":"","year":2016,"title":"M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. Deep learning with differential privacy. In ACM SIGSAC Conference on Computer and Communications Security, 2016","work_id":"58cb949a-d4d0-4d36-8e64-cbaf72a383bc","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2019,"title":"A. Baevski and M. Auli. Adaptive input representations for neural language modeling. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ByxZX20qFQ","work_id":"0eba6dc3-5447-4995-bde9-4e9d4b16d8c9","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"E. M. Bender, T. Gebru, A. McMillan-Major, and S. Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In ACM Conference on Fairness, Accountability, and Transparency, 202","work_id":"06746417-8384-498d-a63b-83b9dd5cd00f","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2003,"title":"D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet Allocation . Journal of Machine Learning Research, 3 0 (Jan): 0 993--1022, 2003. URL https://jmlr.csail.mit.edu/papers/v3/blei03a.html","work_id":"4a09d8db-3a2b-4bde-bac9-434a09d5e807","ref_index":6,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. V. der P las, S. Wanderman- M ilne, and Q. Zhang. JAX : composable transformations of P ython+ N um","work_id":"a93701d4-1bcc-419f-9557-2f43fff982f9","ref_index":7,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":115,"snapshot_sha256":"f9f1ce20b866c542128dfe2338a6588cb76fca73be7f4c37657a5c69bc562389","internal_anchors":8},"formal_canon":{"evidence_count":2,"snapshot_sha256":"54c2ae5bef4b69ed7234539379aa6620b02e179153438447c02c01c348f23770"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2112.04426","created_at":"2026-05-17T23:38:14.036158+00:00"},{"alias_kind":"arxiv_version","alias_value":"2112.04426v3","created_at":"2026-05-17T23:38:14.036158+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2112.04426","created_at":"2026-05-17T23:38:14.036158+00:00"},{"alias_kind":"pith_short_12","alias_value":"VVRFZRCWXHIA","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_16","alias_value":"VVRFZRCWXHIAKSEP","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_8","alias_value":"VVRFZRCW","created_at":"2026-05-18T12:33:33.725879+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":20,"internal_anchor_count":20,"sample":[{"citing_arxiv_id":"2510.02657","citing_title":"Less LLM, More Documents: Searching for Improved RAG","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2301.12652","citing_title":"REPLUG: Retrieval-Augmented Black-Box Language Models","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2208.03299","citing_title":"Atlas: Few-shot Learning with Retrieval Augmented Language Models","ref_index":175,"is_internal_anchor":true},{"citing_arxiv_id":"2208.03299","citing_title":"Atlas: Few-shot Learning with Retrieval Augmented Language Models","ref_index":74,"is_internal_anchor":true},{"citing_arxiv_id":"2506.02153","citing_title":"Small Language Models are the Future of Agentic AI","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2602.00586","citing_title":"RAG-GNN: Integrating Retrieved Knowledge with Graph Neural Networks for Precision Medicine","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2401.18059","citing_title":"RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2205.06175","citing_title":"A Generalist Agent","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2211.09085","citing_title":"Galactica: A Large Language Model for Science","ref_index":147,"is_internal_anchor":true},{"citing_arxiv_id":"2211.09085","citing_title":"Galactica: A Large Language Model for Science","ref_index":25,"is_internal_anchor":true},{"citing_arxiv_id":"2201.08239","citing_title":"LaMDA: Language Models for Dialog Applications","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2604.24334","citing_title":"Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2206.07682","citing_title":"Emergent Abilities of Large Language Models","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2204.02311","citing_title":"PaLM: Scaling Language Modeling with Pathways","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2205.01068","citing_title":"OPT: Open Pre-trained Transformer Language Models","ref_index":272,"is_internal_anchor":true},{"citing_arxiv_id":"2207.05221","citing_title":"Language Models (Mostly) Know What They Know","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2204.05862","citing_title":"Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2604.14403","citing_title":"A Unified Model and Document Representation for On-Device Retrieval-Augmented Generation","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2604.19820","citing_title":"KnowPilot: Your Knowledge-Driven Copilot for Domain Tasks","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2604.23593","citing_title":"When AI reviews science: Can we trust the referee?","ref_index":100,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/VVRFZRCWXHIAKSEPM3RDDG3WQR","json":"https://pith.science/pith/VVRFZRCWXHIAKSEPM3RDDG3WQR.json","graph_json":"https://pith.science/api/pith-number/VVRFZRCWXHIAKSEPM3RDDG3WQR/graph.json","events_json":"https://pith.science/api/pith-number/VVRFZRCWXHIAKSEPM3RDDG3WQR/events.json","paper":"https://pith.science/paper/VVRFZRCW"},"agent_actions":{"view_html":"https://pith.science/pith/VVRFZRCWXHIAKSEPM3RDDG3WQR","download_json":"https://pith.science/pith/VVRFZRCWXHIAKSEPM3RDDG3WQR.json","view_paper":"https://pith.science/paper/VVRFZRCW","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2112.04426&json=true","fetch_graph":"https://pith.science/api/pith-number/VVRFZRCWXHIAKSEPM3RDDG3WQR/graph.json","fetch_events":"https://pith.science/api/pith-number/VVRFZRCWXHIAKSEPM3RDDG3WQR/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/VVRFZRCWXHIAKSEPM3RDDG3WQR/action/timestamp_anchor","attest_storage":"https://pith.science/pith/VVRFZRCWXHIAKSEPM3RDDG3WQR/action/storage_attestation","attest_author":"https://pith.science/pith/VVRFZRCWXHIAKSEPM3RDDG3WQR/action/author_attestation","sign_citation":"https://pith.science/pith/VVRFZRCWXHIAKSEPM3RDDG3WQR/action/citation_signature","submit_replication":"https://pith.science/pith/VVRFZRCWXHIAKSEPM3RDDG3WQR/action/replication_record"}},"created_at":"2026-05-17T23:38:14.036158+00:00","updated_at":"2026-05-17T23:38:14.036158+00:00"}