pith:TOWXDDJW
REALM: Retrieval-Augmented Language Model Pre-Training
Language models pre-trained with an integrated retriever over a document corpus outperform prior methods on open-domain question answering by 4 to 16 points of absolute accuracy.
arxiv:2002.08909 v1 · 2020-02-10 · cs.CL · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{TOWXDDJWQSC2EPTIXWHQYSZV5P}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a significant margin (4-16% absolute accuracy).
The approach assumes that back-propagation through a retrieval step over millions of documents is numerically stable and provides a useful unsupervised learning signal for the retriever parameters.
REALM augments language-model pre-training with an unsupervised retriever over Wikipedia documents and reports 4-16% absolute gains on open-domain QA benchmarks over prior implicit and explicit knowledge methods.
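To make the retriever's learning signal concrete, the sketch below treats the retrieved document z as a latent variable and computes p(y | x) as the sum over z of p(y | z, x) · p(z | x), where p(z | x) is a softmax over inner-product relevance scores. It is a toy illustration only: the function names, vectors, and probabilities are invented for this example and are not the authors' implementation, which retrieves from millions of documents rather than three.

```python
import numpy as np

def retrieval_distribution(query_vec, doc_vecs):
    """p(z | x): softmax over inner-product scores, one score per document."""
    scores = doc_vecs @ query_vec
    scores = scores - scores.max()  # subtracting the max avoids overflow in exp
    weights = np.exp(scores)
    return weights / weights.sum()

def marginal_likelihood(p_doc, p_answer_given_doc):
    """p(y | x) = sum over z of p(y | z, x) * p(z | x)."""
    return float(np.dot(p_doc, p_answer_given_doc))

# Toy values, invented purely for illustration (assumption, not from the paper).
query_vec = np.array([1.0, 0.0])
doc_vecs = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
p_answer_given_doc = np.array([0.9, 0.1, 0.5])  # hypothetical reader outputs

p_doc = retrieval_distribution(query_vec, doc_vecs)
p_answer = marginal_likelihood(p_doc, p_answer_given_doc)
print(p_doc, p_answer)
```

Because p(y | x) depends on p(z | x), raising the score of a document that helps predict the answer raises the likelihood, which is how a gradient can reach the retriever parameters without retrieval labels.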
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:52.829438Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
9bad718d368485a23e68bd8f0c4b35ebf7fe612a69c5579e3ba4838f495dc45e
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/TOWXDDJWQSC2EPTIXWHQYSZV5P \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 9bad718d368485a23e68bd8f0c4b35ebf7fe612a69c5579e3ba4838f495dc45e
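For readers who prefer a script to a one-liner, the same canonicalization step can be sketched in Python: serialize the record with sorted keys, compact separators, and no ASCII escaping, then take the SHA-256 of the UTF-8 bytes. The helper name `canonical_sha256` is hypothetical, and the record is copied by hand from this page, so treat the result as something to compare against the canonical hash listed here rather than as a guaranteed match.

```python
import hashlib
import json

def canonical_sha256(record):
    """Hash a record as the shell pipeline does: sorted keys, compact separators, UTF-8."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

# Canonical record as displayed on this page (copied by hand; an assumption
# that it matches what the API returns byte for byte after canonicalization).
record = {
    "metadata": {
        "abstract_canon_sha256": "a232dfc972dce5d964abce327735d56ef564c47b6287fe4dc0ed536c127173cb",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2020-02-10T18:40:59Z",
        "title_canon_sha256": "4cc1d8fa32eb3843fc491ea769947edd01955ffa807b5f78126a614095724e51",
    },
    "schema_version": "1.0",
    "source": {"id": "2002.08909", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash shown on this page
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace in the fetched JSON, which is why the same record hashes identically regardless of how it was pretty-printed.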
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a232dfc972dce5d964abce327735d56ef564c47b6287fe4dc0ed536c127173cb",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2020-02-10T18:40:59Z",
    "title_canon_sha256": "4cc1d8fa32eb3843fc491ea769947edd01955ffa807b5f78126a614095724e51"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2002.08909",
    "kind": "arxiv",
    "version": 1
  }
}