Pith Number

pith:E3E5YQVF

pith:2023:E3E5YQVFQOX2S36HL2ZHEROPSO

Status: not attested · not anchored · not stored · refs resolved

REPLUG: Retrieval-Augmented Black-Box Language Models

Luke Zettlemoyer, Michihiro Yasunaga, Mike Lewis, Minjoon Seo, Rich James, Sewon Min, Weijia Shi, Wen-tau Yih

REPLUG augments frozen black-box LMs such as GPT-3 with a tunable retriever: retrieved documents are prepended to the input, and the retriever is trained using the LM's own predictions as supervision.

arxiv:2301.12652 v4 · 2023-01-30 · cs.CL

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{E3E5YQVFQOX2S36HL2ZHEROPSO}

The package prints a linked badge after your title and injects PDF metadata; it compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim: open

4 Citations: open

5 Replications: open

✓ Portable graph bundle: live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
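The merge algorithm itself is not specified on this page. As a purely hypothetical illustration of the property it claims (any mirror holding the same events recomputes the same state), the sketch below sorts events by a total order before applying them, so arrival order does not matter. The `Event` fields and the last-writer-wins rule are assumptions, and verification of the event signatures is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the real bundle's event schema is not shown here.
    timestamp: str  # ISO-8601 UTC in one fixed format, so string order is time order
    event_id: str   # unique per event; breaks ties between equal timestamps
    field: str
    value: str


def merge(canonical_record: dict, events: list) -> dict:
    """Recompute current state from the canonical record plus a set of events.

    Sorting by (timestamp, event_id) gives a total order, so the result does
    not depend on the order in which a mirror received the events. Later
    events overwrite earlier ones for the same field (last-writer-wins).
    """
    state = dict(canonical_record)  # copy: never mutate the canonical record
    for ev in sorted(events, key=lambda e: (e.timestamp, e.event_id)):
        state[ev.field] = ev.value
    return state


record = {"id": "2301.12652", "author_claim": "open"}
events = [
    Event("2026-05-18T10:00:00Z", "evt-2", "author_claim", "claimed"),
    Event("2026-05-18T09:00:00Z", "evt-1", "author_claim", "pending"),
]
# Two mirrors receiving the events in different orders agree on the result.
assert merge(record, events) == merge(record, list(reversed(events)))
print(merge(record, events))  # {'id': '2301.12652', 'author_claim': 'claimed'}
```

A total order over events is the simplest way to make replay deterministic; any real scheme would also need to reject events whose signatures do not verify before merging.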

Claims

C1 · strongest claim

REPLUG with the tuned retriever significantly improves the performance of GPT-3 (175B) on language modeling by 6.3%, as well as the performance of Codex on five-shot MMLU by 5.1%.

C2 · weakest assumption

That the frozen LM can reliably supervise the retriever to surface documents that genuinely improve its own predictions without introducing evaluation bias or requiring task-specific labels.
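How a frozen LM can supervise the retriever can be sketched as follows, assuming training follows the LM-supervised retrieval (LSR) objective as described in the REPLUG paper: the retriever's distribution over the top-k documents, softmax(s(d, x)/γ), is pulled toward a target distribution, softmax(P_LM(y | d, x)/β), built from how much each prepended document raises the LM's probability of the ground-truth continuation y. The loss is the KL divergence between the two, and only the retriever would receive gradients. The assumption above is precisely that this LM-derived target is trustworthy. The scores and probabilities below are made-up numbers, and this plain-Python version omits the embedding model and the gradient step.

```python
import math


def softmax(values, temperature):
    """Softmax of values / temperature, computed stably."""
    scaled = [v / temperature for v in values]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def lsr_loss(retrieval_scores, lm_target_probs, gamma=1.0, beta=1.0):
    """KL(P_R || Q_LM) for one input x over its top-k retrieved documents.

    retrieval_scores[i]: retriever similarity s(d_i, x).
    lm_target_probs[i]:  frozen LM probability of the ground-truth continuation y
                         when d_i is prepended to x, i.e. P_LM(y | d_i, x).
    """
    p_r = softmax(retrieval_scores, gamma)  # retriever's distribution over documents
    q_lm = softmax(lm_target_probs, beta)   # LM-derived target distribution
    return sum(p * math.log(p / q) for p, q in zip(p_r, q_lm) if p > 0)


lm_probs = [0.1, 0.9]  # hypothetical: document 2 helps the LM predict y far more
# A retriever that ranks the unhelpful document first incurs a larger loss ...
print(round(lsr_loss([3.0, 0.0], lm_probs), 3))
# ... than one that ranks the helpful document first.
print(round(lsr_loss([0.0, 3.0], lm_probs), 3))
```

Smaller values of β sharpen the LM-derived target, which makes the supervision signal more decisive when the per-document probabilities are close together.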

C3 · one-line summary

REPLUG improves frozen black-box LMs by prepending LM-supervised retrieved documents, delivering 6.3% better language modeling on GPT-3 and 5.1% better five-shot MMLU on Codex.
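A minimal sketch of the prepend-and-ensemble step, assuming the formulation in the REPLUG paper: each retrieved document is prepended to the input separately, the frozen LM is queried once per document, and the resulting next-token distributions are averaged with weights given by a softmax over retrieval scores. The toy LM, the documents, and the scores are hypothetical stand-ins; a real black-box API would return (top-k) log-probabilities rather than a Python dict.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def replug_ensemble(query, docs, retrieval_scores, lm_next_token_probs):
    """Mix a frozen LM's next-token distributions over retrieved documents.

    Each document is prepended to the query on its own, the black-box LM is
    called once per augmented input, and the per-document distributions are
    averaged with weights softmax(retrieval_scores). The LM is never modified.
    """
    weights = softmax(retrieval_scores)
    mixed = {}
    for doc, weight in zip(docs, weights):
        probs = lm_next_token_probs(doc + "\n\n" + query)
        for token, p in probs.items():
            mixed[token] = mixed.get(token, 0.0) + weight * p
    return mixed


def toy_lm(prompt):
    # Hypothetical stand-in for a black-box LM API returning next-token probabilities.
    if "Paris" in prompt:
        return {"Paris": 0.9, "Lyon": 0.1}
    return {"Paris": 0.4, "Lyon": 0.6}


dist = replug_ensemble(
    "The capital of France is",
    ["Paris is the capital of France.", "Lyon is a city in France."],
    [2.0, 0.0],  # hypothetical retriever similarity scores s(d, x)
    toy_lm,
)
print({token: round(p, 3) for token, p in dist.items()})  # {'Paris': 0.84, 'Lyon': 0.16}
```

Prepending documents one at a time, rather than concatenating all of them, keeps each prompt within the LM's context window and lets the retriever's scores decide how much each document counts.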

References

300 extracted · 300 resolved · 16 Pith anchors

Formal links

1 machine-checked theorem link

Cited by

31 papers in Pith, including:

Retrieval-Augmented Generation for Large Language Models: A Survey

A Survey on Retrieval-Augmented Text Generation for Large Language Models

Retrieval-Augmented Generation for Natural Language Processing: A Survey

Trustworthiness in Retrieval-Augmented Generation Systems: A Survey

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Receipt and verification

First computed	2026-05-17T23:38:14.073876Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

26c9dc42a583afa96fc75eb27245cf938a2ff384cee2bb3a0697f22b966fdff9

Aliases

arxiv: 2301.12652 · arxiv_version: 2301.12652v4 · doi: 10.48550/arxiv.2301.12652 · pith_short_12: E3E5YQVFQOX2 · pith_short_16: E3E5YQVFQOX2S36H · pith_short_8: E3E5YQVF
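The identifiers on this page appear to be related: the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the raw SHA-256 canonical hash, and the `pith_short_N` aliases are its first N characters. This is inferred from the values shown here, not from a published specification; the sketch below reproduces it with Python's standard library.

```python
import base64

CANONICAL_HASH = "26c9dc42a583afa96fc75eb27245cf938a2ff384cee2bb3a0697f22b966fdff9"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """First `length` characters of the base32 encoding of the raw SHA-256 digest.

    26 base32 characters carry 130 bits. This derivation is inferred from the
    values on this page, not taken from a published specification.
    """
    digest = bytes.fromhex(hex_digest)
    return base64.b32encode(digest).decode("ascii")[:length]


def short_alias(pith_number: str, n: int) -> str:
    """The pith_short_N aliases appear to be plain N-character prefixes."""
    return pith_number[:n]


pn = pith_number_from_hash(CANONICAL_HASH)
print(pn)  # E3E5YQVFQOX2S36HL2ZHEROPSO
print([short_alias(pn, n) for n in (8, 12, 16)])
# ['E3E5YQVF', 'E3E5YQVFQOX2', 'E3E5YQVFQOX2S36H']
```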

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/E3E5YQVFQOX2S36HL2ZHEROPSO \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 26c9dc42a583afa96fc75eb27245cf938a2ff384cee2bb3a0697f22b966fdff9

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "73a98e164e6404904005b5333a959035c0df5b588250f9770de74d53af6aac50",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-01-30T04:18:09Z",
    "title_canon_sha256": "fa983c86e0d5244a202381984860f5408f78ee6d60eb1d653ba566c65bd3d062"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2301.12652",
    "kind": "arxiv",
    "version": 4
  }
}
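For an offline check, the canonicalization in the verification command can be reproduced in Python alone: keys sorted at every level, compact separators, non-ASCII characters kept as UTF-8, then SHA-256. The sketch below applies it to the canonical record shown above. It assumes the record on this page is byte-for-byte what the resolver returns; if so, the digest should equal the canonical hash listed above, but the code does not hard-code that expectation.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 hex digest of `record` serialized like the verification one-liner:
    sorted keys, compact separators, non-ASCII left as UTF-8."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# The canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "73a98e164e6404904005b5333a959035c0df5b588250f9770de74d53af6aac50",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2023-01-30T04:18:09Z",
        "title_canon_sha256": "fa983c86e0d5244a202381984860f5408f78ee6d60eb1d653ba566c65bd3d062",
    },
    "schema_version": "1.0",
    "source": {"id": "2301.12652", "kind": "arxiv", "version": 4},
}

digest = canonical_sha256(record)
print(digest)

# Key order in the input does not matter: sort_keys fixes the byte sequence.
reordered = {"source": record["source"], "schema_version": "1.0", "metadata": record["metadata"]}
assert canonical_sha256(reordered) == digest
```

Because the serialization is fully determined by `sort_keys` and `separators`, any producer that emits the same values gets the same bytes and therefore the same hash.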