Pith Number

pith:6OZJYRIW

pith:2024:6OZJYRIWZULFKVUIJ26HJP7DQK
not attested · not anchored · not stored · refs resolved

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

Aditi Tuli, Anna Goldie, Christopher D. Manning, Parth Sarthi, Salman Abdullah, Shubh Khanna

Recursive clustering and summarization builds a tree that improves retrieval-augmented reasoning over long documents.

arxiv:2401.18059 v1 · 2024-01-31 · cs.CL · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{6OZJYRIWZULFKVUIJ26HJP7DQK}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

by coupling RAPTOR retrieval with the use of GPT-4, we can improve the best performance on the QuALITY benchmark by 20% in absolute accuracy.

C2 weakest assumption

The recursive clustering and summarization process effectively captures and preserves all relevant information from the original document without significant loss or distortion.

C3 one line summary

RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.

References

124 extracted · 124 resolved · 25 Pith anchors

[1] On the Surprising Behavior of Distance Metrics in High Dimensional Space 2001 · doi:10.1007/3-540-44503-x_27
[7] Improving language models by retrieving from trillions of tokens 2022 · arXiv:2112.04426
[8] Language Models are Few-Shot Learners 2020
[12] PaLM: Scaling Language Modeling with Pathways 2022 · arXiv:2204.02311
[13] Contextualizing citations for scientific summarization using word embeddings and domain knowledge 2017 · doi:10.1145/3077136.3080740

Formal links

2 machine-checked theorem links

Cited by

32 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:52.495670Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

f3b29c4516cd165556884ebc74bfe382b5fb70fce695763d78937257b83660d2

Aliases

arxiv: 2401.18059 · arxiv_version: 2401.18059v1 · doi: 10.48550/arxiv.2401.18059 · pith_short_12: 6OZJYRIWZULF · pith_short_16: 6OZJYRIWZULFKVUI · pith_short_8: 6OZJYRIW
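The short aliases are plain prefixes of the full Pith Number, and the values on this page suggest the number itself is derived from the canonical hash. The Python sketch below checks that reading for this one record: it Base32-encodes the SHA-256 digest (RFC 4648 alphabet) and keeps the first 26 characters. The derivation rule and the names `pith_number_from_hash` and `short_aliases` are assumptions inferred from this record's values, not taken from the Pith schema, so treat this as an illustration rather than the specification.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "f3b29c4516cd165556884ebc74bfe382b5fb70fce695763d78937257b83660d2"
PITH_NUMBER = "6OZJYRIWZULFKVUIJ26HJP7DQK"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """ASSUMPTION (inferred from this record, not from the schema):
    Base32-encode the digest bytes and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the pith_short_N aliases as prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    derived = pith_number_from_hash(CANONICAL_HASH)
    print("derived number matches page:", derived == PITH_NUMBER)
    print(short_aliases(derived))
```

If the assumption holds for other records, a client could resolve any short alias by prefix match against the full number, but that should be confirmed against the schema before relying on it.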
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/6OZJYRIWZULFKVUIJ26HJP7DQK \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: f3b29c4516cd165556884ebc74bfe382b5fb70fce695763d78937257b83660d2
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "4b4e8795849e2575548f7f664081a1408d6d307b2d61c069211c535628b89293",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-01-31T18:30:21Z",
    "title_canon_sha256": "a7ff4b20c4ee1b6df25b8a18f737d4ad1efe8efe0ae5bf44c02ac57cc8869243"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2401.18059",
    "kind": "arxiv",
    "version": 1
  }
}
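The shell one-liner under "Verify this Pith Number yourself" can also be written as a short Python function that works offline on the record printed above. This sketch assumes the canonicalization is exactly what that one-liner does (keys sorted, compact separators, no ASCII escaping, UTF-8 bytes). The names `canonical_bytes` and `canonical_hash` are introduced here for illustration, and the record literal is copied from this page. The script reports whether the digest matches the published hash rather than assuming it does.

```python
import hashlib
import json

# Published canonical hash, copied from this page.
EXPECTED = "f3b29c4516cd165556884ebc74bfe382b5fb70fce695763d78937257b83660d2"

# Canonical record, copied from the JSON shown on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "4b4e8795849e2575548f7f664081a1408d6d307b2d61c069211c535628b89293",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2024-01-31T18:30:21Z",
        "title_canon_sha256": "a7ff4b20c4ee1b6df25b8a18f737d4ad1efe8efe0ae5bf44c02ac57cc8869243",
    },
    "schema_version": "1.0",
    "source": {"id": "2401.18059", "kind": "arxiv", "version": 1},
}


def canonical_bytes(record: dict) -> bytes:
    """Serialize the way the page's one-liner does: sorted keys, compact separators."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(RECORD)
    print(digest)
    print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the order in which a mirror stores or transmits the record's fields does not change the digest.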