pith:Y4AG2MRS
LitXBench: A Benchmark for Extracting Experiments from Scientific Literature
Frontier language models outperform multi-turn pipelines by up to 0.37 F1 at extracting full experiments from papers, apparently because they tie measurements to processing steps.
arxiv:2604.07649 v4 · 2026-04-08 · cs.IR
Add to your LaTeX paper:
```latex
\usepackage{pith}
\pithnumber{Y4AG2MRSBBK4P3TIVF5NRCUW56}
```
This prints a linked badge after your title and injects PDF metadata. It compiles on arXiv.
Claims
Frontier language models, such as Gemini 3.1 Pro Preview, outperform existing multi-turn extraction pipelines by up to 0.37 F1. Our results suggest that this performance gap arises because extraction pipelines associate measurements with compositions rather than with the processing steps that define a material.
This rests on two assumptions: that the 19 alloy papers and 1426 measurements in LitXAlloy form a representative, unbiased sample of real-world extraction challenges, and that the observed F1 gap is caused by associating measurements with processing steps rather than by other factors such as prompt design or model scale.
In short, LitXBench shows that frontier LLMs such as Gemini 3.1 Pro Preview outperform extraction pipelines by up to 0.37 F1, a gap the authors attribute to linking measurements to processing steps rather than only to compositions.
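To make the claimed failure mode concrete, here is a toy sketch, not LitXBench's scoring code. It assumes each extracted measurement is a (composition, processing step, property, value) tuple scored by exact-match F1; the helpers `f1` and `drop_step`, the alloy name, and every value are hypothetical. A pipeline that attaches correct values only to the composition looks perfect when the processing step is ignored, but scores zero once the step is part of what must match.

```python
from typing import Optional

# Hypothetical record shape: (composition, processing_step, property, value).
Measurement = tuple[str, Optional[str], str, float]


def f1(pred: set, gold: set) -> float:
    """Exact-match F1 between predicted and gold tuple sets (hypothetical scoring)."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def drop_step(records: set[Measurement]) -> set[tuple[str, str, float]]:
    """Project away the processing step, keeping (composition, property, value)."""
    return {(comp, prop, value) for comp, _step, prop, value in records}


# Made-up gold data: one nominal composition processed two different ways.
gold = {
    ("alloy-A", "as-cast", "hardness_HV", 310.0),
    ("alloy-A", "annealed", "hardness_HV", 270.0),
}
# Correct values, but tied only to the composition (no processing step).
composition_keyed = {
    ("alloy-A", None, "hardness_HV", 310.0),
    ("alloy-A", None, "hardness_HV", 270.0),
}
# Tied to processing steps, with one value extracted wrongly.
step_aware = {
    ("alloy-A", "as-cast", "hardness_HV", 310.0),
    ("alloy-A", "annealed", "hardness_HV", 275.0),
}

for name, pred in [("composition_keyed", composition_keyed), ("step_aware", step_aware)]:
    # Columns: step-level F1, composition-level F1.
    print(name, f1(pred, gold), f1(drop_step(pred), drop_step(gold)))
# composition_keyed 0.0 1.0
# step_aware 0.5 0.5
```

The point of the two columns is that the unit being scored decides which pipeline "wins": when a material is defined by its processing history, values that cannot be traced to a step count as misses.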
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:05:44.461720Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
c7006d32320855c7ee68a97ad88a96efbd786a0e2fcf5b172a6708d8ebfa2b1d
Verify this Pith Number yourself
```sh
# Fetch the record, keep only canonical_record, re-serialize it with sorted keys,
# compact separators and UTF-8 output, then print its SHA-256.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/Y4AG2MRSBBK4P3TIVF5NRCUW56 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c7006d32320855c7ee68a97ad88a96efbd786a0e2fcf5b172a6708d8ebfa2b1d
```
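The Python one-liner above can be unpacked into reusable functions. This is a minimal sketch that mirrors the `json.dumps` arguments used in the command; the helper names `canonical_json`, `canonical_sha256`, and `matches` are hypothetical, and it assumes the canonical form is exactly what the one-liner produces (sorted keys, `,`/`:` separators, non-ASCII kept, UTF-8 bytes).

```python
import hashlib
import json
from typing import Any


def canonical_json(obj: Any) -> str:
    """Hypothetical helper: sorted keys, no extra whitespace, non-ASCII left as-is,
    matching the json.dumps arguments in the verification one-liner."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def canonical_sha256(obj: Any) -> str:
    """Hypothetical helper: hex SHA-256 of the UTF-8 bytes of canonical_json(obj)."""
    return hashlib.sha256(canonical_json(obj).encode("utf-8")).hexdigest()


def matches(obj: Any, expected_hex: str) -> bool:
    """Hypothetical helper: True if obj hashes to expected_hex (case-insensitive)."""
    return canonical_sha256(obj) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Key order and whitespace in the input text do not change the digest.
    a = json.loads('{"schema_version": "1.0", "source": {"kind": "arxiv", "id": "2604.07649"}}')
    b = json.loads('{"source":{"id":"2604.07649","kind":"arxiv"},"schema_version":"1.0"}')
    print(canonical_sha256(a) == canonical_sha256(b))  # True
```

To check a record by hand, parse the `canonical_record` object returned by the API and pass it to `matches` together with the canonical hash listed above. Sorting keys and fixing separators is what makes the digest independent of how the JSON happened to be formatted in transit.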
Canonical record JSON
```json
{
"metadata": {
"abstract_canon_sha256": "703fb0a62f6c6d2b4abed635a73a94ca4c3812e135768350574490889296af3e",
"cross_cats_sorted": [],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.IR",
"submitted_at": "2026-04-08T23:31:31Z",
"title_canon_sha256": "672b0eebe7710de71e1ceb84bdc74e8939844f7a745f9525e5b398750ce87187"
},
"schema_version": "1.0",
"source": {
"id": "2604.07649",
"kind": "arxiv",
"version": 4
}
}
```