pith:G2XA7TKE
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks
LongBench v2 shows that the best current LLMs score about 50% on long-context reasoning tasks when answering directly, while reasoning models such as o1-preview exceed the 54% human baseline.
arxiv:2412.15204 v2 · 2024-12-19 · cs.CL · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{G2XA7TKEYV53AO5KYTG6OUGBP5}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
The best-performing model, when answering the questions directly, achieves only 50.1% accuracy. In contrast, the o1-preview model, which includes longer reasoning, achieves 57.7%, surpassing the human baseline by 4 percentage points.
These claims assume that the 503 questions genuinely require deep understanding and multi-step reasoning rather than being solvable through surface cues or training-data leakage, and that the 15-minute human time limit produces a fair comparison to model performance.
The LongBench v2 benchmark shows that current LLMs underperform humans on deep long-context reasoning tasks when answering directly, but that extended inference-time reasoning lets a model surpass the human baseline.
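The figures in the claims above imply a human baseline that can be recovered with simple arithmetic. The sketch below assumes the stated "4%" margin means 4 percentage points (not a relative 4%); the variable names are illustrative only.

```python
# Figures quoted in the claims above (percent accuracy).
best_direct = 50.1    # best model answering directly
o1_preview = 57.7     # o1-preview with longer reasoning
margin_points = 4.0   # ASSUMPTION: the "4%" margin is read as percentage points

# Human baseline implied by the o1-preview score and the stated margin.
human_baseline = round(o1_preview - margin_points, 1)

# Gap between the human baseline and the best direct-answer model.
gap_direct = round(human_baseline - best_direct, 1)

print(f"implied human baseline: {human_baseline}%")
print(f"rounded to whole percent: {round(human_baseline)}%")
print(f"best direct-answer model trails humans by {gap_direct} points")
```

Under that reading, the implied baseline rounds to the 54% quoted in the summary line, and the best direct-answer model sits a few points below it.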
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:46.654233Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
36ae0fcd44c57bb03baac4cde750c17f5fc633d6e2b8c874bcd85ed980ec3b75
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/G2XA7TKEYV53AO5KYTG6OUGBP5 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 36ae0fcd44c57bb03baac4cde750c17f5fc633d6e2b8c874bcd85ed980ec3b75
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "7765192ced9a40be15cb5d5ecd09e4647b36f808cf665312205b0b87976cb5f6",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-12-19T18:59:17Z",
    "title_canon_sha256": "4998e049c23af4c78fd2e5f612dad7ae2284185f686b6fa03754a436ae679944"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2412.15204",
    "kind": "arxiv",
    "version": 2
  }
}
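
The shell pipeline in the verification step can also be reproduced offline. The following is a minimal sketch, assuming the record shown above is exactly what the endpoint returns under `canonical_record`; the function name `canonical_hash` and the `EXPECTED` constant are illustrative names, not part of any Pith API. It applies the same serialization as the one-liner: sorted keys, no whitespace between separators, UTF-8 output.

```python
import hashlib
import json

# Canonical record copied from the JSON shown above.
record = {
    "metadata": {
        "abstract_canon_sha256": "7765192ced9a40be15cb5d5ecd09e4647b36f808cf665312205b0b87976cb5f6",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2024-12-19T18:59:17Z",
        "title_canon_sha256": "4998e049c23af4c78fd2e5f612dad7ae2284185f686b6fa03754a436ae679944",
    },
    "schema_version": "1.0",
    "source": {"id": "2412.15204", "kind": "arxiv", "version": 2},
}

# Hash published on this page under "Canonical hash".
EXPECTED = "36ae0fcd44c57bb03baac4cde750c17f5fc633d6e2b8c874bcd85ed980ec3b75"


def canonical_hash(obj) -> str:
    """SHA-256 of the key-sorted, whitespace-free, UTF-8 JSON serialization."""
    payload = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_hash(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted before hashing, the digest does not depend on the order in which fields appear in the JSON text. A mismatch would suggest that the served record differs from the copy above, or that the canonicalization rules differ from this sketch.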