Pith Number

pith:G2XA7TKE

pith:2024:G2XA7TKEYV53AO5KYTG6OUGBP5

not attested · not anchored · not stored · refs resolved

LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks

Hao Peng, Jiajie Zhang, Jiazheng Xu, Jie Tang, Juanzi Li, Lei Hou, Shangqing Tu, Shulin Cao, Xiaozhi Wang, Xin Lv, Yushi Bai, Yuxiao Dong

LongBench v2 shows that the best current LLMs reach only about 50% accuracy on its long-context reasoning tasks when answering directly, while a reasoning model exceeds the roughly 54% human baseline.

arxiv:2412.15204 v2 · 2024-12-19 · cs.CL · cs.AI


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{G2XA7TKEYV53AO5KYTG6OUGBP5}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
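The merge algorithm itself is not specified on this page. The property it promises, that every mirror recomputes the same current state, can be illustrated with a minimal sketch. It assumes a hypothetical event shape (`Event` with `timestamp`, `event_id`, `field`, `value`) and a simple last-writer-wins rule applied in a fixed total order. The real Pith event schema, signature checks, and conflict rules may differ, and the example events and the `author_claim` field are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape for illustration; not the real Pith schema.
    timestamp: str  # ISO-8601 UTC with identical formatting, so strings sort chronologically
    event_id: str   # unique id that breaks ties between equal timestamps
    field: str      # top-level state field this event sets
    value: Any


def merge(canonical: Dict[str, Any], events: Iterable[Event]) -> Dict[str, Any]:
    """Fold events onto a copy of the canonical record in a fixed total order.

    Sorting by (timestamp, event_id) makes the result independent of the
    order in which a mirror received the events. Signature verification,
    which a real implementation would perform first, is omitted here.
    """
    state = dict(canonical)  # shallow copy: the canonical record is never mutated
    for event in sorted(events, key=lambda e: (e.timestamp, e.event_id)):
        state[event.field] = event.value  # last writer in the total order wins
    return state


if __name__ == "__main__":
    base = {"source": {"kind": "arxiv", "id": "2412.15204", "version": 2}}
    received = [
        Event("2026-05-18T10:00:00Z", "e2", "author_claim", "claimed"),
        Event("2026-05-17T09:00:00Z", "e1", "author_claim", "open"),
    ]
    # Two mirrors that saw the events in opposite orders agree on the result.
    assert merge(base, received) == merge(base, list(reversed(received)))
    print(merge(base, received)["author_claim"])  # prints "claimed"
```

Sorting by a total order, rather than applying events as they arrive, is what makes the fold deterministic; the tie-breaking `event_id` matters whenever two events share a timestamp.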

Claims

C1 · strongest claim

The best-performing model, when answering the questions directly, achieves only 50.1% accuracy. In contrast, the o1-preview model, which performs longer reasoning, achieves 57.7%, surpassing the human baseline by 4 percentage points.
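The margin in C1 is an absolute difference in accuracy, so it reads most naturally as percentage points. A quick check using only the figures quoted in C1 recovers the implied human baseline (about 53.7%, consistent with the rounded 54% in the summary) and separates the absolute gaps from the relative gain. The baseline here is derived from C1's numbers, not read from the paper.

```python
# Accuracies quoted in claim C1, in percent.
best_direct = 50.1  # best model answering directly, without extended reasoning
o1_preview = 57.7   # o1-preview with longer inference-time reasoning
margin_pp = 4.0     # stated margin of o1-preview over the human baseline

# Treat the margin as percentage points to recover the implied human baseline.
human_baseline = o1_preview - margin_pp

# Absolute gap (percentage points) versus relative improvement (percent).
gap_direct_vs_human_pp = human_baseline - best_direct
relative_gain_pct = (o1_preview / human_baseline - 1) * 100

print(f"implied human baseline: {human_baseline:.1f}%")  # 53.7%
print(f"best direct model trails humans by {gap_direct_vs_human_pp:.1f} pp")  # 3.6 pp
print(f"o1-preview relative gain over humans: {relative_gain_pct:.1f}%")  # 7.4%
```

The distinction matters when comparing benchmarks: a 4-point absolute margin over a 53.7% baseline is roughly a 7.4% relative improvement.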

C2 · weakest assumption

That the 503 questions genuinely require deep understanding and multi-step reasoning rather than being solvable through surface cues or training-data leakage, and that the 15-minute human time limit produces a fair comparison to model performance.

C3 · one-line summary

The LongBench v2 benchmark shows that current LLMs underperform humans on deep long-context reasoning tasks, but extended inference-time reasoning lets a model surpass the human baseline.

References

23 extracted · 23 resolved · 3 Pith anchors

[1] Agrawal, P., Craig, N., Madden, A., and Lombera, I. 2024

[2] The Llama 3 Herd of Models 2024 · arXiv:2407.21783

[3] ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools 2024 · arXiv:2406.12793

[4] RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems 2024 · arXiv:2306.03091

[5] In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11621–11640, Bangkok, Thailand 2024

Formal links

1 machine-checked theorem link

Cited by

26 papers in Pith

A Survey on LLM-as-a-Judge

Semantic Integrity Matters: Benchmarking and Preserving High-Density Reasoning in KV Cache Compression

Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs

Draft Less, Retrieve More: Hybrid Tree Construction for Speculative Decoding

CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

Receipt and verification

First computed	2026-05-17T23:38:46.654233Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

36ae0fcd44c57bb03baac4cde750c17f5fc633d6e2b8c874bcd85ed980ec3b75

Aliases

arxiv: 2412.15204 · arxiv_version: 2412.15204v2 · doi: 10.48550/arxiv.2412.15204 · pith_short_12: G2XA7TKEYV53 · pith_short_16: G2XA7TKEYV53AO5K · pith_short_8: G2XA7TKE
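Judging only from the values on this page (the derivation is not documented here), the Pith Number appears to be the canonical hash re-encoded in RFC 4648 base32 and cut to 26 characters, that is, roughly the first 130 bits of the SHA-256 digest, and the `pith_short_*` aliases appear to be prefixes of it. The sketch below checks that inference against the published values; the function name `pith_number_from_hash` is hypothetical.

```python
import base64

# Canonical hash as published on this page.
CANONICAL_HASH = "36ae0fcd44c57bb03baac4cde750c17f5fc633d6e2b8c874bcd85ed980ec3b75"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Inferred rule, not a documented one: base32-encode the digest, keep the first `length` chars."""
    digest = bytes.fromhex(hex_digest)
    return base64.b32encode(digest).decode("ascii")[:length]


number = pith_number_from_hash(CANONICAL_HASH)
# The short aliases look like plain prefixes of the full number.
aliases = {f"pith_short_{n}": number[:n] for n in (8, 12, 16)}

print(number)  # G2XA7TKEYV53AO5KYTG6OUGBP5
print(aliases)
```

If the inference holds, anyone who has recomputed the canonical hash can also recompute the Pith Number and its short forms without contacting the resolver.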

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/G2XA7TKEYV53AO5KYTG6OUGBP5 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 36ae0fcd44c57bb03baac4cde750c17f5fc633d6e2b8c874bcd85ed980ec3b75

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "7765192ced9a40be15cb5d5ecd09e4647b36f808cf665312205b0b87976cb5f6",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-12-19T18:59:17Z",
    "title_canon_sha256": "4998e049c23af4c78fd2e5f612dad7ae2284185f686b6fa03754a436ae679944"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2412.15204",
    "kind": "arxiv",
    "version": 2
  }
}
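The verification one-liner fetches the record over HTTP; the same canonicalization can be applied offline to the record shown above. A minimal sketch, assuming the canonicalization is exactly what the one-liner does (sorted keys, compact separators, non-ASCII kept, UTF-8, then SHA-256) and that the JSON above is the complete `canonical_record`. Compare the printed digest with the canonical hash published on this page; `canonical_bytes` and `canonical_sha256` are illustrative names.

```python
import hashlib
import json

# The canonical record as displayed on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "7765192ced9a40be15cb5d5ecd09e4647b36f808cf665312205b0b87976cb5f6",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2024-12-19T18:59:17Z",
        "title_canon_sha256": "4998e049c23af4c78fd2e5f612dad7ae2284185f686b6fa03754a436ae679944",
    },
    "schema_version": "1.0",
    "source": {"id": "2412.15204", "kind": "arxiv", "version": 2},
}


def canonical_bytes(record: dict) -> bytes:
    """Serialize as the verification one-liner does: sorted keys, no whitespace, UTF-8."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


print(canonical_sha256(RECORD))
```

Because keys are sorted at every nesting level before hashing, reordering fields in a mirror's copy leaves the digest unchanged, while changing any value changes it.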