Pith Number

pith:WPWLP2AF

pith:2023:WPWLP2AFNH6A5CCMKTPPTQYGSQ

not attested · not anchored · not stored · refs resolved

FinanceBench: A New Benchmark for Financial Question Answering

Anand Kannappan, Bertie Vidgen, Douwe Kiela, Nino Scherrer, Pranab Islam, Rebecca Qian

Existing LLMs incorrectly answer or refuse to answer 81 percent of financial questions, even with retrieval support.

arxiv:2311.11944 v1 · 2023-11-20 · cs.CL · cs.AI · cs.CE · stat.ML


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{WPWLP2AFNH6A5CCMKTPPTQYGSQ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
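The merge algorithm itself is not described on this page. As a rough illustration of what "deterministic" requires, the Python sketch below replays events in a fixed total order, so any mirror reaches the same state regardless of the order in which events arrived. The event fields (`id`, `ts`, `field`, `value`), the sample events, and the last-writer-wins rule are all assumptions made for the example, not the Pith bundle format.

```python
def merge_state(canonical_record, events):
    """Fold events into a copy of the record in a fixed total order.

    Hypothetical event schema: {"id", "ts", "field", "value"}.
    Assumes two events with the same id are identical copies.
    """
    # Drop duplicate deliveries: keep one event per id.
    unique = {e["id"]: e for e in events}
    # Sort by (timestamp, id) so the result does not depend on arrival order.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state = dict(canonical_record)  # never mutate the canonical record
    for event in ordered:
        state[event["field"]] = event["value"]  # assumed last-writer-wins rule
    return state


# Made-up record and events, for illustration only.
record = {"source": "arxiv:2311.11944"}
events = [
    {"id": "e2", "ts": "2026-05-18T00:00:00Z", "field": "citations", "value": "open"},
    {"id": "e1", "ts": "2026-05-17T23:40:00Z", "field": "replications", "value": "open"},
]

mirror_a = merge_state(record, events)
mirror_b = merge_state(record, list(reversed(events)))
print(mirror_a == mirror_b)
```

Sorting on a key that is unique per event is what makes the fold order-independent; a real implementation would presumably also verify each event's signature before applying it.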

Claims

C1 strongest claim

GPT-4-Turbo used with a retrieval system incorrectly answered or refused to answer 81% of questions.

C2 weakest assumption

The 150 sampled cases are assumed to be representative of the full 10,231 questions, and all questions are assumed to be ecologically valid and clear-cut as stated.

C3 one-line summary

FinanceBench shows state-of-the-art LLMs incorrectly answer or refuse 81% of tested financial QA cases even with retrieval augmentation.
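To give a sense of how much the 150-case sample in C2 can support the 81% figure, the sketch below computes a normal-approximation 95% interval for a proportion. It assumes the 150 cases were a simple random sample of the full question set, which this record does not state; the function name and the choice of interval are illustrative, not taken from the paper.

```python
import math


def normal_approx_interval(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a sample proportion."""
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    margin = z * standard_error
    # Clamp to [0, 1] because a proportion cannot leave that range.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# 81% incorrect-or-refused on 150 evaluated cases (figures from the claims above).
low, high = normal_approx_interval(0.81, 150)
print(f"{low:.3f} to {high:.3f}")
```

Under that sampling assumption the interval is several percentage points wide on either side, so the headline rate is an estimate for the full 10,231 questions rather than an exact value.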

References

18 extracted · 18 resolved · 0 Pith anchors

[1] In Findings of the Association for Computational Linguistics: ACL 2023 , pages 1298–1313, Toronto, Canada 2023

[2] QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension. ACM Comput. Surv., 55(10). Julio Cesar Salinas Alvarado, Karin Verspoor, and Timothy Baldwin. 20 2015

[3] Fengbin Zhu, Wenqiang Lei, Youcheng Huang, Chao Wang, Shuo Zhang, Jiancheng Lv, Fuli Feng, and Tat-Seng Chua 2021

[4] finacebench_id_0000

[5] A value for whether it is in the eval sample of 298 cases (‘1’), in the open source sample (‘2’) or in neither (‘0’)

Formal links

1 machine-checked theorem link

Cited by

30 papers in Pith

FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models

Design and Report Benchmarks for Knowledge Work

Bridging Language Models and Financial Analysis

Evaluating Deep Research Agents on Expert Consulting Work: A Benchmark with Verifiers, Rubrics, and Cognitive Traps

FinDocMRE: A Benchmark for Document-Level Financial Multimodal Reasoning Evaluation

Receipt and verification

First computed	2026-05-17T23:38:49.037230Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

b3ecb7e80569fc0e884c54def9c306940cc8af16666c5227b5b02cc34ae29d57

Aliases

arxiv: 2311.11944 · arxiv_version: 2311.11944v1 · doi: 10.48550/arxiv.2311.11944 · pith_short_12: WPWLP2AFNH6A · pith_short_16: WPWLP2AFNH6A5CCM · pith_short_8: WPWLP2AF
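The identifiers on this page are consistent with a simple relationship: Base32-encoding the first 16 bytes of the canonical hash and dropping the padding gives the full Pith Number, and each `pith_short_N` alias is its first N characters. This is inferred from the values shown here, not from a published specification, and the function name below is illustrative.

```python
import base64

CANONICAL_HASH = "b3ecb7e80569fc0e884c54def9c306940cc8af16666c5227b5b02cc34ae29d57"


def pith_number_from_hash(hex_digest, n_bytes=16):
    """Base32-encode the leading bytes of a hex digest, without '=' padding.

    The 16-byte truncation is an assumption that happens to match this record.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


full = pith_number_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)
print(aliases)
```

If the relationship holds in general, the short forms are plain prefixes, so a longer alias resolves at least as specifically as a shorter one.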

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/WPWLP2AFNH6A5CCMKTPPTQYGSQ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b3ecb7e80569fc0e884c54def9c306940cc8af16666c5227b5b02cc34ae29d57

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "b93fb6e2e2c745257d9732d756d1bf963b7deaf484def9c20e8cfb5acf1f0834",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CE",
      "stat.ML"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-11-20T17:28:02Z",
    "title_canon_sha256": "0065779b111e2415d7e651f7202b8f5738a55baee518adfb61aae403754ff70d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2311.11944",
    "kind": "arxiv",
    "version": 1
  }
}
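The hashing stage of the verification command can also be run offline against the record printed above. The sketch below applies the same canonicalization (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) and prints the SHA-256 digest. Whether that digest equals the canonical hash depends on the resolver returning exactly this record, which is an assumption here; the function name is illustrative.

```python
import hashlib
import json

# The canonical record as displayed on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "b93fb6e2e2c745257d9732d756d1bf963b7deaf484def9c20e8cfb5acf1f0834",
    "cross_cats_sorted": ["cs.AI", "cs.CE", "stat.ML"],
    "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-11-20T17:28:02Z",
    "title_canon_sha256": "0065779b111e2415d7e651f7202b8f5738a55baee518adfb61aae403754ff70d"
  },
  "schema_version": "1.0",
  "source": {"id": "2311.11944", "kind": "arxiv", "version": 1}
}
"""


def canonical_sha256(record):
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


digest = canonical_sha256(json.loads(RECORD_JSON))
print(digest)  # compare by eye with the canonical hash listed on this page
```

Because keys are sorted and whitespace is removed before hashing, the digest does not depend on how the JSON was pretty-printed or on the key order in the response.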