Pith Number

pith:3GI7AM4O

pith:2025:3GI7AM4OLOM4VAWUJMNPO3UUPK
not attested · not anchored · not stored · refs resolved

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Bowen Zhou, Ermo Hua, Kaiyan Zhang, Ning Ding, Shang Qu, Xuekai Zhu, Yifei Li, Yuxin Zuo, Zhangren Chen

MedXpertQA supplies 4,460 expert-reviewed medical questions across 17 specialties to test genuine clinical reasoning in AI systems.

arxiv:2501.18362 v3 · 2025-01-30 · cs.AI · cs.CL · cs.CV · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{3GI7AM4OLOM4VAWUJMNPO3UUPK}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

MedXpertQA provides a highly challenging and comprehensive benchmark to evaluate expert-level medical knowledge and advanced reasoning, with rigorous filtering, expert reviews, and a multimodal subset that includes diverse images and rich clinical information.

C2 · weakest assumption

The selected and augmented questions, after expert review and synthesis, are assumed to represent genuine expert-level clinical reasoning accurately, with no residual data leakage or selection bias that would inflate model performance.

C3 · one-line summary

MedXpertQA is a new benchmark of 4,460 rigorously filtered expert medical questions, including multimodal cases with patient records and images, designed to evaluate advanced AI reasoning more stringently than prior datasets like MedQA.

References

30 extracted · 30 resolved · 1 Pith anchor


Formal links

2 machine-checked theorem links

Cited by

19 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:47.608492Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

d991f0338e5b99ca82d44b1af76e947a9410b5aced7774061bcc00fa4bf4d067

Aliases

arxiv: 2501.18362 · arxiv_version: 2501.18362v3 · doi: 10.48550/arxiv.2501.18362 · pith_short_12: 3GI7AM4OLOM4 · pith_short_16: 3GI7AM4OLOM4VAWU · pith_short_8: 3GI7AM4O
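The short aliases are prefixes of the full Pith Number, and the values on this page suggest that the number itself is the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. That relationship is an observation from this one record, not a documented rule, so the Python sketch below treats it as an assumption. The function name `pith_from_hash` is hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "d991f0338e5b99ca82d44b1af76e947a9410b5aced7774061bcc00fa4bf4d067"
PITH_NUMBER = "3GI7AM4OLOM4VAWUJMNPO3UUPK"

def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed derivation: base32-encode the SHA-256 digest, keep a prefix."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]

# Short aliases listed on this page, keyed by their length.
aliases = {8: "3GI7AM4O", 12: "3GI7AM4OLOM4", 16: "3GI7AM4OLOM4VAWU"}

derived = pith_from_hash(CANONICAL_HASH)
prefixes_ok = all(PITH_NUMBER[:n] == alias for n, alias in aliases.items())

print(derived == PITH_NUMBER, prefixes_ok)
```

For the values on this page, `derived` equals the Pith Number and every short alias is a prefix of it. Other records may follow a different rule.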
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/3GI7AM4OLOM4VAWUJMNPO3UUPK \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d991f0338e5b99ca82d44b1af76e947a9410b5aced7774061bcc00fa4bf4d067
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "bfe215abe5d4bfad4b0052e221edc2f73a70a1aaa86cf2b7509c431dc793725a",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.CV",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-01-30T14:07:56Z",
    "title_canon_sha256": "1f897365171c0fd28e2297578fe1cdd416da532f3bfddceec8cebec5485dde81"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2501.18362",
    "kind": "arxiv",
    "version": 3
  }
}
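The canonicalization step of the shell pipeline under "Verify this Pith Number yourself" can also be followed offline. This is a minimal Python sketch, assuming the record printed here is the complete `canonical_record` that the API returns. `canonical_sha256` is a hypothetical helper name, and the serialization options are copied from the one-liner. Whether the digest matches the published canonical hash depends on that assumption, so no expected value is asserted.

```python
import hashlib
import json

def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash as UTF-8."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Canonical record as shown on this page (assumed to be complete).
record = {
    "metadata": {
        "abstract_canon_sha256": "bfe215abe5d4bfad4b0052e221edc2f73a70a1aaa86cf2b7509c431dc793725a",
        "cross_cats_sorted": ["cs.CL", "cs.CV", "cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2025-01-30T14:07:56Z",
        "title_canon_sha256": "1f897365171c0fd28e2297578fe1cdd416da532f3bfddceec8cebec5485dde81",
    },
    "schema_version": "1.0",
    "source": {"id": "2501.18362", "kind": "arxiv", "version": 3},
}

digest = canonical_sha256(record)

# Sorting keys makes the digest independent of the input's key order.
reordered = dict(reversed(list(record.items())))
order_independent = canonical_sha256(reordered) == digest

print(digest)
```

Compare the printed digest with the value under "Canonical hash". A mismatch would indicate that the live record differs from the copy shown here, or that the assumption about completeness does not hold.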