pith:3GI7AM4O
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
MedXpertQA supplies 4,460 expert-reviewed medical questions across 17 specialties to test genuine clinical reasoning in AI systems.
arxiv:2501.18362 v3 · 2025-01-30 · cs.AI · cs.CL · cs.CV · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{3GI7AM4OLOM4VAWUJMNPO3UUPK}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
MedXpertQA provides a highly challenging and comprehensive benchmark to evaluate expert-level medical knowledge and advanced reasoning, with rigorous filtering, expert reviews, and a multimodal subset that includes diverse images and rich clinical information.
The benchmark assumes that the selected and augmented questions, after expert review and synthesis, accurately represent genuine expert-level clinical reasoning, with no residual data leakage or selection bias that would inflate model performance.
MedXpertQA is a new benchmark of 4,460 rigorously filtered expert medical questions, including multimodal cases with patient records and images, designed to evaluate advanced AI reasoning more stringently than prior datasets like MedQA.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:47.608492Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
d991f0338e5b99ca82d44b1af76e947a9410b5aced7774061bcc00fa4bf4d067
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/3GI7AM4OLOM4VAWUJMNPO3UUPK \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d991f0338e5b99ca82d44b1af76e947a9410b5aced7774061bcc00fa4bf4d067
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "bfe215abe5d4bfad4b0052e221edc2f73a70a1aaa86cf2b7509c431dc793725a",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.CV",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-01-30T14:07:56Z",
    "title_canon_sha256": "1f897365171c0fd28e2297578fe1cdd416da532f3bfddceec8cebec5485dde81"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2501.18362",
    "kind": "arxiv",
    "version": 3
  }
}
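The same check can be run offline. The sketch below applies the serialization used in the shell one-liner (sorted keys, compact separators, unescaped Unicode, UTF-8) to the record as displayed on this page, then compares the SHA-256 digest with the listed canonical hash. It assumes the JSON shown here is identical to what the API returns under `canonical_record`; the names `RECORD_JSON`, `EXPECTED`, `canonical_bytes`, and `canonical_hash` are illustrative and not part of any Pith tooling.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" listing on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "bfe215abe5d4bfad4b0052e221edc2f73a70a1aaa86cf2b7509c431dc793725a",
    "cross_cats_sorted": ["cs.CL", "cs.CV", "cs.LG"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-01-30T14:07:56Z",
    "title_canon_sha256": "1f897365171c0fd28e2297578fe1cdd416da532f3bfddceec8cebec5485dde81"
  },
  "schema_version": "1.0",
  "source": {"id": "2501.18362", "kind": "arxiv", "version": 3}
}
"""

# Hash listed under "Canonical hash" on this page.
EXPECTED = "d991f0338e5b99ca82d44b1af76e947a9410b5aced7774061bcc00fa4bf4d067"


def canonical_bytes(record):
    """Serialize with sorted keys, no optional whitespace, and unescaped Unicode."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode()  # UTF-8 by default


def canonical_hash(record):
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(json.loads(RECORD_JSON))
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted and whitespace is fixed before hashing, the digest does not depend on how the record was indented or in what order its keys were written. A mismatch would suggest that the displayed record differs from the API response, or that the service canonicalizes differently from this sketch.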