Pith Number

pith:FZBVN7JB

pith:2026:FZBVN7JBPADGNT65URBOQ44VLE
not attested · not anchored · not stored · refs resolved

CUICurate: A GraphRAG-based Framework for Automated Clinical Concept Curation for NLP applications

Blanca Gallego, Jamie Novak, Mathew Miller, Sze-yuan Ooi, Victoria Blake

CUICurate automates UMLS concept set curation with GraphRAG to yield larger and more complete sets than manual methods.

arxiv:2602.17949 v2 · 2026-02-20 · cs.CL · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{FZBVN7JBPADGNT65URBOQ44VLE}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
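The page does not describe the merge algorithm itself. As a rough illustration of why a deterministic merge lets any mirror reach the same state, here is a minimal Python sketch under invented assumptions: each event is a dict with hypothetical `id`, `ts`, `field`, and `value` keys, an `id` identifies one immutable event, duplicates are dropped by `id`, and events are applied in `(ts, id)` order with last-writer-wins. This is not Pith's actual algorithm.

```python
from typing import Any


def merge_events(canonical: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events onto a copy of the canonical record in a fixed total order.

    Hypothetical event schema (an assumption, not Pith's format):
    {"id": str, "ts": str, "field": str, "value": Any}
    """
    state = dict(canonical)  # never mutate the canonical record itself
    # Deduplicate by event id (assumes one id always names the same event),
    # so a mirror that received an event twice still ends up in the same state.
    unique = {event["id"]: event for event in events}
    # Sorting by (timestamp, id) fixes one application order for every mirror,
    # regardless of the order in which events arrived.
    for event in sorted(unique.values(), key=lambda e: (e["ts"], e["id"])):
        state[event["field"]] = event["value"]  # last writer wins
    return state
```

Because the result depends only on the set of events and not on their arrival order, two mirrors holding the same bundle compute identical state.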

Claims

C1 · strongest claim

CUICurate produced substantially larger and more complete concept sets than the manual benchmarks. GPT-5 outperformed manual curation for all concepts and retained at least 95% of definitive gold-standard CUIs.
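"Retained at least 95% of definitive gold-standard CUIs" is a recall-style ratio: the share of gold-standard CUIs that also appear in the generated concept set. A minimal Python sketch, assuming both sets are plain collections of CUI strings; the identifiers below are placeholders, not concepts from the paper.

```python
def gold_retention(generated: set[str], gold: set[str]) -> float:
    """Fraction of gold-standard CUIs that also appear in the generated concept set."""
    if not gold:
        raise ValueError("gold-standard set must not be empty")
    return len(generated & gold) / len(gold)


# Placeholder identifiers: 19 of 20 gold CUIs are retained, plus 5 extra concepts.
gold = {f"GOLD{i:02d}" for i in range(20)}
generated = (gold - {"GOLD00"}) | {f"EXTRA{i}" for i in range(5)}
print(gold_retention(generated, gold))  # 0.95
```

Extra concepts leave this ratio unchanged, so the size of a concept set and its gold-standard retention are separate measurements.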

C2 · weakest assumption

Assumes that LLM-based filtering (GPT-5 and Qwen3-32B) accurately distinguishes clinically meaningful relations from noise without systematic bias or hallucination, especially for concepts not observed in the 10,000 MIMIC-III notes used for validation.

C3 · one line summary

CUICurate uses GraphRAG on a UMLS knowledge graph plus LLMs to generate larger, higher-recall concept sets than manual curation for five clinical concepts.

References

17 extracted · 17 resolved · 0 Pith anchors

[1] Medical Concept Normalization 2024
[2] The Unified Medical Language System at 30 Years and How It Is Used and Published: Systematic Review and Content Analysis 2021
[3] Clinical Concept Value Sets and Interoperability in Health Data Analytics 2018
[4] QuickUMLS: a fast, unsupervised approach for medical concept extraction 2016
[5] MedCAT -- Medical Concept Annotation Tool 2019

Receipt and verification
First computed 2026-05-17T23:39:16.057378Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

2e4356fd21780666cfdda442e873955914ef7c1438700699ae9523bbc878e414
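Comparing the values on this page, the Pith Number appears to be the unpadded RFC 4648 base32 encoding of the first 16 bytes (128 bits) of the canonical hash. That is an observation from this record, not a documented rule. A minimal Python sketch that checks it:

```python
import base64


def pith_number_from_hash(canonical_hash_hex: str) -> str:
    """Base32-encode the first 16 bytes of a SHA-256 hex digest, without '=' padding.

    Reflects a relationship observed on this page, not a published specification.
    """
    digest = bytes.fromhex(canonical_hash_hex)
    # 16 bytes = 128 bits -> 26 base32 characters once the padding is stripped.
    return base64.b32encode(digest[:16]).decode("ascii").rstrip("=")


canonical_hash = "2e4356fd21780666cfdda442e873955914ef7c1438700699ae9523bbc878e414"
print(pith_number_from_hash(canonical_hash))  # FZBVN7JBPADGNT65URBOQ44VLE
```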

Aliases

arxiv: 2602.17949 · arxiv_version: 2602.17949v2 · doi: 10.48550/arxiv.2602.17949 · pith_short_12: FZBVN7JBPADG · pith_short_16: FZBVN7JBPADGNT65 · pith_short_8: FZBVN7JB
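The `pith_short_N` aliases look like N-character prefixes of the full Pith Number, and `arxiv_version` is the arXiv id with the version suffix appended. A minimal Python sketch that parses the alias line above (the `key: value · key: value` layout is taken from this page) and checks both relationships:

```python
ALIASES = (
    "arxiv: 2602.17949 · arxiv_version: 2602.17949v2 · doi: 10.48550/arxiv.2602.17949 · "
    "pith_short_12: FZBVN7JBPADG · pith_short_16: FZBVN7JBPADGNT65 · pith_short_8: FZBVN7JB"
)
PITH_NUMBER = "FZBVN7JBPADGNT65URBOQ44VLE"


def parse_aliases(line: str) -> dict[str, str]:
    """Split a 'key: value · key: value' alias line into a dictionary."""
    pairs = (item.split(":", 1) for item in line.split("·"))
    return {key.strip(): value.strip() for key, value in pairs}


aliases = parse_aliases(ALIASES)
for n in (8, 12, 16):
    # Each pith_short_N alias is the first N characters of the full Pith Number.
    assert aliases[f"pith_short_{n}"] == PITH_NUMBER[:n]
# The versioned id is the bare arXiv id plus "v2", matching the v2 shown in the header.
assert aliases["arxiv_version"] == aliases["arxiv"] + "v2"
```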
Agent API
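Verify this Pith Number yourself
# Fetch the JSON-LD record, keep only .canonical_record, re-serialize it canonically
# (sorted keys, no whitespace, UTF-8 kept as-is), and print its SHA-256 digest.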
curl -sH 'Accept: application/ld+json' https://pith.science/pith/FZBVN7JBPADGNT65URBOQ44VLE \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2e4356fd21780666cfdda442e873955914ef7c1438700699ae9523bbc878e414
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "4c8f61648c8fc4ffe01b40f8a68ae90647463da7237fc342166071f7c902b8fd",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-02-20T03:00:13Z",
    "title_canon_sha256": "aa5ba4e93ae3d7f6dc98f6c03eb6250e07c2164286fd7dc6dae919221b430b8a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.17949",
    "kind": "arxiv",
    "version": 2
  }
}
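The verification command can also be reproduced offline on the record printed above. A minimal Python sketch, assuming the JSON shown here is exactly the `canonical_record` served by the API; it applies the same serialization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8), so its output can be compared against the `# expect:` value. It has not been checked against the live endpoint.

```python
import hashlib
import json

# Copy of the canonical record JSON shown above.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "4c8f61648c8fc4ffe01b40f8a68ae90647463da7237fc342166071f7c902b8fd",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-02-20T03:00:13Z",
        "title_canon_sha256": "aa5ba4e93ae3d7f6dc98f6c03eb6250e07c2164286fd7dc6dae919221b430b8a",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.17949", "kind": "arxiv", "version": 2},
}


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and no insignificant whitespace, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


print(canonical_sha256(CANONICAL_RECORD))
```

Sorting keys and fixing the separators makes the digest independent of how the JSON was originally laid out, which is what allows an independent verifier to reproduce it.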