pith:CGHDYSAH
Retrieval-Augmented Large Language Models for Schema-Constrained Clinical Information Extraction
A retrieval-augmented pipeline with schema-constrained prompts extracts structured clinical observations from transcripts at 80.36 percent F1.
arxiv:2605.15467 v1 · 2026-05-14 · cs.CL · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{CGHDYSAHUYW2JMKCBYUP4DRPMX}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Our best configuration uses GPT-5.2 with the full schema, RAG, and second-pass auditing, achieving an F1 score of 80.36%. Overall, our results show that RAG consistently improves performance, while the optimal degree of schema constraint depends on the model, and second-pass auditing yields modest additional gains by correcting residual schema-adherence errors.
The training set can serve as an effective exemplar corpus for retrieval that meaningfully improves the model's ability to produce schema-adherent outputs when combined with prompting and post-processing.
A modular RAG pipeline with schema-constrained prompting, deterministic post-processing, and second-pass auditing reaches 80.36% F1 on observation extraction from nurse-patient transcripts using GPT-5.2.
Receipt and verification
| First computed | 2026-05-20T00:01:00.096241Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
118e3c4807a62da4b1420e28fe0e2f65f06037753acbd26f17eff99864afbac8
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/CGHDYSAHUYW2JMKCBYUP4DRPMX \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 118e3c4807a62da4b1420e28fe0e2f65f06037753acbd26f17eff99864afbac8
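The hashing step of the command above can also be reproduced offline. The following is a minimal sketch that applies the same serialization as the one-liner (sorted keys, compact separators, non-ASCII characters left unescaped, UTF-8 encoding) to the canonical record printed on this page. The function name `canonical_hash` and the inline `record` literal are illustrative assumptions, not part of any Pith tooling; the digest it prints should be compared against the canonical hash listed above.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """Hypothetical helper: SHA-256 over the compact, key-sorted JSON form."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Canonical record copied from this page (assumed to match the served record).
record = {
    "metadata": {
        "abstract_canon_sha256": "e0e00ec6e4c07ba565efa608336d1d5aa239e52aa1869561a3e75d5920b7b8d4",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-05-14T23:13:05Z",
        "title_canon_sha256": "482151a2ab80529a188b5715f43911a7643f0dcbb7ebe8ebc6afe9b938365c42",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15467", "kind": "arxiv", "version": 1},
}

digest = canonical_hash(record)
print(digest)  # compare manually with the canonical hash on this page
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the fields appear in the downloaded JSON. Only a change to a key or a value alters it.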
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "e0e00ec6e4c07ba565efa608336d1d5aa239e52aa1869561a3e75d5920b7b8d4",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-14T23:13:05Z",
    "title_canon_sha256": "482151a2ab80529a188b5715f43911a7643f0dcbb7ebe8ebc6afe9b938365c42"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15467",
    "kind": "arxiv",
    "version": 1
  }
}