There are altogether 10,053 + 1,047 = 11,100 notes in the train and test sets

Patient notes are real, deidentified patient cases they scraped from journal-published medical papers that are archived in the PubMed Central database

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Scalable Stewardship of an LLM-Assisted Clinical Benchmark with Physician Oversight

cs.AI · 2025-12-22 · conditional · novelty 6.0

Physician oversight reveals high error rates in LLM-generated labels for a clinical benchmark and demonstrates that corrected labels improve both evaluation accuracy and downstream model training.

citing papers explorer

Showing 1 of 1 citing paper.

Scalable Stewardship of an LLM-Assisted Clinical Benchmark with Physician Oversight · cs.AI · 2025-12-22 · conditional · none · ref 53
