Title resolution pending

Zhao, Z · 2023 · DOI 10.1038/s41597-023-02814-8

1 Pith paper cites this work. Polarity classification is still indexing.


Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

How much of an LLM-generated clinical corpus is actually new? A production-scale measurement of content redundancy for provenance classification

cs.CL · 2026-06-28 · unverdicted · novelty 7.0

In a 2.51B-token LLM clinical extraction corpus, only 10.9% is trainable-unique while 79.4% is redundant from copying and duplication; de-duplication improves downstream disease recognition at fixed token budget.

citing papers explorer

Showing 1 of 1 citing paper.

How much of an LLM-generated clinical corpus is actually new? A production-scale measurement of content redundancy for provenance classification
cs.CL · 2026-06-28 · unverdicted · none · ref 7
In a 2.51B-token LLM clinical extraction corpus, only 10.9% is trainable-unique while 79.4% is redundant from copying and duplication; de-duplication improves downstream disease recognition at fixed token budget.
