Exploring Large-scale Public Medical Image Datasets
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Representative citing papers
Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.
citing papers explorer
Large Language Model-Assisted Cleaning of Report-Derived Labels in a Large-Scale Chest CT Dataset
Using GPT-5.4 to clean labels in the CT-RATE chest CT dataset revealed 3.6% discordance with original labels, with radiologists supporting the LLM labels in 74-92% of reviewed cases.