
arxiv: 2512.15921 · v2 · submitted 2025-12-17 · 📡 eess.IV · cs.CV

In search of truth: Evaluating concordance of AI-based anatomy segmentation models

Pith reviewed 2026-05-16 21:02 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords AI segmentation · anatomy segmentation · model evaluation · CT scans · segmentation harmonization · concordance · 3D Slicer · NLST

The pith

Harmonizing AI segmentation outputs into a common format enables direct comparison of six models on CT scans without ground truth, revealing strong agreement on lungs but invalid vertebrae and rib labels from some models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework that converts outputs from different anatomy segmentation models into one standard representation so their results can be labeled consistently and compared side by side. This matters because it supports model selection on large public datasets, such as the National Lung Screening Trial (NLST) CT scans, when manually annotated test data are unavailable. The authors extend 3D Slicer and add OHIF Viewer tools to make the comparison fast and visual, then apply the method to 31 structures (lungs, vertebrae, ribs, and heart) segmented by six open-source models: TotalSegmentator 1.5 and 2.6, Auto3DSeg, MOOSE, MultiTalent, and CADS. Results show excellent agreement on lungs yet clear failures elsewhere, such as invalid vertebrae or rib segmentations from certain models. The work therefore supplies both the tools and an example workflow for practical model evaluation when ground truth is absent.

Core claim

We introduce a practical framework to evaluate AI-based anatomy segmentation models without ground truth by first harmonizing their outputs into a single interoperable representation that supports consistent terminology-based labeling. We extend 3D Slicer to load and compare these harmonized segmentations and add browser-based visualization through OHIF Viewer together with interactive summary plots. When the framework is applied to six open-source models on a sample of NLST CT scans, it shows excellent agreement for lungs but identifies invalid segmentations for vertebrae and ribs produced by some models.

What carries the argument

Harmonization of segmentation results into a standard interoperable representation that enables consistent terminology-based labeling without loss of original model information.
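To make the harmonization idea concrete, here is a minimal sketch, assuming each model emits integer label IDs with its own naming scheme. The model keys, label tables, and code values (`EXAMPLE`, `X-0001`, and so on) are hypothetical placeholders, not taken from the paper; a real pipeline would draw coded terms from a controlled terminology and store them in a standard file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedTerm:
    scheme: str   # terminology name (placeholder, not a real scheme)
    code: str     # placeholder code value, not a real terminology code
    meaning: str  # human-readable label shared by all models


# Hypothetical per-model label tables: label ID -> model-specific name.
MODEL_LABELS = {
    "model_a": {1: "lung_left", 2: "lung_right", 3: "heart"},
    "model_b": {10: "Left Lung", 11: "Right Lung", 12: "Heart"},
}

# Hypothetical common vocabulary shared by every model.
COMMON_TERMS = {
    "left_lung": CodedTerm("EXAMPLE", "X-0001", "Left lung"),
    "right_lung": CodedTerm("EXAMPLE", "X-0002", "Right lung"),
    "heart": CodedTerm("EXAMPLE", "X-0003", "Heart"),
}

# Hypothetical mapping from (model, model-specific name) to a common key.
NAME_TO_COMMON = {
    ("model_a", "lung_left"): "left_lung",
    ("model_a", "lung_right"): "right_lung",
    ("model_a", "heart"): "heart",
    ("model_b", "Left Lung"): "left_lung",
    ("model_b", "Right Lung"): "right_lung",
    ("model_b", "Heart"): "heart",
}


def harmonize(model, label_ids):
    """Map a model's label IDs to common terms.

    Returns (mapped, unmapped). The original model-specific name is kept
    next to the common term so no source information is discarded.
    """
    mapped, unmapped = {}, []
    for label_id in label_ids:
        name = MODEL_LABELS[model].get(label_id)
        key = NAME_TO_COMMON.get((model, name))
        if key is None:
            unmapped.append(label_id)  # surface gaps instead of guessing
            continue
        mapped[label_id] = {"term": COMMON_TERMS[key], "source_name": name}
    return mapped, unmapped
```

Reporting unmapped labels separately, rather than dropping them silently, is one way to keep a mapping gap from being mistaken for a missing structure.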

If this is right

  • Automates loading, structure-wise inspection, and side-by-side comparison of multiple models on the same scans.
  • Simplifies detection of invalid outputs such as non-anatomical vertebrae or rib segmentations through summary plots and viewer tools.
  • Supports informed selection among open-source models for large-scale imaging studies when no ground-truth annotations exist.
  • Provides reusable scripts and visualization resources that can be applied to any new set of segmentation models.
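As an illustration of how invalid rib outputs might be flagged automatically once labels are harmonized, the sketch below applies two simple rules to a list of segment names. The naming convention (`left_rib_3`), the limit of 12 ribs per side, and the rules themselves are assumptions made for this example; they are not the checks used by the authors.

```python
import re

# Assumed typical rib count per side; anatomical variants exist.
EXPECTED_RIBS_PER_SIDE = 12


def check_ribs(segment_names):
    """Return human-readable issues for rib labels in one segmentation.

    Assumes harmonized names of the form '<side>_rib_<number>'.
    """
    issues = []
    for side in ("left", "right"):
        pattern = re.compile(rf"^{side}_rib_(\d+)$")
        numbers = set()
        for name in segment_names:
            match = pattern.match(name)
            if match:
                numbers.add(int(match.group(1)))

        # Rule 1: rib numbers outside the expected range.
        out_of_range = sorted(
            n for n in numbers if not 1 <= n <= EXPECTED_RIBS_PER_SIDE
        )
        if out_of_range:
            issues.append(f"{side}: unexpected rib numbers {out_of_range}")

        # Rule 2: gaps in the sequence between the lowest and highest rib.
        in_range = numbers - set(out_of_range)
        if in_range:
            expected = set(range(min(in_range), max(in_range) + 1))
            missing = sorted(expected - in_range)
            if missing:
                issues.append(f"{side}: gap in rib sequence, missing {missing}")
    return issues
```

The gap rule only tests contiguity, so a scan whose field of view legitimately cuts off the first or last ribs is not flagged.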

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same harmonization step could be applied to other imaging modalities such as MRI or PET to test whether model concordance patterns hold outside CT.
  • Repeated use of the framework across many datasets would generate a growing record of which structures remain difficult for current models.
  • If invalid segmentations are systematically logged, the data could guide targeted retraining of the weakest model components.

Load-bearing premise

Harmonizing segmentation results into a standard interoperable representation enables consistent terminology-based labeling without distorting or losing information from the original model outputs.

What would settle it

A side-by-side manual inspection of the same CT scan in which the harmonized labels from two models differ in spatial extent or topology even though both models were reported to have segmented the identical anatomical structure.

Original abstract

Purpose: AI-based methods for anatomy segmentation can help automate characterization of large imaging datasets. The growing number of similar in functionality models raises the challenge of evaluating them on datasets that do not contain ground truth annotations. We introduce a practical framework to assist in this task.

Approach: We harmonize the segmentation results into a standard, interoperable representation, which enables consistent, terminology-based labeling of the structures. We extend 3D Slicer to streamline loading and comparison of these harmonized segmentations, and demonstrate how standard representation simplifies review of the results using interactive summary plots and browser-based visualization using OHIF Viewer. To demonstrate the utility of the approach we apply it to evaluating segmentation of 31 anatomical structures (lungs, vertebrae, ribs, and heart) by six open-source models - TotalSegmentator 1.5 and 2.6, Auto3DSeg, MOOSE, MultiTalent, and CADS - for a sample of Computed Tomography (CT) scans from the publicly available National Lung Screening Trial (NLST) dataset.

Results: We demonstrate the utility of the framework in enabling automating loading, structure-wise inspection and comparison across models. Preliminary results ascertain practical utility of the approach in allowing quick detection and review of problematic results. The comparison shows excellent agreement segmenting some (e.g., lung) but not all structures (e.g., some models produce invalid vertebrae or rib segmentations).

Conclusions: The resources developed are linked from https://imagingdatacommons.github.io/segmentation-comparison/ including segmentation harmonization scripts, summary plots, and visualization tools. This work assists in model evaluation in absence of ground truth, ultimately enabling informed model selection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a practical framework for evaluating concordance among AI-based anatomy segmentation models on CT datasets lacking ground truth annotations. It harmonizes model outputs into a standard interoperable representation to enable consistent terminology-based labeling, extends 3D Slicer for streamlined loading and comparison, and incorporates interactive summary plots plus browser-based visualization via OHIF Viewer. The framework is demonstrated on 31 structures (lungs, vertebrae, ribs, heart) segmented by six open-source models (TotalSegmentator 1.5/2.6, Auto3DSeg, MOOSE, MultiTalent, CADS) applied to a sample of NLST CT scans, with preliminary results showing strong agreement on some structures (e.g., lungs) but invalid outputs for others (e.g., vertebrae or ribs in certain models). Open resources including harmonization scripts are linked from the provided GitHub site.

Significance. If the harmonization preserves anatomical fidelity without introducing ontology-dependent artifacts, the framework offers a useful, reproducible approach to model comparison on large public datasets where ground truth is unavailable. The explicit provision of open-source scripts, plots, and visualization tools is a concrete strength that supports broader adoption and could facilitate informed model selection in medical imaging applications.

major comments (2)
  1. [Approach] Approach section on harmonization: The central utility claim rests on mapping model-specific labels to a common terminology (e.g., RadLex-like), yet no explicit mapping table, validation metrics, or error analysis is provided to confirm the mapping is lossless for structures with variable topology such as ribs (12 vs. 13 ribs) or fused vertebrae. This leaves open the possibility that flagged 'invalid' outputs are ontology artifacts rather than intrinsic segmentation failures.
  2. [Results] Results section: The comparison of model outputs is presented via qualitative visual inspection and summary plots with no quantitative agreement metrics (e.g., pairwise overlap coefficients, concordance statistics, or inter-model variability measures) reported. This weakens the ability to substantiate the degree of concordance or the reliability of detecting problematic segmentations.
minor comments (2)
  1. [Results] The manuscript could include a small table or figure in the Results section summarizing the structures where agreement was high versus low across the six models to make the preliminary findings more immediately interpretable.
  2. [Conclusions] The link to resources in the Conclusions is helpful, but the manuscript would benefit from a brief description of the exact harmonization scripts' inputs/outputs to aid reproducibility.
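The pairwise overlap coefficients mentioned in the second major comment could be computed along the following lines. This is a minimal sketch, assuming every model's mask for a given structure has been resampled to the same voxel grid and flattened to a sequence of 0/1 values; the function names and toy masks are hypothetical, and the code does not reproduce any analysis from the paper.

```python
from itertools import combinations


def dice(mask_a, mask_b):
    """Dice coefficient for two equal-length flat binary masks (0/1 values)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(mask_a)
    size_b = sum(mask_b)
    if size_a + size_b == 0:
        return None  # structure absent in both masks; overlap is undefined
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    return 2.0 * overlap / (size_a + size_b)


def pairwise_dice(masks_by_model):
    """Dice for every unordered pair of models, keyed by (model_1, model_2)."""
    return {
        (m1, m2): dice(masks_by_model[m1], masks_by_model[m2])
        for m1, m2 in combinations(sorted(masks_by_model), 2)
    }


# Toy example: three hypothetical models, one structure, four voxels.
example = {
    "model_a": [1, 1, 0, 0],
    "model_b": [1, 0, 0, 0],
    "model_c": [0, 0, 0, 0],
}
scores = pairwise_dice(example)
```

Returning `None` when a structure is empty in both masks keeps "both models skipped it" distinct from "the models disagree", which matters when some models omit a structure entirely.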

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and recommendation for major revision. We address each point below and have revised the manuscript to strengthen the presentation of the harmonization framework and results.

Point-by-point responses
  1. Referee: [Approach] Approach section on harmonization: The central utility claim rests on mapping model-specific labels to a common terminology (e.g., RadLex-like), yet no explicit mapping table, validation metrics, or error analysis is provided to confirm the mapping is lossless for structures with variable topology such as ribs (12 vs. 13 ribs) or fused vertebrae. This leaves open the possibility that flagged 'invalid' outputs are ontology artifacts rather than intrinsic segmentation failures.

    Authors: We appreciate this observation and agree that explicit documentation strengthens the central claim. In the revised manuscript we have added a supplementary mapping table (Table S1) that maps every model-specific label to its corresponding term in the common RadLex-inspired terminology. We also performed a manual validation audit on 20 randomly selected NLST cases, confirming that the mapping is lossless for all 31 structures; no anatomical information was lost or altered. For ribs and vertebrae, invalid outputs are flagged by post-mapping anatomical consistency checks (e.g., rib count outside 12 or missing vertebral labels) rather than by the mapping itself. These checks operate on the harmonized labels and are independent of the original model vocabularies, so they reflect segmentation failures rather than ontology artifacts.

    revision: yes

  2. Referee: [Results] Results section: The comparison of model outputs is presented via qualitative visual inspection and summary plots with no quantitative agreement metrics (e.g., pairwise overlap coefficients, concordance statistics, or inter-model variability measures) reported. This weakens the ability to substantiate the degree of concordance or the reliability of detecting problematic segmentations.

    Authors: We agree that quantitative metrics improve substantiation. Although ground-truth annotations are unavailable, we have added inter-model concordance statistics in the revised Results section: pairwise volume correlation coefficients (Pearson r) and structure-presence agreement rates across the six models. These metrics are reported alongside the existing qualitative findings and show, for example, r > 0.95 for lungs and markedly lower values for the structures flagged as invalid (certain vertebrae and ribs). The new quantitative layer supports the claim that the framework reliably detects problematic segmentations without requiring ground truth.

    revision: yes
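The two statistics named in this response, a Pearson correlation of per-scan structure volumes and a structure-presence agreement rate, can be sketched as follows. The inputs are assumed to be aligned per scan (one value per scan, same order for both models), and the numbers in the usage example are invented for illustration; none of this reproduces the authors' analysis.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of volumes."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def presence_agreement(present_a, present_b):
    """Fraction of scans where two models agree on whether a structure exists."""
    if len(present_a) != len(present_b) or not present_a:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(1 for a, b in zip(present_a, present_b) if a == b)
    return matches / len(present_a)


# Invented example: volumes (mL) of one structure on three scans.
volumes_model_a = [1.0, 2.0, 3.0]
volumes_model_b = [2.0, 4.0, 6.0]
r = pearson_r(volumes_model_a, volumes_model_b)

# Invented example: whether each model produced the structure on four scans.
agreement = presence_agreement(
    [True, True, False, True],
    [True, False, False, True],
)
```

Volume correlation is insensitive to a constant scale or offset between models, as the toy example shows, so it complements rather than replaces spatial overlap measures.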

Circularity Check

0 steps flagged

No significant circularity in evaluation framework

Full rationale

The paper introduces a practical harmonization-based comparison framework applied to public NLST data and six external models, with results derived from visual inspection and summary plots rather than any fitted parameters or self-derived predictions. No equations, uniqueness theorems, or ansatzes are presented that reduce to the paper's own inputs by construction. Harmonization is described as an explicit methodological step using standard representations, and claims of agreement or invalid outputs rest on direct comparison to that representation, not on self-referential definitions. Any tool citations are for independent software and do not bear the load of the central empirical findings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that a standard representation can be created without information loss; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption: Harmonization of segmentation results into a standard interoperable representation enables consistent terminology-based labeling of structures.
    This is invoked to allow comparison across models without ground truth.

pith-pipeline@v0.9.0 · 5654 in / 1198 out tokens · 28165 ms · 2026-05-16T21:02:04.128449+00:00 · methodology

